segIBDatN: Calculates Segment Based Kinship at Native Alleles.

Description

Segment based probability of alleles to be IBD at Native haplotype segments ("kinship at native segments").

Usage

segIBDatN(files, phen, map, thisBreed, refBreeds="others", ubFreq=0.01, minSNP=20,
  unitP="Mb", minL=1.0, unitL="Mb", a=0.0, keep=NULL, lowMem=TRUE, 
  skip=NA, cskip=NA, cores=1)

Arguments

files

This can be a character vector with names of the phased marker files, one file for each chromosome. Alternatively files can be a list with the following two components:

a) hap.thisBreed: A character vector with names of the phased marker files for the individuals from thisBreed, one file for each chromosome.

b) hap.refBreeds: A character vector with names of the phased marker files for the individuals from the reference breeds (refBreeds), one file for each chromosome. If this component is missing, then it is assumed that the haplotypes of these animals are also included in hap.thisBreed.

File names must contain the chromosome name as specified in the map in the form "ChrNAME.", e.g. "Breed2.Chr1.phased". The required format of the marker files is described under Details.
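
The two accepted forms of files can be sketched as follows. This is only an illustration: the file name prefixes ("AllBreeds", "Angler", "RefBreeds") and the chromosome names are hypothetical, and no files are read.

```r
# Hypothetical chromosome names, as they would appear in column 'Chr' of the map
chromosomes <- c("1", "2")

# Form 1: one phased file per chromosome containing all individuals
files_all <- paste0("AllBreeds.Chr", chromosomes, ".phased")

# Form 2: separate files for thisBreed and for the reference breeds
files_split <- list(
  hap.thisBreed = paste0("Angler.Chr", chromosomes, ".phased"),
  hap.refBreeds = paste0("RefBreeds.Chr", chromosomes, ".phased")
)

# Check that a file name contains the chromosome name in the form "ChrNAME."
has_chr <- function(f, chr) grepl(paste0("Chr", chr, "."), f, fixed = TRUE)
stopifnot(all(mapply(has_chr, files_all, chromosomes)))
```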

phen

Data frame containing the ID (column "Indiv") and the breed name (column "Breed") of each individual.

map

Data frame providing the marker map with columns including marker name 'Name', chromosome number 'Chr', and possibly the position on the chromosome in Mega base pairs 'Mb', and the position in centimorgan 'cM'. (The position in base pairs could result in an integer overflow.) The order of the markers must be the same as in the files.
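
As a minimal sketch of the expected layout of phen and map, the following builds two small data frames with the column names described above. All IDs, marker names, and positions are made-up values for illustration.

```r
# Hypothetical individuals: two from the breed of interest, one from a reference breed
phen <- data.frame(
  Indiv = c("A1", "A2", "H1"),
  Breed = c("Angler", "Angler", "Holstein"),
  stringsAsFactors = FALSE
)

# Hypothetical marker map; 'Mb' and 'cM' are only needed if used as unitP or unitL
map <- data.frame(
  Name = c("snp1", "snp2", "snp3"),
  Chr  = c(1, 1, 2),
  Mb   = c(0.5, 1.2, 0.3),
  cM   = c(0.6, 1.4, 0.4),
  stringsAsFactors = FALSE
)

# Basic sanity checks on the required columns
stopifnot(all(c("Indiv", "Breed") %in% colnames(phen)))
stopifnot(all(c("Name", "Chr") %in% colnames(map)))
```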

thisBreed

Breed name: Results will be computed for individuals from thisBreed.

refBreeds

Vector containing names of genotyped breeds. A segment is considered native if its frequency is smaller than ubFreq in all refBreeds. The default "others" means that all genotyped breeds except thisBreed are considered.

ubFreq

A segment is considered native if its frequency is smaller than ubFreq in all reference breeds.
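
The criterion can be written out for a single segment as below. This only illustrates the rule stated above; the frequencies are invented, and the sketch does not show how segIBDatN estimates segment frequencies internally.

```r
ubFreq <- 0.01

# Hypothetical frequencies of one haplotype segment in three reference breeds
seg_freq <- c(Holstein = 0.002, Fleckvieh = 0.000, Rotbunt = 0.030)

# Native only if the frequency is below ubFreq in every reference breed
is_native <- all(seg_freq < ubFreq)
is_native
# FALSE, because the frequency in Rotbunt is not below ubFreq
```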

minSNP

Minimum number of marker SNPs included in a segment.

unitP

The unit for measuring the proportion of the genome included in native segments. Possible units are the number of marker SNPs included in shared segments ('SNP'), the number of Mega base pairs ('Mb'), and the total length of the shared segments in centimorgan ('cM'). In the last two cases the map must include columns with the respective names.

minL

Minimum length of a segment in unitL (e.g. in cM).

unitL

The unit for measuring the length of a segment. Possible units are the number of marker SNPs included in the segment ('SNP'), the number of Mega base pairs ('Mb'), and the genetic distances between the first and the last marker in centimorgan ('cM'). In the last two cases the map must include columns with the respective names.

a

The function providing the weighting factor for each segment is w(x)=x*x/(a+x*x). The parameter of the function is the length of the segment in unitL. The default value a=0.0 implies no weighting, whereas a>0.0 implies that old inbreeding has less influence on the result than new inbreeding.
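
The weighting function can be evaluated directly to see its effect. The helper name seg_weight and the example lengths are assumptions made for this sketch; they are not part of the package.

```r
# w(x) = x*x / (a + x*x), where x is the segment length in unitL
seg_weight <- function(x, a = 0.0) x * x / (a + x * x)

len <- c(1, 2, 5, 10)  # hypothetical segment lengths

# a = 0: every segment of positive length gets weight 1 (no weighting)
seg_weight(len)

# a > 0: short segments (older inbreeding) are down-weighted more than long ones
seg_weight(len, a = 4)
```

With a=4, segments of length 1 and 2 receive weights 0.2 and 0.5, while the weight approaches 1 as the segment length grows.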

keep

Subset of the IDs of the individuals from data frame phen (including individuals from other breeds) or a logical vector indicating the animals in data frame phen that should be used. By default all individuals included in phen will be used.

lowMem

If lowMem=TRUE then temporary files will be created and deleted.

skip

Take line skip+1 of the genotype files as the row with column names. By default, the number is determined automatically.

cskip

Take column cskip+1 of the genotype files as the first column with genotypes. By default, the number is determined automatically.

cores

Number of cores to be used for parallel processing of chromosomes. By default one core is used. For cores=NA the number of cores will be chosen automatically. Using more than one core can increase execution time if the function is already fast.

Value

A list containing matrices needed for computing the segment based probability of alleles to be IBD at native segments. The list has components

segN

This matrix contains for each pair of individuals the probability that two SNPs taken at a random position from randomly chosen haplotypes both belong to native segments.

segIBDandN

This matrix contains for each pair of individuals the probability that two SNPs taken at a random position from randomly chosen haplotypes belong to a shared native segment.

segZ

1+segIBDandN-segN.

The list has attribute meanIBDatN providing the probability of randomly chosen alleles to be IBD at native haplotype segments. Note that 1-meanIBDatN is the genetic diversity at native segments within the genotyped individuals from thisBreed.
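
The relationship between the three components can be sketched with small made-up matrices. The values below are purely illustrative and were not produced by segIBDatN; the pairwise ratio follows the computation used in the Examples.

```r
ids <- c("A1", "A2", "A3")

# Hypothetical probabilities that both SNPs lie in native segments
segN <- matrix(c(0.90, 0.80, 0.70,
                 0.80, 0.85, 0.60,
                 0.70, 0.60, 0.50),
               nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical probabilities that both SNPs lie in a shared native segment
segIBDandN <- matrix(c(0.50, 0.10, 0.05,
                       0.10, 0.45, 0.08,
                       0.05, 0.08, 0.30),
                     nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Third component as defined above
segZ <- 1 + segIBDandN - segN

# Pairwise kinship at native segments: IBD conditional on both being native
kin <- segIBDandN / segN
kin["A1", "A2"]
# 0.125
```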

Details

Computation of the segment based probability of alleles to be IBD at native haplotype segments.

Genotype file format: Each file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map. The first cskip columns are ignored. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should be repeated in the header to get a regularly delimited file. The columns to be skipped and the individual IDs must not contain white space. The name of each file must contain the chromosome name as specified in the map in the form "ChrNAME.", e.g. "Breed2.Chr1.phased".
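
A tiny file in the blank-separated variant of this format can be written and read back with base R as a sketch. The marker names, IDs, and alleles are invented, one leading column is assumed (cskip=1), and reading with read.table is only an illustration, not necessarily how segIBDatN parses the files.

```r
# Temporary file whose name contains the chromosome name in the form "Chr1."
tmp <- tempfile(fileext = ".Chr1.phased")

# Header: one column to skip, then each ID repeated once per haplotype
writeLines(c(
  "Name A1 A1 A2 A2",
  "snp1 0 1 1 1",
  "snp2 1 1 0 1"
), tmp)

# check.names = FALSE keeps the duplicated ID column names unchanged
hap <- read.table(tmp, header = TRUE, check.names = FALSE,
                  stringsAsFactors = FALSE)

cskip   <- 1                                       # number of leading columns to ignore
ids     <- colnames(hap)[-seq_len(cskip)]          # haplotype-wise individual IDs
alleles <- as.matrix(hap[, -seq_len(cskip)])       # markers in rows, haplotypes in columns

dim(alleles)
unlink(tmp)
```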

Examples

# NOT RUN {
data(map)
data(Cattle)
dir   <- system.file("extdata", package = "optiSel")
files <- paste(dir, "/Chr", 1:2, ".phased", sep="")
Res   <- segIBDatN(files, Cattle, map, thisBreed="Angler", ubFreq=0.01, 
                   minL=1.0, lowMem=FALSE)
               
## Mean kinship at native segments:
attributes(Res)$meanIBDatN
#[1] 0.06695171

## Results for individuals:
kin <- Res$segIBDandN/Res$segN
use <- upper.tri(kin) & Res$segN>0.2
boxplot(kin[use], ylim=c(0,1))

## Use temporary files to reduce working memory:

Res <- segIBDatN(files, Cattle, map, thisBreed="Angler", ubFreq=0.01,  minL=1.0)
               
## Mean kinship at native segments:
attributes(Res)$meanIBDatN
#[1] 0.06695171

# }
