extractIBDsegments: Extract IBD segments from a `fabia` result

Description

extractIBDsegments: R implementation that extracts IBD segments from a FABIA result.

IBD segments are identified in FABIA Factorization objects. First, accumulations of correlated SNVs are found. Then, the IBD segments within these accumulations are disentangled. Finally, spurious correlated SNVs are pruned off the IBD segments.
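
The first step, finding accumulations of correlated SNVs, can be pictured as a histogram over SNV indices: bins of inteA SNVs, shifted by off, are flagged when their count exceeds thresA. The following is a simplified sketch under that assumption, not the hapFabia implementation; the helper findAccumulations and its input snvIdx are hypothetical.

```r
# Simplified illustration, NOT the hapFabia implementation: flag histogram
# bins in which candidate SNVs accumulate.
# snvIdx: indices of candidate SNVs along the chromosome (hypothetical input)
# inteA: bin width in SNVs; off: histogram offset; thresA: count threshold
findAccumulations <- function(snvIdx, inteA, thresA, off = 0) {
  # Bin number of each SNV after shifting by the offset
  bins <- floor((snvIdx - off) / inteA)
  counts <- table(bins)
  # Bins whose count exceeds thresA are treated as local accumulations
  hot <- as.integer(names(counts)[counts > thresA])
  # First and last SNV index covered by each flagged bin, plus its count
  data.frame(start = hot * inteA + off,
             end = (hot + 1) * inteA + off - 1,
             count = as.integer(counts[as.character(hot)]))
}

# Ten SNVs packed into one bin and one isolated SNV far away
findAccumulations(c(101:110, 500), inteA = 50, thresA = 6)
```

The call returns a single row (start 100, end 149, count 10); the isolated SNV at index 500 falls into a bin with count 1 and is ignored.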

Usage

extractIBDsegments(res, sPF, annot = NULL, chrom = "", labelsA = NULL, ps = 0.9, psZ = 0.8, inteA = 500, thresA = 11, mintagSNVs = 8, off = 0, procMinIndivids = 0.1, thresPrune = 1e-3)

Arguments

res

result of fabia, given as a Factorization object.

sPF

genotype data obtained by the fabia procedure samplesPerFeature; for each SNV it gives the individuals/chromosomes that possess the minor allele.

annot

annotation of the tagSNVs as an object of class data.frame; if NULL, a dummy annotation is generated.

chrom

the chromosome from which the genotype data stem.

labelsA

labels for the individuals; if NULL, dummy labels are generated by enumerating the individuals.

ps

quantile above which the L values are considered for IBD segment extraction.

psZ

quantile above which the largest Z values are considered for IBD segment extraction.

inteA

number of SNVs in a histogram bin; it corresponds to the desired IBD segment length.

thresA

threshold on the histogram counts above which the SNVs in a bin are considered locally accumulated.

mintagSNVs

threshold for the minimal tagSNV overlap of intervals in an IBD segment.

off

offset of the histogram.

procMinIndivids

proportion of the cluster individuals that must possess the minor allele for an SNV to be considered an IBD segment tagSNV.

thresPrune

threshold on the probability of having minimal distance to neighboring tagSNVs; used to prune off SNVs at the border of IBD segments.
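
To make the thresholds ps, psZ and procMinIndivids concrete, the sketch below applies them to a single bicluster: SNVs with loadings above the ps quantile, individuals with factor values above the psZ quantile, and a minimum fraction of those individuals carrying the minor allele. It is an illustration under stated assumptions, not hapFabia's code: the helper selectCandidates, the use of absolute values, and the 0/1 genotype matrix X (standing in for sPF) are all hypothetical.

```r
# Hypothetical sketch of ps, psZ and procMinIndivids; NOT hapFabia's code.
# L: loading vector over SNVs; Z: factor vector over individuals;
# X: 0/1 matrix (SNVs x individuals), 1 = minor allele (stand-in for sPF).
# Assumption: absolute values of L and Z are thresholded.
selectCandidates <- function(L, Z, X, ps = 0.9, psZ = 0.8,
                             procMinIndivids = 0.1) {
  # SNVs whose loading lies above the ps quantile
  snvs <- which(abs(L) > quantile(abs(L), ps))
  # Individuals whose factor value lies above the psZ quantile
  indiv <- which(abs(Z) > quantile(abs(Z), psZ))
  # Fraction of the selected individuals carrying the minor allele per SNV
  frac <- rowMeans(X[snvs, indiv, drop = FALSE])
  list(tagSNVs = snvs[frac >= procMinIndivids], individuals = indiv)
}

L <- c(0.1, 0.2, 5, 6, 0.3, 0.1, 0.2, 0.1, 0.15, 7)
Z <- c(0, 3, 0.1, 4, 0.2)
X <- matrix(0, nrow = 10, ncol = 5)
X[3, c(2, 4)] <- 1
X[4, 2] <- 1
selectCandidates(L, Z, X, ps = 0.7, psZ = 0.6, procMinIndivids = 0.5)
```

Here SNVs 3, 4 and 10 pass the loading quantile and individuals 2 and 4 pass the factor quantile, but SNV 10 is dropped because none of the two selected individuals carries its minor allele.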

Value

An instance of the class IBDsegmentList containing the extracted IBD segments.

Details

The threshold thresA on the counts per bin, which indicates SNV accumulations, is computed and provided by hapFabia when it calls this method. Distance probabilities for pruning are based on an exponential distribution whose parameter (one over the rate) is the median distance between tagSNVs. Equivalently, the counts of tagSNVs per interval are assumed to be Poisson distributed. At the border of an IBD segment, SNVs that have a large distance to the closest tagSNV are pruned off. thresPrune gives the pruning threshold as a p-value: the probability, under the exponential distribution, of observing this distance or a larger one.
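
With m the median distance between neighboring tagSNVs, the p-value of a distance d under this exponential model is P(D >= d) = exp(-d / m). The sketch below applies that rule to the borders of one segment, dropping border SNVs whose gap to their neighbor has a p-value below thresPrune. It is a minimal illustration, not hapFabia's code; the helper pruneBorder, its input pos, and the choice to compute m once before pruning are assumptions.

```r
# Minimal sketch of exponential border pruning; NOT hapFabia's code.
# pos: positions of the tagSNVs of one IBD segment (hypothetical input).
# Assumption: the median gap m is computed once, before any pruning.
pruneBorder <- function(pos, thresPrune = 1e-3) {
  pos <- sort(pos)
  if (length(pos) < 3) return(pos)
  # Median gap between neighboring tagSNVs = 1 / rate of the exponential
  m <- median(diff(pos))
  # p-value of observing a gap of d or larger: exp(-d / m)
  pval <- function(d) pexp(d, rate = 1 / m, lower.tail = FALSE)
  # Repeatedly drop a border SNV whose gap to its neighbor is too unlikely
  repeat {
    n <- length(pos)
    if (n < 3) break
    if (pval(pos[2] - pos[1]) < thresPrune) {
      pos <- pos[-1]
    } else if (pval(pos[n] - pos[n - 1]) < thresPrune) {
      pos <- pos[-n]
    } else break
  }
  pos
}

pruneBorder(c(100, 1000, 1010, 1020, 1030, 1040, 1050, 5000))
```

With a median gap of 10, the gaps of 900 and 3950 at the two borders have p-values of exp(-90) and exp(-395), far below 1e-3, so positions 100 and 5000 are removed and 1000 to 1050 remain.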

Implementation in R.

References

S. Hochreiter et al., ‘FABIA: Factor Analysis for Bicluster Acquisition’, Bioinformatics 26(12):1520-1527, 2010.

Examples


library(hapFabia)

# Load the example fabia result shipped with hapFabia
data(hapRes)
res <- hapRes$res
sPF <- hapRes$sPF
annot <- hapRes$annot

# Dummy labels: one row per individual, four label columns
nnL <- length(Z(res)[1, ])
labelsA <- cbind(as.character(1:nnL), as.character(1:nnL),
                 as.character(1:nnL), as.character(1:nnL))

# inteA, thresA and mintagSNVs are set below their defaults here
resIBDsegmentList <- extractIBDsegments(res = res, sPF = sPF,
   annot = annot, chrom = "1", labelsA = labelsA,
   ps = 0.9, psZ = 0.8, inteA = 50, thresA = 6, mintagSNVs = 6,
   off = 0, procMinIndivids = 0.1, thresPrune = 1e-3)

summary(resIBDsegmentList)

print("Position of the first IBD segment:")
print(IBDsegmentPos(resIBDsegmentList[[1]]))

print("Length of the first IBD segment:")
print(IBDsegmentLength(resIBDsegmentList[[1]]))
