hapFabia (version 1.14.0)

extractIBDsegments: Extract IBD segments from a fabia result

Description

extractIBDsegments: R implementation of extractIBDsegments.

IBD segments are identified in FABIA Factorization objects. First, accumulations of correlated SNVs are found. Then the IBD segments in these accumulations are disentangled. Finally, the IBD segments are pruned of spuriously correlated SNVs.

Usage

extractIBDsegments(res, sPF, annot=NULL, chrom="", labelsA=NULL, ps=0.9, psZ=0.8, inteA=500, thresA=11, mintagSNVs=8, off=0, procMinIndivids=0.1, thresPrune=1e-3)

Arguments

res
result of fabia given as Factorization object.
sPF
genotype data obtained by the fabia procedure samplesPerFeature; for each SNV it gives the individuals/chromosomes that possess the minor allele.
annot
annotation for the tagSNVs as an object of the class data.frame; if it is NULL then a dummy annotation is generated.
chrom
the chromosome the genotyping data stems from.
labelsA
labels for the individuals; if it is NULL then dummy labels by enumerating individuals are generated.
ps
quantile above which the L values are considered for IBD segment extraction.
psZ
quantile above which the largest Z values are considered for IBD segment extraction.
inteA
number of SNVs in a histogram bin, which corresponds to the desired IBD segment length.
thresA
threshold for histogram counts above which SNVs are viewed to be locally accumulated in a histogram bin.
mintagSNVs
threshold for minimal tagSNV overlap of intervals in an IBD segment.
off
offset of the histogram.
procMinIndivids
fraction of cluster individuals (default 0.1, i.e. 10 percent) that must have the minor allele for an SNV to be considered a tagSNV of the IBD segment.
thresPrune
threshold on the probability of having minimal distance to neighboring tagSNVs; used to prune off SNVs at the border of IBD segments.
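
To make the roles of inteA, thresA and off more concrete, the following base R sketch bins SNV indices into histogram bins of inteA SNVs each and flags the bins whose count exceeds thresA. It is an illustration only and not the package's code: the vector snvIdx is made-up data, and the way off shifts the bins is an assumption about the binning, not something stated on this page.

```r
# Illustrative parameters, matching the defaults listed above
inteA  <- 500   # SNVs per histogram bin
thresA <- 11    # a bin counts as an accumulation if its count exceeds this
off    <- 0     # offset of the histogram (assumed to shift the bin borders)

# Hypothetical indices of SNVs selected for one bicluster
snvIdx <- c(3, 120, 480, 1001:1015, 2600, 2999)

# Assign each SNV to a bin and count the SNVs per bin
binId  <- (snvIdx - off) %/% inteA
counts <- table(binId)

# Bins with more than thresA SNVs are candidate accumulations
denseBins <- as.integer(names(counts)[as.vector(counts) > thresA])
```

Here only the bin holding the 15 neighbouring SNVs is flagged; the other two bins contain too few SNVs to pass thresA.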

Value

An instance of the class IBDsegmentList containing the extracted IBD segments.

Details

The threshold thresA for counts in a bin, which indicates SNV accumulations, is computed and provided by hapFabia when calling this method. Distance probabilities for pruning are based on an exponential distribution with the median distance between tagSNVs as parameter (one over the rate). Thus, the counts are assumed to be Poisson distributed. At the IBD segment border, SNVs that have a large distance to the closest tagSNV are pruned off. thresPrune gives the pruning threshold as a p-value for observing this distance or a larger one under the exponential distribution.
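
The pruning rule can be sketched in a few lines of base R. The positions in pos are hypothetical, and the code only illustrates the p-value described above (an exponential distribution whose scale, one over the rate, is the median distance between neighbouring tagSNVs). How hapFabia walks along the segment border internally is not shown on this page and is not reproduced here.

```r
# Hypothetical tagSNV positions; the last one lies far from the rest
pos <- c(1000, 1200, 1350, 1500, 1700, 1850, 2000, 9000)
thresPrune <- 1e-3

gaps      <- diff(pos)      # distances between neighbouring tagSNVs
scaleDist <- median(gaps)   # median distance, used as one over the rate

# p-value: probability of the border gap or a larger one under the exponential
borderGap <- gaps[length(gaps)]
pval <- pexp(borderGap, rate = 1 / scaleDist, lower.tail = FALSE)

# The border SNV is pruned if its p-value falls below thresPrune
pruneLast <- pval < thresPrune

# Equivalent view: the distance beyond which a border SNV would be pruned
critDist <- scaleDist * log(1 / thresPrune)
```

With a median distance of 150 and thresPrune=1e-3, the critical distance is about 1036, so the border SNV at distance 7000 is pruned in this toy example.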

Implementation in R.

References

S. Hochreiter et al., ‘FABIA: Factor Analysis for Bicluster Acquisition’, Bioinformatics 26(12):1520-1527, 2010.

See Also

IBDsegment-class, IBDsegmentList-class, analyzeIBDsegments, compareIBDsegmentLists, extractIBDsegments, findDenseRegions, hapFabia, hapFabiaVersion, hapRes, chr1ASW1000G, IBDsegmentList2excel, identifyDuplicates, iterateIntervals, makePipelineFile, matrixPlot, mergeIBDsegmentLists, mergedIBDsegmentList, plotIBDsegment, res, setAnnotation, setStatistics, sim, simu, simulateIBDsegmentsFabia, simulateIBDsegments, split_sparse_matrix, toolsFactorizationClass, vcftoFABIA

Examples

library(hapFabia)

# Load the example data shipped with the package
data(hapRes)
res <- hapRes$res      # fabia result (Factorization object)
sPF <- hapRes$sPF      # samples per feature
annot <- hapRes$annot  # tagSNV annotation

# Dummy labels: one row per individual, four identical columns
nnL <- ncol(Z(res))
labelsA <- cbind(as.character(1:nnL),
   as.character(1:nnL),as.character(1:nnL),
   as.character(1:nnL))

# Extract the IBD segments from the fabia result
resIBDsegmentList <- extractIBDsegments(res=res,
   sPF=sPF,annot=annot,chrom="1",labelsA=labelsA,
   ps=0.9,psZ=0.8,inteA=50,thresA=6,mintagSNVs=6,
   off=0,procMinIndivids=0.1,thresPrune=1e-3)

summary(resIBDsegmentList)

print("Position of the first IBD segment:")
print(IBDsegmentPos(resIBDsegmentList[[1]]))

print("Length of the first IBD segment:")
print(IBDsegmentLength(resIBDsegmentList[[1]]))

