callVariantsSingle( data, sampledata, samples = sampledata$Sample, errorRate = 0.001, minSupport = 2, minAF = 0.05, minStrandSupport = 1, mergeDels = TRUE, aggregator = mean)
data: a list with elements Counts (a 4d integer array of size [1:12, 1:2, 1:k, 1:n]), Coverages (a 3d integer array of size [1:2, 1:k, 1:n]), Deletions (a 3d integer array of size [1:2, 1:k, 1:n]) and Reference (a 1d integer vector of size [1:n]); see Details.
sampledata: a data.frame with k rows (one for each sample) and columns Column and Sample. The tally file should contain this information as a group attribute, see getSampleData for an example.
1/1000
mean, which means that a deletion larger than 1bp will be annotated with the means of the counts and coverages etc.

data.frame containing annotated calls with the following slots:
"-" in that slot)
Reference dataset; if the tally file contains a sparse representation of the reference, i.e. only positions with mismatches show a reference value, the missing values are substituted with "N"s. It is strongly suggested to write the whole reference into the tally file prior to deletion calling - see writeReference for details)
SupFwd + SupRev
CovFwd + CovRev
Support / Coverage
Control sample on the forward strand
fisher.test on the contingency matrix matrix(c(CovFwd,CovRev,SupFwd,SupRev), nrow = 2) at this position - low values could indicate strand bias

data
is a list of datasets that must contain at least Counts and Coverages for variant calling, and Deletions for deletion calling (if Deletions is not present, no deletion calls will be made). This list will usually be generated by a call to the h5dapply function, in which the tally file, chromosome, datasets and regions within the datasets would be specified. See h5dapply for specifics.

callVariantsSingle implements a simple single-sample variant calling approach for SNVs and deletions (if Deletions is a dataset present in the data parameter). The function applies three essential filters to the provided data, requiring:
- at least minSupport total support for the variant at the position
- at least minStrandSupport support for the variant on each strand
- an allele frequency of at least minAF (for pure diploid samples this can be set relatively high, e.g. 0.3; for calling potentially homozygous variants a value of 0.8 or higher might be used)
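The three filters above can be sketched in base R on a small made-up table. This is only an illustration under stated assumptions: the `calls` data.frame and its column layout are hypothetical, and the comparison is assumed to be "greater than or equal to" each threshold; it is not the package's internal code.

```r
# Hypothetical per-strand support and coverage for four candidate positions
calls <- data.frame(
  Position = c(101, 102, 103, 104),
  SupFwd   = c(5, 2, 0, 1),
  SupRev   = c(4, 0, 3, 1),
  CovFwd   = c(20, 30, 25, 40),
  CovRev   = c(18, 28, 22, 38)
)

# Thresholds named after the function's parameters (defaults from the usage line)
minSupport       <- 2
minStrandSupport <- 1
minAF            <- 0.05

# Derived quantities: total support, total coverage, allele frequency
calls$Support  <- calls$SupFwd + calls$SupRev
calls$Coverage <- calls$CovFwd + calls$CovRev
calls$AF       <- calls$Support / calls$Coverage

# A candidate is kept only if it passes all three filters (assumed to use >=)
keep <- calls$Support >= minSupport &
  calls$SupFwd >= minStrandSupport &
  calls$SupRev >= minStrandSupport &
  calls$AF >= minAF

passed <- calls[keep, ]
print(passed)
```

In this toy table only position 101 survives: 102 and 103 lack support on one strand, and 104 falls below the allele frequency threshold.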
Calls are annotated with the p-value of a binom.test of the observed support and coverage given the error rate provided in the errorRate parameter; no filtering is done on this annotation.
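As a rough illustration of this annotation, the snippet below computes a binomial p-value for one hypothetical call, together with the strand-bias fisher.test on the contingency matrix given in the description of the return value. The numbers are invented, and the one-sided `alternative = "greater"` is an assumption; the source does not state which alternative the function uses.

```r
# Hypothetical support and coverage for a single call
SupFwd <- 5;  SupRev <- 4
CovFwd <- 20; CovRev <- 18
errorRate <- 0.001  # default from the usage line

support  <- SupFwd + SupRev
coverage <- CovFwd + CovRev

# Probability of seeing at least this much support from sequencing errors alone
# (one-sided test; this choice of alternative is an assumption)
pError <- binom.test(support, coverage, p = errorRate,
                     alternative = "greater")$p.value

# Strand-bias check on the contingency matrix quoted in the slot description
pStrand <- fisher.test(
  matrix(c(CovFwd, CovRev, SupFwd, SupRev), nrow = 2)
)$p.value

print(c(pError = pError, pStrand = pStrand))
```

A very small pError suggests the support is unlikely to be explained by the error rate alone, while a small pStrand would hint at strand bias.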
Adjacent deletion calls are merged depending on the value of the mergeDels parameter, and their statistics are aggregated with the function supplied in the aggregator parameter.
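One way to picture the merging step is the following base R sketch. It assumes a hypothetical `dels` table of single-base deletion calls sorted by position and treats positions that differ by exactly 1 as adjacent; the package may implement merging differently.

```r
# Hypothetical single-base deletion calls, sorted by position
dels <- data.frame(
  Position = c(200, 201, 202, 310),
  Support  = c(4, 6, 5, 3),
  Coverage = c(40, 42, 38, 30)
)

aggregator <- mean  # default from the usage line

# A new run starts wherever the gap to the previous position is not exactly 1
runId <- cumsum(c(TRUE, diff(dels$Position) != 1))

# Collapse each run into one call and aggregate its statistics
merged <- do.call(rbind, lapply(split(dels, runId), function(run) {
  data.frame(
    Start    = min(run$Position),
    End      = max(run$Position),
    Support  = aggregator(run$Support),
    Coverage = aggregator(run$Coverage)
  )
}))
rownames(merged) <- NULL
print(merged)
```

Here the three calls at 200 to 202 collapse into one 3bp deletion annotated with the mean support and coverage, while the isolated call at 310 stays as it is. Passing a different function (for example `median` or `max`) as `aggregator` changes only how the statistics are summarised.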
library(h5vc) # loading library
tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
position <- 29979629
windowsize <- 1000
vars <- h5dapply( # Calling Variants
  filename = tallyFile,
  group = "/ExampleStudy/16",
  blocksize = 500,
  FUN = callVariantsSingle,
  sampledata = sampleData,
  names = c("Coverages", "Counts", "Reference", "Deletions"),
  range = c(position - windowsize, position + windowsize)
)
vars <- do.call( rbind, vars ) # merge the results from all blocks by row
vars # We did find a variant