callVariantsSingle( data, sampledata, samples = sampledata$Sample, errorRate = 0.001, minSupport = 2, minAF = 0.05, minStrandSupport = 1, mergeDels = TRUE, aggregator = mean )

data: list with elements
Counts (a 4d integer array of size [1:12, 1:2, 1:k, 1:n]),
Coverages (a 3d integer array of size [1:2, 1:k, 1:n]),
Deletions (a 3d integer array of size [1:2, 1:k, 1:n]),
Reference (a 1d integer vector of size [1:n]); see Details.

sampledata: data.frame with k rows (one for each sample) and columns Column and Sample. The tally file should contain this information as a group attribute, see getSampleData for an example.

errorRate: the error rate used for the binom.test annotation (see Details); the default corresponds to 1/1000.

aggregator: function used to aggregate the statistics of merged deletion calls (see Details); defaults to mean, which means that a deletion larger than 1bp will be annotated with the means of the counts and coverages etc.

The function returns a data.frame containing annotated calls with the following slots:
- ... "-" in that slot)
- ... Reference dataset; if the tally file contains a sparse representation of the reference, i.e. only positions with mismatches show a reference value, the missing values are substituted with "N"s. It is strongly suggested to write the whole reference into the tally file prior to deletion calling, see writeReference for details)
- total support, i.e. SupFwd + SupRev
- total coverage, i.e. CovFwd + CovRev
- allele frequency, i.e. Support / Coverage
- ... Control sample on the forward strand
- ... fisher.test on the contingency matrix matrix(c(CovFwd, CovRev, SupFwd, SupRev), nrow = 2) at this position; low values could indicate strand bias

data is a list of datasets which has to at least contain the
Counts and Coverages datasets for variant calling, plus the Deletions dataset for deletion calling (if Deletions is not present, no deletion calls will be made).
This list will usually be generated by a call to the h5dapply function, in which the tally file, chromosome, datasets and regions within the datasets are specified. See h5dapply for specifics.

callVariantsSingle implements a simple single-sample variant calling approach for SNVs and deletions (the latter only if Deletions is a dataset present in the data parameter). The function applies three essential filters to the provided data, requiring:
- a total support of at least minSupport for the variant at the position
- a support of at least minStrandSupport for the variant on each strand
- an allele frequency of at least minAF (for pure diploid samples this can be set relatively high, e.g. 0.3; for calling potentially homozygous variants a value of 0.8 or higher might be used)
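The three filters above can be illustrated on toy numbers. This is only a sketch under assumptions: the candidates data.frame and its column names are made up for illustration, and the use of >= for every threshold is an assumption, not a statement about the internals of callVariantsSingle.

```r
# Hypothetical per-position summaries, one candidate variant per row.
# Column names are assumptions made for this sketch.
candidates <- data.frame(
  SupFwd = c(3, 1, 0, 10),
  SupRev = c(2, 0, 4, 12),
  CovFwd = c(20, 30, 25, 12),
  CovRev = c(22, 28, 30, 13)
)

# Thresholds, set to the defaults shown in the usage line.
minSupport <- 2
minStrandSupport <- 1
minAF <- 0.05

# Derived quantities: total support, total coverage, allele frequency.
candidates$Support  <- candidates$SupFwd + candidates$SupRev
candidates$Coverage <- candidates$CovFwd + candidates$CovRev
candidates$AF       <- candidates$Support / candidates$Coverage

# A candidate is kept only if it passes all three filters.
keep <- candidates$Support >= minSupport &
  candidates$SupFwd >= minStrandSupport &
  candidates$SupRev >= minStrandSupport &
  candidates$AF >= minAF

passed <- candidates[keep, ]
```

In this toy table the second row fails the total support filter and the third row fails the per-strand filter, so only the first and fourth rows remain in passed.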
Calls are annotated with the p-value of a binom.test of the observed support and coverage, given the error rate provided in the errorRate parameter. No filtering is done on this annotation.
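As a rough illustration of this annotation, the following sketch computes such a p-value for a single position. The support and coverage values are made up, and the one-sided alternative = "greater" is an assumption of this sketch; the text above does not say which alternative the function uses.

```r
# Hypothetical numbers for one candidate variant.
support   <- 5      # reads supporting the variant
coverage  <- 42     # total reads at the position
errorRate <- 0.001  # expected sequencing error rate (1/1000)

# Probability of seeing at least `support` mismatches among `coverage`
# reads if every mismatch were a sequencing error.
# The one-sided alternative is an assumption of this sketch.
bt <- binom.test(x = support, n = coverage, p = errorRate,
                 alternative = "greater")
pError <- bt$p.value
```

A very small pError suggests that the observed support is unlikely to be explained by sequencing errors alone. Since the function does not filter on this value, any threshold is left to the user.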
Adjacent deletion calls are merged based on the value of the mergeDels parameter, and their statistics are aggregated with the function supplied in the aggregator parameter.
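The merging idea can be sketched as follows. This is not the package's implementation; the dels data.frame, its column names and the run-detection logic are assumptions chosen to show how single-base calls might be combined and summarised with an aggregator such as mean.

```r
# Hypothetical single-base deletion calls; column names are assumptions.
dels <- data.frame(
  Start    = c(100, 101, 102, 110),
  End      = c(100, 101, 102, 110),
  Support  = c(4, 6, 5, 3),
  Coverage = c(40, 42, 44, 30)
)
aggregator <- mean

# A new run starts whenever a call does not directly follow the previous one.
dels <- dels[order(dels$Start), ]
runId <- cumsum(c(TRUE, dels$Start[-1] != dels$End[-nrow(dels)] + 1))

# Collapse each run into one call and aggregate its statistics.
merged <- do.call(rbind, lapply(split(dels, runId), function(run) {
  data.frame(
    Start    = min(run$Start),
    End      = max(run$End),
    Support  = aggregator(run$Support),
    Coverage = aggregator(run$Coverage)
  )
}))
rownames(merged) <- NULL
```

Here the calls at positions 100 to 102 collapse into one 3bp deletion annotated with the mean support and coverage, while the isolated call at 110 is left unchanged. Passing a different function (for example min) as aggregator would give a more conservative summary.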
library(h5vc) # loading library
tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
position <- 29979629
windowsize <- 1000
vars <- h5dapply( # Calling Variants
  filename = tallyFile,
  group = "/ExampleStudy/16",
  blocksize = 500,
  FUN = callVariantsSingle,
  sampledata = sampleData,
  names = c("Coverages", "Counts", "Reference", "Deletions"),
  range = c(position - windowsize, position + windowsize)
)
vars <- do.call( rbind, vars ) # merge the results from all blocks by row
vars # We did find a variant
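The strand-bias annotation mentioned among the returned slots is described as a fisher.test on matrix(c(CovFwd, CovRev, SupFwd, SupRev), nrow = 2). The sketch below evaluates that test on made-up numbers; the helper strandP and the example values are hypothetical and only illustrate why low values could indicate strand bias.

```r
# Hypothetical helper: p-value of Fisher's exact test on the
# coverage/support contingency matrix described above.
strandP <- function(CovFwd, CovRev, SupFwd, SupRev) {
  m <- matrix(c(CovFwd, CovRev, SupFwd, SupRev), nrow = 2)
  fisher.test(m)$p.value
}

# Support spread evenly over both strands.
pBalanced <- strandP(CovFwd = 20, CovRev = 22, SupFwd = 3, SupRev = 3)

# All support on the forward strand only.
pBiased <- strandP(CovFwd = 20, CovRev = 22, SupFwd = 5, SupRev = 0)
```

With similar coverage on both strands, the call supported on one strand only yields the smaller p-value, which matches the note that low values could indicate strand bias.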