h5vc (version 2.6.3)

tallyRanges: Tallying function with a GRanges interface.

Description

Functions for tallying BAM files in genomic intervals provided as GRanges objects. Special versions of the function exist for writing directly to a tally file and for computation on a cluster.

Usage

tallyRanges(bamfiles, ranges, reference, q = 25, ncycles = 10, max.depth = 1e+06)
tallyRangesToFile(tallyFile, study, bamfiles, ranges, reference, samples = NULL, q = 25, ncycles = 0, max.depth=1e6)
tallyRangesBatch(tallyFile, study, bamfiles, ranges, reference, q = 25, ncycles = 10, max.depth=1e6, regID = "Tally", res = list("ncpus" = 2, "memory" = 24000, "queue"="research-rh6"), written = c(), wrfile = "written.jobs.RDa", waitTime = Inf)

Arguments

bamfiles
Character vector giving the locations of the bam files to be tallied
ranges
A GRanges object describing the ranges in which tallies shall be generated, e.g. the result of a call to binGenome or a set of exon or gene annotations provided by a TxDb object.
reference
BSgenome object describing the reference genome that the alignments were made against.
samples
The indices (within the HDF5 datasets) corresponding to the samples that the data represents. You can use this option to write sub-sets of samples from a cohort.
q
Read alignment quality cut-off.
ncycles
Number of cycles from the front and back of the reads that should be considered unreliable for mismatch detection
max.depth
Maximum depth of coverage to consider
tallyFile
Filename of the HDF5 tally file that the data shall be written to
study
The location within the HDF5 file that corresponds to the HDF5-group representing the study we are working on.
regID
Identifier for a BatchJobs registry which will be used to store and organise the cluster jobs used for parallelisation of the work.
res
Resource list specifying the compute resources to be requested for each of the cluster jobs.
written
Numerical vector indicating the job IDs of jobs whose results have already been written to the tally file. This can be used to resume writing after a crash.
wrfile
Name of a file in which to store the IDs of already written jobs. This can be used to resume writing after a crash.
waitTime
How long the function should wait for cluster jobs to finish before giving up. The default is to wait forever.
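
To make the ncycles argument more concrete, here is a minimal base-R sketch of which sequencing cycles of a read remain once the first and last ncycles cycles are set aside. This only illustrates the idea described above and is not h5vc's implementation; the helper name reliable_cycles is hypothetical.

```r
# Hypothetical helper (not part of h5vc): flag the sequencing cycles of a read
# that lie outside the first and last `ncycles` positions.
reliable_cycles <- function(read_length, ncycles = 10) {
  cycle <- seq_len(read_length)
  cycle > ncycles & cycle <= read_length - ncycles
}

# For a 100 bp read with ncycles = 10, cycles 11 to 90 remain.
keep <- reliable_cycles(100, ncycles = 10)
range(which(keep))  # first and last cycle that is kept
sum(keep)           # number of cycles that are kept
```

Reads shorter than twice ncycles have no cycles left under this rule, which is worth keeping in mind when choosing the value for short-read data.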

Value

  • For tallyRanges, the return value is a list of lists: the top level corresponds to the ranges provided as input to the function, and each element is a list of datasets in a compatible format that can be written directly to an HDF5 file using the writeToTallyFile function. The other two functions perform the writing directly and return
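
As a rough guide to working with a list-of-lists result of this shape, the base-R sketch below builds a mock object and inspects it. The dataset names and array dimensions are placeholders chosen for illustration and are not guaranteed to match what tallyRanges actually returns.

```r
# Mock object with the shape described above: one element per input range,
# each holding a named list of datasets. Names and dimensions are placeholders.
mock_tally <- lapply(1:3, function(i) {
  list(
    Counts    = array(0L, dim = c(4, 2, 5)),  # placeholder dimensions
    Coverages = array(0L, dim = c(2, 5))      # placeholder dimensions
  )
})

length(mock_tally)       # one entry per range
names(mock_tally[[1]])   # datasets available for the first range

# Summarise the dimensions of one dataset across all ranges.
sapply(mock_tally, function(x) paste(dim(x$Coverages), collapse = " x "))
```

The same length(), names() and sapply() calls can be applied to a real result, alongside str() as used in the Examples section.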

Details

tallyRanges returns the tallies corresponding to the specified ranges; tallyRangesToFile performs the same task but writes the results to the tally file directly. tallyRangesBatch uses the BatchJobs package to set up cluster jobs for tallying, then collects the results of those jobs and writes them to the tally file. It is important to have a properly configured cluster (including a .BatchJobs.R as well as a template file). See the documentation of BatchJobs for that information.
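
The resume mechanism behind the written and wrfile arguments can be pictured with the following base-R sketch. It is an assumption-laden illustration of the bookkeeping idea only: the job IDs are made up, and the way progress is stored here (a vector saved with save()) is not necessarily how tallyRangesBatch lays out its file.

```r
# Base-R sketch of resume bookkeeping; the file layout is an assumption,
# not necessarily what tallyRangesBatch stores in `wrfile`.
wrfile  <- file.path(tempdir(), "written.jobs.RDa")
all_ids <- 1:10   # hypothetical job IDs
written <- c()    # nothing written yet

# Pretend jobs 1 to 4 were written before a crash, recording progress each time.
for (id in 1:4) {
  written <- c(written, id)
  save(written, file = wrfile)
}

# After a restart: reload the record and work out which jobs remain.
rm(written)
load(wrfile)                          # restores the object `written`
pending <- setdiff(all_ids, written)  # jobs whose results still need writing
pending
```

Recording progress after every job keeps the amount of repeated work after a crash to at most one job.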

Examples

suppressPackageStartupMessages(library("h5vc"))
suppressPackageStartupMessages(library("rhdf5"))
# Locate the example BAM files shipped with the h5vcData package
files <- list.files( system.file("extdata", package = "h5vcData"), "Pt.*bam$" )
bamFiles <- file.path( system.file("extdata", package = "h5vcData"), files)
suppressPackageStartupMessages(require(BSgenome.Hsapiens.NCBI.GRCh38))
suppressPackageStartupMessages(require(GenomicRanges))
# Read the DNMT3A annotation table and convert it to a GRanges object
dnmt3a <- read.table(system.file("extdata", "dnmt3a.txt", package = "h5vcData"), header=TRUE, stringsAsFactors = FALSE)
dnmt3a <- with( dnmt3a, GRanges(seqname, ranges = IRanges(start = start, end = end)))
# Merge overlapping intervals into non-overlapping ranges
dnmt3a <- reduce(dnmt3a)
# Register a multicore backend with BiocParallel
require(BiocParallel)
register(MulticoreParam())
# Tally the first three ranges against the reference and inspect the result
theData <- tallyRanges( bamFiles, ranges = dnmt3a[1:3], reference = Hsapiens )
str(theData)