preprocessCoverage(coverageInfo, groupInfo = NULL, cutoff = 5, colsubset = NULL, lowMemDir = NULL, ...)

Arguments

coverageInfo: A list containing a DataFrame --$coverage-- with the coverage data and a logical Rle --$position-- with the positions that passed the cutoff. This object is generated using loadCoverage.

groupInfo: A factor specifying the group of each sample. If NULL, no group mean coverages are calculated. If the factor has more than one level, the first one will be used to calculate the log2 fold change in calculatePvalues.

cutoff: The base-pair level coverage cutoff to use; its behavior is controlled by filter.

colsubset: Optional vector of column indices of coverageInfo$coverage that denote the samples you wish to include in the analysis.

lowMemDir: If specified, each chunk is saved into a separate Rdata file under lowMemDir and later loaded in fstats.apply when running calculateStats and calculatePvalues. Using this option helps reduce the memory load, as each fork in bplapply loads only the data needed for its chunk. The downside is a bit longer computation time due to input/output.

...: Advanced arguments, such as scalefac, chunksize and verbose in the example below.

Value

A list with the pre-processed coverage, ready for calculateStats and calculatePvalues. Note that if colsubset is not NULL, the number of columns in $coverageProcessed will be less than those in coverageInfo$coverage. The total number of rows depends on the number of base pairs that passed the cutoff, and the information stored is the coverage at that given base. Further note that filterData is re-applied if colsubset is not NULL and could thus lead to fewer rows compared to coverageInfo$coverage. The number of chunks in $mclapplyIndex is determined by chunksize, and no group mean coverages are included when groupInfo = NULL.

Details

If chunksize is NULL, then mc.cores is used to
determine the chunksize. This is useful if you want to split the data
so each core gets the same amount of data (up to rounding).

Computing the indexes and using those for mclapply reduces memory copying as described by Ryan Thompson and illustrated in approach #4 at http://lcolladotor.github.io/2013/11/14/Reducing-memory-overhead-when-using-mclapply
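As a concrete illustration of that index-based approach, here is a minimal, self-contained R sketch (not derfinder's internal code; the data, variable names and the toy per-chunk summary are assumptions) showing chunk indexes computed from mc.cores and passed to mclapply, so each fork subsets the shared data instead of receiving a pre-split copy:

library(parallel)

mc.cores <- 2L                                   # on Windows, use mc.cores = 1L
dat <- matrix(rnorm(1e4 * 4), ncol = 4)          # stand-in for the filtered coverage
n <- nrow(dat)
chunksize <- ceiling(n / mc.cores)               # each core gets the same amount (up to rounding)

## One integer vector of row positions per chunk; only these small vectors
## are shipped to the forks, not copies of 'dat'.
chunkIndex <- split(seq_len(n), ceiling(seq_len(n) / chunksize))

## Each fork subsets only the rows it needs (here a toy row-mean summary).
res <- mclapply(chunkIndex, function(idx) rowMeans(dat[idx, , drop = FALSE]),
    mc.cores = mc.cores)
length(res)  # one result per chunk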
If lowMemDir is specified then $coverageProcessed is NULL and
$mclapplyIndex is a vector with the chunk identifiers.
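The rough sketch below illustrates that on-disk pattern under stated assumptions; it is not the package's actual implementation, and the file names, chunking and toy summary are hypothetical. Each chunk is written to its own Rdata file, and a worker later loads only the file for its chunk identifier, so no fork ever holds the full data:

library(parallel)

lowMemDir <- file.path(tempdir(), 'chunksDir')   # hypothetical chunk directory
dir.create(lowMemDir, showWarnings = FALSE)

dat <- matrix(rnorm(6e3), ncol = 3)
chunkIndex <- split(seq_len(nrow(dat)), ceiling(seq_len(nrow(dat)) / 1e3))

## Write one Rdata file per chunk identifier.
for (i in seq_along(chunkIndex)) {
    chunkProcessed <- dat[chunkIndex[[i]], , drop = FALSE]
    save(chunkProcessed, file = file.path(lowMemDir, paste0('chunk', i, '.Rdata')))
}

## Each worker loads just its own chunk file (restoring 'chunkProcessed').
res <- mclapply(seq_along(chunkIndex), function(i) {
    load(file.path(lowMemDir, paste0('chunk', i, '.Rdata')))
    colSums(chunkProcessed)
}, mc.cores = 1L)                                # increase where forking is supported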
Examples

## Split the data and transform appropriately before using calculateStats()
## genomeData is a small example coverage dataset included with derfinder.
library('derfinder')
dataReady <- preprocessCoverage(genomeData, cutoff = 0, scalefac = 32,
    chunksize = 1e3, colsubset = NULL, verbose = TRUE)
names(dataReady)
dataReady
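Beyond the original example, the sketch below shows how the groupInfo and lowMemDir arguments documented above might be used. It is illustrative only: it assumes the genomeInfo example object shipped with derfinder (whose $pop column serves as the grouping factor) and a temporary chunk directory.

## Hypothetical variant of the example above (not from the original help page).
## 'genomeInfo$pop' is assumed to be the grouping factor from the package's
## example data; replace it with your own factor of sample groups.
dataReadyLowMem <- preprocessCoverage(genomeData, groupInfo = genomeInfo$pop,
    cutoff = 0, scalefac = 32, chunksize = 1e3,
    lowMemDir = file.path(tempdir(), 'chunksDir'), verbose = TRUE)
## As noted in Details, $coverageProcessed is NULL here and $mclapplyIndex
## holds the chunk identifiers.
names(dataReadyLowMem)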