Usage

preprocessCoverage(coverageInfo, groupInfo = NULL, cutoff = 5, colsubset = NULL, lowMemDir = NULL, ...)
Arguments

coverageInfo: A list containing a DataFrame --$coverage-- with the coverage data and a logical Rle --$position-- with the positions that passed the cutoff. This object is generated using loadCoverage.
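For illustration, such an object could be built with loadCoverage(); the BAM files and chromosome below are hypothetical placeholders.

## A minimal sketch, assuming derfinder is installed
library('derfinder')
files <- c(sample1 = 'sample1.bam', sample2 = 'sample2.bam')
coverageInfo <- loadCoverage(files = files, chr = 'chr21', cutoff = 0)
## coverageInfo$coverage holds the coverage data and coverageInfo$position
## marks the bases that passed the cutoff.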
groupInfo: A factor specifying the group membership of each sample. If NULL, no group mean coverages are calculated. If the factor has more than one level, the first one will be used to calculate the log2 fold change in calculatePvalues.

cutoff: The base-pair level cutoff to use. Its behavior is controlled by filter.
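As an illustration of groupInfo (the group labels below are hypothetical), the first factor level is the one used for the log2 fold change reported later by calculatePvalues().

## Hypothetical two-group design; 'CEU' is the first level.
groupInfo <- factor(c('CEU', 'CEU', 'YRI', 'YRI'), levels = c('CEU', 'YRI'))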
colsubset: Optional vector of column indices of coverageInfo$coverage that denote samples you wish to include in analysis.

lowMemDir: If specified, each chunk is saved into a separate Rdata file under lowMemDir
and later loaded in
fstats.apply when
running calculateStats and calculatePvalues. Using this option
helps reduce the memory load as each fork in bplapply
loads only the data needed for the chunk processing. The downside is a bit
longer computation time due to input/output.
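A sketch of the low-memory mode, using the same genomeData object as in the Examples below; the directory name is arbitrary.

## Write each chunk to its own Rdata file instead of keeping everything in memory.
lowDir <- file.path(tempdir(), 'chunksDir')
dataLowMem <- preprocessCoverage(genomeData, cutoff = 0, scalefac = 32,
    chunksize = 1e3, lowMemDir = lowDir)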
Value

A list whose $coverageProcessed component holds the processed coverage. Note that if colsubset is not NULL, the number of columns will be less than those in coverageInfo$coverage. The total number of rows depends on the number of base pairs that passed the cutoff, and the information stored is the coverage at that given base. Further note that filterData is re-applied if colsubset is not NULL and could thus lead to fewer rows compared to coverageInfo$coverage.

$mclapplyIndex contains the data partitioning information according to chunksize.
The group mean coverages are returned as a list, which has length 0 if groupInfo = NULL.
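The effect of colsubset on the returned object can be seen by comparing runs with and without it; a sketch using the same genomeData call as in the Examples (colsubset = 1:2 is an arbitrary choice).

prepAll <- preprocessCoverage(genomeData, cutoff = 0, scalefac = 32,
    chunksize = 1e3)
prepSub <- preprocessCoverage(genomeData, cutoff = 0, scalefac = 32,
    chunksize = 1e3, colsubset = 1:2)
ncol(prepAll$coverageProcessed)   # all samples
ncol(prepSub$coverageProcessed)   # only the two selected samples
nrow(prepSub$coverageProcessed)   # can be smaller once filterData is re-applied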
Details

If chunksize is NULL, then mc.cores is used to determine the chunksize. This is useful if you want to split the data so each core gets the same amount of data (up to rounding); see the sketch below.

Computing the indexes and using those for mclapply reduces memory copying, as described by Ryan Thompson and illustrated in approach #4 at http://lcolladotor.github.io/2013/11/14/Reducing-memory-overhead-when-using-mclapply
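The chunk-size heuristic can be pictured with plain arithmetic; the values below are illustrative only and this is not the package's internal code.

## Split the filtered bases evenly across the cores, up to rounding.
nBases <- 1234567   # number of bases that passed the cutoff (illustrative)
mc.cores <- 4       # number of cores (illustrative)
chunksize <- ceiling(nBases / mc.cores)
chunksize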
If lowMemDir is specified, then $coverageProcessed is NULL and $mclapplyIndex is a vector with the chunk identifiers.
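Continuing the low-memory sketch from the lowMemDir argument above:

## The processed coverage is written to disk, not kept in the return value.
is.null(dataLowMem$coverageProcessed)   # expected TRUE
dataLowMem$mclapplyIndex                # chunk identifiers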
Examples

## Split the data and transform appropriately before using calculateStats()
dataReady <- preprocessCoverage(genomeData, cutoff = 0, scalefac = 32,
chunksize = 1e3, colsubset = NULL, verbose = TRUE)
names(dataReady)
dataReady
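As a follow-up to the comment above, a sketch of feeding the result to calculateStats(). The model-building calls (collapseFullCoverage, sampleDepths, makeModels) and the genomeInfo sample information follow the derfinder workflow but are assumptions here; the exact arguments may differ by package version.

## Build simple models and compute the F-statistics (sketch).
collapsedFull <- collapseFullCoverage(list(genomeData$coverage), verbose = TRUE)
depths <- sampleDepths(collapsedFull, probs = c(0.5), verbose = TRUE)
models <- makeModels(depths, testvars = genomeInfo$pop)
fstats <- calculateStats(dataReady, models, verbose = TRUE)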