Workhorse function which takes a scRNA-Seq gene expression count matrix, such as one embedded in a Seurat object, calculates log2(counts + 1) and averages gene expression within each cell subclass or cell type given by a grouping vector. Very large matrices are handled by slicing rows into blocks to limit memory use.
scmean(
x,
celltype,
FUN = "logmean",
postFUN = NULL,
verbose = TRUE,
sliceMem = 16,
cores = 1L,
load_balance = FALSE,
use_future = FALSE
)
Returns a matrix of mean log2 gene expression across cell types, with genes in rows and cell types in columns.
x: matrix, sparse matrix or DelayedMatrix of raw counts with genes in rows and cells in columns.
celltype: a vector of cell subclasses or types whose length matches the
number of columns in x. It is coerced to a factor. NA values are tolerated
and the matching columns in x are skipped.
FUN: Character value or function used to calculate the mean. When applied
to a matrix of count values, it must return a vector. Recommended options
are "logmean" (the default) or "trimmean".
postFUN: Optional function to be applied to the whole matrix after the mean
has been calculated, e.g. log2s.
verbose: Logical, whether to print messages.
sliceMem: Maximum amount of memory in GB allowed for each subsetted count
matrix object. When x is subsetted by each cell subclass, if the subset
would need more memory than sliceMem then slicing is activated: the
subsetted count matrix is divided into chunks which are processed
separately. This is indicated by the addition of '...' in the timings. The
limit is just under 17.2 GB (2^34 / 1e9). At this level the subsetted
matrix breaches the long vector limit (>2^31 elements).
cores: Integer, number of cores to use for parallelisation via
mclapply(). Parallelisation is not available on Windows. Warning:
parallelisation increases the memory requirement by multiples of
sliceMem. cores is ignored if use_future = TRUE.
load_balance: Logical, whether to load balance memory requirements across cores (experimental).
use_future: Logical, whether to use the future backend for
parallelisation via future_lapply() instead of the default, mclapply().
Note that the future.apply package needs to be installed to enable this.
Myles Lewis
Mean functions which can be applied by setting FUN include logmean (the
default), which applies row means to log2(counts + 1), and trimmean, which
calculates the trimmed mean of the counts after the top and bottom 5% of
values have been excluded. Alternatively, FUN = rowMeans calculates the
arithmetic mean of counts.
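The three options can be illustrated in base R without the package. This is a minimal sketch rather than the package's own code: logmean_sketch and trimmean_sketch are hypothetical stand-ins, and the trimmed mean is assumed to behave like mean(x, trim = 0.05) applied to each gene.

```r
# Toy counts: 2 genes x 20 cells. geneA has one zero and one extreme outlier.
counts <- rbind(
  geneA = c(0, rep(2, 18), 1000),
  geneB = rep(c(1, 3), 10)
)

# Hypothetical stand-ins for the package functions (assumptions, see above)
logmean_sketch  <- function(m) rowMeans(log2(m + 1))
trimmean_sketch <- function(m, trim = 0.05) apply(m, 1, mean, trim = trim)

arith   <- rowMeans(counts)         # arithmetic mean of counts
trimmed <- trimmean_sketch(counts)  # drops the lowest and highest 5% per gene
logged  <- logmean_sketch(counts)   # mean of log2(counts + 1)

print(rbind(arith, trimmed, logged))
```

With 20 cells, trimming 5% from each end removes exactly one value per side, so the single outlier in geneA no longer dominates the trimmed mean.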
If FUN = trimmean or rowMeans, postFUN needs to be set to log2s, a simple
function which applies log2(x + 1), so that the result is on the log2 scale.
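To show what the extra step does, the sketch below compares taking log2 of a mean with taking the mean of log2 values, using base R only. log2s_sketch is a hypothetical stand-in for log2s written from the description above, and the commented call at the end assumes x and celltype already exist.

```r
# Hypothetical stand-in for log2s, based on the description above
log2s_sketch <- function(x) log2(x + 1)

counts <- rbind(geneA = c(0, 1, 3, 7),
                geneB = c(2, 2, 2, 2))

mean_then_log <- log2s_sketch(rowMeans(counts))  # FUN = rowMeans, then postFUN
log_then_mean <- rowMeans(log2(counts + 1))      # what logmean is described as doing

print(cbind(mean_then_log, log_then_mean))

# With the package loaded, the equivalent call would be along the lines of:
# scmean(x, celltype, FUN = "trimmean", postFUN = log2s)
```

The two columns agree when counts are constant across cells (geneB) and diverge when they vary (geneA), so outputs from different FUN settings are not directly interchangeable.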
sliceMem can be set lower on machines with less RAM, but this will slow the
analysis down. Setting cores above 1 raises the theoretical amount of memory
required to around cores * sliceMem in GB. For example, on a 64 GB machine
we find a significant speed increase with cores = 3L. Above this level,
there is a risk that memory swap will slow down processing.
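The memory figures above can be checked with a little arithmetic. The sketch below is an illustration under stated assumptions, not the package's internal logic: it assumes 8 bytes per matrix element, and dense_gb and row_blocks are hypothetical helpers showing one way rows could be divided into blocks of roughly sliceMem or less.

```r
slice_mem <- 16   # GB, the default sliceMem
cores     <- 3L

# Long vector limit: 2^31 elements at 8 bytes each is 2^34 bytes
limit_gb <- 2^31 * 8 / 1e9

# Theoretical peak when each core holds one slice
peak_gb <- cores * slice_mem

# Hypothetical helper: size in GB of a dense numeric matrix
dense_gb <- function(n_genes, n_cells) n_genes * n_cells * 8 / 1e9

# Hypothetical helper: split gene rows into near-equal blocks,
# using just enough blocks to bring each one down to about cap_gb
row_blocks <- function(n_genes, n_cells, cap_gb) {
  n_blocks <- max(1, ceiling(dense_gb(n_genes, n_cells) / cap_gb))
  idx <- seq_len(n_genes)
  split(idx, ceiling(idx * n_blocks / n_genes))
}

blocks <- row_blocks(n_genes = 30000, n_cells = 200000, cap_gb = slice_mem)
cat("limit:", round(limit_gb, 2), "GB; peak:", peak_gb, "GB;",
    length(blocks), "blocks of", lengths(blocks)[1], "rows\n")
```

In this toy case a 30000 x 200000 subset would need 48 GB as a dense matrix, so it is split into 3 blocks of 10000 genes. With cores = 3L and sliceMem = 16 the theoretical peak is also 48 GB, which leaves some headroom on a 64 GB machine.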
scapply() is a more general version which can apply any function to the
matrix. logmean and trimmean are options for controlling the type of mean
applied.
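As a rough picture of the core calculation, the base R sketch below averages log2(counts + 1) within each cell type and skips cells whose type is NA. scmean_sketch is a hypothetical illustration built from the description on this page; it leaves out slicing, parallelisation, sparse matrix handling and postFUN.

```r
set.seed(1)
# Toy raw counts: 5 genes x 8 cells
counts <- matrix(rpois(5 * 8, lambda = 3), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))
celltype <- c("B", "B", "T", "T", "T", NA, "NK", "NK")

# Hypothetical sketch of the per-cell-type log mean (not the package code)
scmean_sketch <- function(x, celltype) {
  celltype <- factor(celltype)        # coerced to a factor; NA is not a level
  out <- vapply(levels(celltype), function(lev) {
    keep <- which(celltype == lev)    # which() drops NA, so those cells are skipped
    rowMeans(log2(x[, keep, drop = FALSE] + 1))
  }, numeric(nrow(x)))
  rownames(out) <- rownames(x)
  out
}

res <- scmean_sketch(counts, celltype)
print(dim(res))       # genes in rows, cell types in columns
print(colnames(res))
```

The result has one column per factor level, so cell6, which has no cell type, contributes to none of the means.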