cellGeometry (version 0.5.7)

scmean: Single-cell mean log gene expression across cell types

Description

Workhorse function that takes an scRNA-Seq gene expression matrix, such as the count matrix embedded in a Seurat object, calculates log2(counts + 1) and averages gene expression within each group defined by a vector of cell subclasses or cell types. Very large matrices are handled by slicing rows into blocks to avoid excess memory requirements.

Usage

scmean(
  x,
  celltype,
  FUN = "logmean",
  postFUN = NULL,
  verbose = TRUE,
  sliceMem = 16,
  cores = 1L,
  load_balance = FALSE,
  use_future = FALSE
)

Value

A matrix of mean log2 gene expression across cell types, with genes in rows and cell types in columns.
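As a rough illustration of the shape and content of the return value, the following base R sketch computes mean log2(counts + 1) per cell type on a toy matrix. The helper `ref_logmean` and the toy data are hypothetical and are not part of cellGeometry; the sketch ignores slicing and parallelisation, and assumes that dropping cells with an NA cell type is equivalent to how scmean skips them.

```r
# Toy counts: 4 genes x 6 cells (values are made up for illustration).
counts <- matrix(
  c(0, 1, 3, 7,
    1, 1, 0, 15,
    3, 0, 1, 31,
    0, 3, 7, 0,
    1, 7, 3, 1,
    7, 1, 0, 3),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)
celltype <- c("B", "B", "T", "T", NA, "T")

# Hypothetical reference helper, not part of cellGeometry.
ref_logmean <- function(x, celltype) {
  celltype <- factor(celltype)          # NA is not a level, so NA cells drop out
  vapply(levels(celltype), function(lv) {
    keep <- which(celltype == lv)       # which() discards NA comparisons
    rowMeans(log2(x[, keep, drop = FALSE] + 1))
  }, numeric(nrow(x)))
}

res <- ref_logmean(counts, celltype)
print(res)                              # genes in rows, cell types in columns

# Optional: run the package function on the same input if it is installed.
if (requireNamespace("cellGeometry", quietly = TRUE)) {
  res_pkg <- cellGeometry::scmean(counts, celltype, verbose = FALSE)
}
```

The fifth cell has no cell type, so it contributes to neither column.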

Arguments

x

matrix, sparse matrix or DelayedMatrix of raw counts with genes in rows and cells in columns.

celltype

A vector of cell subclasses or types whose length matches the number of columns in x. It is coerced to a factor. NA values are tolerated and the matching columns in x are skipped.

FUN

Character value or function used to compute the mean. When applied to a matrix of count values, it must return a vector. Recommended options are "logmean" (the default) or "trimmean".

postFUN

Optional function to be applied to the whole matrix after the means have been calculated, e.g. log2s.

verbose

Logical, whether to print messages.

sliceMem

Maximum amount of memory in GB to allow for each subsetted count matrix object. When x is subsetted by cell subclass and the subset would need more memory than sliceMem, slicing is activated: the subsetted count matrix is divided into chunks which are processed separately. This is indicated by the addition of '...' in the timings. The upper limit for sliceMem is just under 17.2 GB (2^34 / 1e9). At this level the subsetted matrix breaches the long vector limit (>2^31 elements).
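The figures quoted for sliceMem can be checked with a little arithmetic. The sketch below assumes a dense subset stored as 8-byte doubles, which is consistent with 2^31 elements corresponding to 2^34 bytes. The helper `subset_gb` is hypothetical, and the chunk count is only a lower bound estimate; how scmean actually sizes its chunks is not described here.

```r
# Hypothetical helper for back-of-envelope planning; not part of cellGeometry.
# Assumes a dense subset stored as 8-byte doubles.
subset_gb <- function(n_genes, n_cells, bytes_per_value = 8) {
  n_genes * n_cells * bytes_per_value / 1e9
}

# 2^31 elements at 8 bytes each is 2^34 bytes, the limit quoted in the text.
limit_gb <- 2^31 * 8 / 1e9

# Example: one cell subclass with 100,000 cells and 30,000 genes.
gb <- subset_gb(3e4, 1e5)

# Minimum number of chunks needed to stay under the default sliceMem = 16.
chunks <- ceiling(gb / 16)

cat(sprintf("limit: %.2f GB, subset: %.0f GB, chunks: %d\n",
            limit_gb, gb, as.integer(chunks)))
```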

cores

Integer, number of cores to use for parallelisation using mclapply(). Parallelisation is not available on Windows. Warning: parallelisation increases the memory requirement by multiples of sliceMem. cores is ignored if use_future = TRUE.

load_balance

Logical, whether to load balance memory requirements across cores (experimental).

use_future

Logical, whether to use the future backend for parallelisation via future_lapply() instead of the default which is mclapply(). Note, the future.apply package needs to be installed to enable this.

Author

Myles Lewis

Details

Mean functions which can be applied by setting FUN include logmean (the default), which applies row means to log2(counts+1), and trimmean, which calculates the trimmed mean of the counts after the top and bottom 5% of values have been excluded. Alternatively, FUN = rowMeans calculates the arithmetic mean of counts.
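To make the difference between the three kinds of mean concrete, here is a minimal base R sketch on made-up counts for a single cell type. It does not call the package's own logmean or trimmean; the trimmed mean uses base R's `mean(trim = 0.05)`, which drops 5% of values at each end, and it is an assumption that this matches the package's trimmean exactly.

```r
# Counts for 2 genes across 20 cells of one cell type (made-up values).
x <- rbind(
  geneA = c(rep(0, 10), rep(3, 9), 1023),  # one extreme outlier cell
  geneB = rep(3, 20)
)

log_mean   <- rowMeans(log2(x + 1))           # mean of log2(counts + 1)
trim_mean  <- apply(x, 1, mean, trim = 0.05)  # drops 5% of values at each end
arith_mean <- rowMeans(x)                     # plain arithmetic mean of counts

print(cbind(log_mean, trim_mean, arith_mean))
```

For geneA the single outlier cell dominates the arithmetic mean, is removed entirely by trimming, and is damped by the log transform.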

If FUN = trimmean or rowMeans, the means are on the count scale, so postFUN needs to be set to log2s, a simple function which applies log2(x+1).
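A short sketch of the post-processing step follows. `log2s_sketch` is a local stand-in written from the description of log2s above, named differently to make clear that it is not the package's own function, and the matrix of means is made up. The guarded call at the end shows how such a function might be passed to scmean, assuming the package is installed.

```r
# Local stand-in matching the description of log2s; not the package function.
log2s_sketch <- function(x) log2(x + 1)

# Suppose FUN returned these count-scale means (genes x cell types, made up).
m <- matrix(c(0, 3, 7, 15), nrow = 2,
            dimnames = list(c("geneA", "geneB"), c("B", "T")))

m_log <- log2s_sketch(m)   # whole matrix moved to the log2(x + 1) scale
print(m_log)

# Optional: trimmed mean followed by the log2 step, if cellGeometry is installed.
if (requireNamespace("cellGeometry", quietly = TRUE)) {
  set.seed(1)
  counts <- matrix(rpois(40, 5), nrow = 4,
                   dimnames = list(paste0("gene", 1:4), paste0("cell", 1:10)))
  celltype <- rep(c("B", "T"), each = 5)
  res <- cellGeometry::scmean(counts, celltype, FUN = "trimmean",
                              postFUN = log2s_sketch, verbose = FALSE)
}
```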

sliceMem can be set lower on machines with less RAM, but this will slow the analysis down. Increasing cores raises the theoretical amount of memory required to around cores * sliceMem GB. For example, on a 64 GB machine we find a significant speed increase with cores = 3L. Above this level, there is a risk that memory swapping will slow down processing.
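The cores * sliceMem rule of thumb can be turned into a quick planning calculation. The helper `max_cores` below is hypothetical and not part of cellGeometry, and the 8 GB of headroom reserved for the rest of the session is an arbitrary assumption, chosen here so that a 64 GB machine with the default sliceMem = 16 gives the cores = 3L mentioned above.

```r
# Hypothetical planning helper, not part of cellGeometry.
# headroom_gb is an assumed allowance for the original matrix, R and the OS.
max_cores <- function(ram_gb, slice_mem_gb = 16, headroom_gb = 8) {
  max(1L, as.integer(floor((ram_gb - headroom_gb) / slice_mem_gb)))
}

n_cores <- max_cores(64)   # 64 GB machine, default sliceMem
peak_gb <- n_cores * 16    # theoretical requirement: cores * sliceMem

cat("cores:", n_cores, " theoretical peak:", peak_gb, "GB\n")
```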

See Also

scapply(), a more general version which can apply any function to the matrix. logmean and trimmean are options for controlling the type of mean applied.