cellGeometry (version 0.5.7)

cellMarkers: Identify cell markers

Description

Uses a geometric method based on the vector dot product to identify the genes which are the best markers for individual cell types.
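The idea of a specificity angle can be illustrated in base R. This is a minimal sketch, not the package's implementation: it assumes each gene is treated as a vector of mean expression across cell types and that the angle is measured between that vector and each cell type axis. The toy matrix and all object names are hypothetical.

```r
# Toy matrix of mean expression: genes in rows, cell types in columns
# (values and names are made up for illustration)
genemeans <- matrix(
  c(8, 0.5, 0.2,   # geneA: almost exclusive to Tcell
    3, 3,   3,     # geneB: equally expressed in every cell type
    0, 6,   1),    # geneC: mostly Bcell
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("Tcell", "Bcell", "Mono"))
)

# The dot product of a gene vector g with the unit vector along axis i is
# simply g[i], so cos(angle) = g[i] / ||g||.
vec_length <- sqrt(rowSums(genemeans^2))
cosine <- pmin(genemeans / vec_length, 1)  # guard against rounding above 1
angle <- acos(cosine)                      # radians
angle_deg <- angle * 180 / pi              # degrees

round(angle_deg, 1)

# For each gene, the cell type with the smallest angle
best_type <- colnames(angle)[apply(angle, 1, which.min)]
names(best_type) <- rownames(angle)
best_type
```

A small angle means the gene's expression vector lies close to one cell type axis, so the gene is a candidate marker for that cell type. A gene expressed equally everywhere, such as geneB here, sits at the same angle to every axis.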

Usage

cellMarkers(
  scdata,
  bulkdata = NULL,
  subclass,
  cellgroup = NULL,
  nsubclass = 25,
  ngroup = 10,
  expfilter = 0.5,
  noisefilter = 2,
  noisefraction = 0.25,
  min_cells = 10,
  remove_subclass = NULL,
  dual_mean = FALSE,
  meanFUN = "logmean",
  postFUN = NULL,
  verbose = TRUE,
  sliceMem = 16,
  cores = 1L,
  ...
)

Value

A list object with S3 class 'cellMarkers' containing:

call

the matched call

best_angle

named list containing a matrix for each cell type with genes in rows. Rows are ranked by lowest specificity angle for that cell type and highest maximum expression. Columns are: angle, the specificity angle in radians; angle.deg, the same angle in degrees; max, the maximum mean expression across all cell types; rank, the rank of the mean gene expression for that cell type compared to the other cell types

group_angle

named list of matrices similar to best_angle, for each cell subclass

geneset

character vector of selected gene markers for cell types

group_geneset

character vector of selected gene markers for cell subclasses

genemeans

matrix of mean log2(counts + 1) gene expression with genes in rows and cell types in columns

genemeans_filtered

matrix of gene expression for cell types following noise reduction

groupmeans

matrix of mean log2(counts + 1) gene expression with genes in rows and cell subclasses in columns

groupmeans_filtered

matrix of gene expression for cell subclasses following noise reduction

cell_table

factor encoded vector containing the groupings of the cell types within cell subclasses, determined by which subclass contains the maximum number of cells for each cell type

spillover

matrix of spillover values between cell types

subclass_table

contingency table of the number of cells in each subclass

opt

list storing options, namely arguments nsubclass, ngroup, expfilter, noisefilter, noisefraction

genemeans_ar

if dual_mean is TRUE, optional matrix of arithmetic mean, i.e. log2(mean(counts)+1)

genemeans_filtered_ar

optional matrix of arithmetic mean following noise reduction

The 'cellMarkers' object is designed to be passed to deconvolute() to deconvolute bulk RNA-Seq data. It can be updated rapidly with different settings using updateMarkers(). Ensembl gene IDs can be replaced with recognisable gene symbols by applying gene2symbol().
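The difference between the default mean stored in genemeans and the arithmetic mean stored in genemeans_ar when dual_mean is TRUE can be seen in a short base R sketch. It uses only the two formulas quoted on this page, mean(log2(counts + 1)) and log2(mean(counts) + 1); the simulated counts and object names are hypothetical, and the package's own calculation via scmean() may differ in detail.

```r
set.seed(1)

# Simulated counts for two genes across ten cells of one cell type
counts <- matrix(
  rpois(20, lambda = 5), nrow = 2,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:10))
)
counts["geneB", 1] <- 500   # a single outlier cell

# Mean of the log-transformed counts
log_mean <- rowMeans(log2(counts + 1))

# Log of the arithmetic mean of the counts
arith_mean <- log2(rowMeans(counts) + 1)

cbind(log_mean, arith_mean)
```

Because the logarithm is concave, the arithmetic version is never smaller than the mean of logs, and the gap widens when a few cells have very high counts, as with geneB in this toy example.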

Arguments

scdata

Single-cell data matrix with genes in rows and cells in columns. Can be a sparse matrix or DelayedMatrix. Must have rownames representing gene IDs or gene symbols.

bulkdata

Optional data matrix containing bulk RNA-Seq data with genes in rows and samples in columns. This matrix is only used for its rownames (gene IDs), to ensure that cell markers are selected from genes in the bulk dataset.

subclass

Vector of cell subclasses matching the columns in scdata

cellgroup

Optional grouping vector of major cell types matching the columns in scdata. subclass is assumed to contain subclasses which are subsets within cellgroup overarching classes.

nsubclass

Number of genes to select for each single cell subclass. Either a single number or a vector with the number of genes for each subclass.

ngroup

Number of genes to select for each cell group. Either a single number or a vector with the number of genes for each group.

expfilter

Genes whose maximum mean expression per cell type on the log2 scale is below this value are removed and not considered for the signature.

noisefilter

Sets an upper bound for noisefraction cut-off below which gene expression is set to 0. Essentially gene expression above this level must be retained in the signature. Setting this higher can allow more suppression via noisefraction and can favour more highly expressed genes.

noisefraction

Numeric value. The maximum mean log2 gene expression across cell types is calculated for each gene, and values in cell types below this fraction of the maximum are set to 0. Set in conjunction with noisefilter. Note: if this is set too high (too close to 1), it can have a deleterious effect on deconvolution.

min_cells

Numeric value specifying minimum number of cells in a subclass category. Subclass categories with fewer cells will be ignored.

remove_subclass

Character vector of subclass levels to be removed from the analysis.

dual_mean

Logical whether to calculate arithmetic mean of counts as well as mean(log2(counts +1)). This is mainly useful for simulation.

meanFUN

Either a character value or function for applying mean which is passed to scmean(). Options include "logmean" (the default) or "trimmean", which is a trimmed mean after excluding the top/bottom 5% of values.

postFUN

Optional function applied to genemeans matrices after mean has been calculated. If meanFUN is set to "trimmean", then postFUN is set to log2s. See scmean().

verbose

Logical whether to show messages.

sliceMem

Max amount of memory in GB to allow for each subsetted count matrix object. When scdata is subsetted by each cell subclass, if the amount of memory would be above sliceMem then slicing is activated and the subsetted count matrix is divided into chunks and processed separately. This is indicated by addition of '...' in the printed timings. The limit is just under 17.2 GB (2^34 / 1e9). Above this the subsetted matrix breaches the long vector limit (>2^31 elements).

cores

Integer, number of cores to use for parallelisation using mclapply(). Parallelisation is not available on Windows. Warning: parallelisation has increased memory requirements. See scmean().

...

Additional arguments passed to scmean() such as use_future.
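The interaction of expfilter, noisefraction and noisefilter described above can be sketched in base R. This is one reading of the argument descriptions, not the package's source code: it assumes the per-gene cut-off is noisefraction times the gene's maximum mean expression, capped at noisefilter. The toy matrix and object names are hypothetical.

```r
# Toy matrix of mean log2 expression: genes in rows, cell types in columns
genemeans <- matrix(
  c(10,  1.5, 0.3,   # geneA: strongly expressed in Tcell
    0.4, 0.2, 0.1,   # geneB: low everywhere
    4,   0.8, 3),    # geneC: expressed in Tcell and Mono
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("Tcell", "Bcell", "Mono"))
)

expfilter <- 0.5
noisefilter <- 2
noisefraction <- 0.25

# Step 1: drop genes whose maximum mean expression is below expfilter
gene_max <- apply(genemeans, 1, max)
kept <- genemeans[gene_max >= expfilter, , drop = FALSE]

# Step 2: per-gene cut-off is a fraction of the gene's maximum,
# but never higher than noisefilter (assumed interpretation)
cutoff <- pmin(gene_max[rownames(kept)] * noisefraction, noisefilter)

# Step 3: values below the cut-off are set to 0
# (the cut-off vector is recycled down columns, so row i uses cutoff[i])
filtered <- kept
filtered[kept < cutoff] <- 0

cutoff
filtered
```

In this example geneA has a raw cut-off of 2.5, which noisefilter caps at 2, so expression at or above 2 is always retained. Raising noisefilter would let the fraction-based cut-off suppress more low-level expression in highly expressed genes.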

Author

Myles Lewis

Details

If verbose = TRUE, the function will display an estimate of the required memory. This estimate is only a guide, provided to help users choose the optimal number of cores during parallelisation. Real memory usage may be higher, theoretically up to double the estimate, due to R's use of copy-on-modify.
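The sliceMem limit quoted under Arguments can be checked with simple arithmetic. The sketch below assumes 8 bytes per matrix element, which is what the ratio 2^34 / 2^31 implies, and assumes the subsetted matrix is held in dense form. The helper functions dense_gb() and slice_needed() are hypothetical and are not part of the package.

```r
# Long vector threshold quoted above, and assumed element size
max_elements <- 2^31
bytes_per_element <- 8

# Size in GB of a matrix at the threshold: 2^34 bytes / 1e9
limit_gb <- max_elements * bytes_per_element / 1e9
limit_gb

# Hypothetical helper: dense size in GB of a genes x cells matrix
dense_gb <- function(n_genes, n_cells) {
  n_genes * n_cells * bytes_per_element / 1e9
}

# Hypothetical helper: would a subset of this size exceed sliceMem?
slice_needed <- function(n_genes, n_cells, sliceMem = 16) {
  dense_gb(n_genes, n_cells) > sliceMem
}

dense_gb(30000, 100000)       # 30,000 genes x 100,000 cells
slice_needed(30000, 100000)   # above the default sliceMem of 16
slice_needed(20000, 50000)    # below it
```

Under these assumptions the default sliceMem of 16 leaves a margin below the 17.2 GB ceiling, so slicing begins before a subsetted matrix would reach the long vector limit.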

See Also

deconvolute(), updateMarkers(), gene2symbol()