cellassign: Annotate cells to cell types using cellassign

Description

Automatically annotate cells to known types based on the expression patterns of a priori known marker genes.

Usage

cellassign(exprs_obj, marker_gene_info, s = NULL, min_delta = 2,
  X = NULL, B = 10, shrinkage = TRUE, n_batches = 1,
  dirichlet_concentration = 0.01, rel_tol_adam = 1e-04,
  rel_tol_em = 1e-04, max_iter_adam = 1e+05, max_iter_em = 20,
  learning_rate = 0.1, verbose = TRUE, sce_assay = "counts",
  return_SCE = FALSE, num_runs = 1)

Arguments

exprs_obj

Either a matrix representing gene expression counts or a SummarizedExperiment. See details.

marker_gene_info

Information relating marker genes to cell types. See details.

s

Numeric vector of cell size factors

min_delta

The minimum log fold change a marker gene must be over-expressed by in its cell type

X

Numeric matrix of external covariates. See details.

B

Number of bases to use for RBF dispersion function

shrinkage

Logical - should the delta parameters have hierarchical shrinkage?

n_batches

Number of data subsample batches to use in inference

dirichlet_concentration

Dirichlet concentration parameter for cell type abundances

rel_tol_adam

The change in Q function value (in pct) below which each optimization round is considered converged

rel_tol_em

The change in log marginal likelihood value (in pct) below which the EM algorithm is considered converged

max_iter_adam

Maximum number of ADAM iterations to perform in each M-step

max_iter_em

Maximum number of EM iterations to perform

learning_rate

Learning rate of ADAM optimization

verbose

Logical - should running info be printed?

sce_assay

The assay from the input SingleCellExperiment to use: this assay should always represent raw counts.

return_SCE

Logical - should a SingleCellExperiment be returned with the cell type annotations added? See details.

num_runs

Number of EM runs to perform

Value

An object of class cellassign_fit. See details.

Details

Input format

exprs_obj should be either a SummarizedExperiment (we recommend the SingleCellExperiment package) or a cell (row) by gene (column) matrix of raw RNA-seq counts. Do not log-transform or otherwise normalize the counts.

marker_gene_info should be either of the following:

A gene by cell type binary matrix, where 1 indicates that a gene is a marker for a cell type and 0 indicates that it is not.
A list with names corresponding to cell types, where each entry is a vector of marker gene names. Such a list is converted to the above matrix using the marker_list_to_mat function.
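To make the two formats concrete, the following base R sketch builds a binary gene by cell type matrix from a named list. The cell type and gene names are hypothetical, and this is only an illustration of the layout described above, not the package's marker_list_to_mat implementation, which may differ in details.

```r
# Hypothetical marker list: names are cell types, entries are marker genes.
marker_list <- list(
  Bcell = c("CD79A", "MS4A1"),
  Tcell = c("CD3D", "CD3E", "CD2")
)

# Rows are the union of all marker genes; columns are cell types.
genes <- sort(unique(unlist(marker_list)))

# For each cell type, flag which genes are among its markers (1) or not (0).
rho <- sapply(marker_list, function(markers) as.integer(genes %in% markers))
rownames(rho) <- genes

print(rho)
```

Either the list or the resulting matrix has the shape that marker_gene_info expects.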

Cell size factors

If the cell size factors s are not provided, they are computed using the computeSumFactors function from the scran package.
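If you supply s yourself, one simple option is the total count per cell, which is what the example on this page passes. The sketch below computes that for a toy cell by gene matrix in base R. The data are made up, and rescaling the factors to mean 1 is a common convention assumed here, not something this page requires. This is a library-size approach, not the scran method used by default.

```r
set.seed(1)

# Toy cell-by-gene count matrix (4 cells, 6 genes); hypothetical data.
counts <- matrix(
  rpois(24, lambda = 5),
  nrow = 4,
  dimnames = list(paste0("cell", 1:4), paste0("gene", 1:6))
)

# Cells are rows for matrix input, so total counts per cell are row sums.
# (For a SummarizedExperiment assay, cells are columns: use colSums instead.)
lib_size <- rowSums(counts)

# Assumed convention: rescale so the size factors average to 1.
s <- lib_size / mean(lib_size)
```

The resulting vector has one entry per cell, in the same order as the rows of the count matrix.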

Covariates

If X is not NULL, it should be an N by P matrix of covariates for N cells and P covariates. Such a matrix would typically be returned by a call to model.matrix with no intercept. It is also highly recommended that any numerical covariates (i.e. those that are neither factors nor one-hot-encoded) be standardized to have mean 0 and standard deviation 1.
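As a minimal sketch of the recommendation above, the following base R code standardizes a numeric covariate and builds a design matrix without an intercept. The data frame cell_info and its columns batch and n_genes are hypothetical names chosen for illustration.

```r
# Hypothetical per-cell metadata: one factor and one numeric covariate.
cell_info <- data.frame(
  batch = factor(c("a", "a", "b", "b", "c", "c")),
  n_genes = c(1200, 1500, 900, 2000, 1700, 1100)
)

# Standardize the numeric covariate to mean 0 and standard deviation 1.
cell_info$n_genes <- as.numeric(scale(cell_info$n_genes))

# "~ 0 + ..." removes the intercept column from the design matrix.
X <- model.matrix(~ 0 + batch + n_genes, data = cell_info)

print(X)
```

The rows of X must be in the same order as the cells in exprs_obj.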

cellassign_fit

A call to cellassign returns an object of class cellassign_fit. To access the maximum likelihood estimates (MLEs) of cell types, use fit$cell_type. To access all MLE parameter estimates, use fit$mle_params.

Returning a SingleCellExperiment

If return_SCE is TRUE, a call to cellassign will return the input SingleCellExperiment, with the following added:

A column cellassign_celltype to colData(sce) with the MAP estimate of the cell type
A slot sce@metadata$cellassign_fit containing the cellassign fit

Note that a SingleCellExperiment must be provided as exprs_obj for this option to be valid.

Examples

# NOT RUN {
data(example_sce)
data(example_rho)

# Fit the model, using total counts per cell as size factors
fit <- cellassign(example_sce,
                  marker_gene_info = example_rho,
                  s = colSums(SummarizedExperiment::assay(example_sce, "counts")),
                  learning_rate = 1e-2,
                  shrinkage = TRUE,
                  verbose = FALSE)
# }