Automatically annotate cells to known types based on the expression patterns of a priori known marker genes.
cellassign(exprs_obj, marker_gene_info, s = NULL, min_delta = 2,
X = NULL, B = 10, shrinkage = TRUE, n_batches = 1,
dirichlet_concentration = 0.01, rel_tol_adam = 1e-04,
rel_tol_em = 1e-04, max_iter_adam = 1e+05, max_iter_em = 20,
learning_rate = 0.1, verbose = TRUE, sce_assay = "counts",
return_SCE = FALSE, num_runs = 1)
exprs_obj: Either a matrix representing gene expression counts or a SummarizedExperiment. See details.
marker_gene_info: Information relating marker genes to cell types. See details.
s: Numeric vector of cell size factors.
min_delta: The minimum log fold change by which a marker gene must be over-expressed in its cell type.
X: Numeric matrix of external covariates. See details.
B: Number of bases to use for the RBF dispersion function.
shrinkage: Logical - should the delta parameters have hierarchical shrinkage?
n_batches: Number of data subsample batches to use in inference.
dirichlet_concentration: Dirichlet concentration parameter for cell type abundances.
rel_tol_adam: The change in Q function value (in pct) below which each optimization round is considered converged.
rel_tol_em: The change in log marginal likelihood value (in pct) below which the EM algorithm is considered converged.
max_iter_adam: Maximum number of ADAM iterations to perform in each M-step.
max_iter_em: Maximum number of EM iterations to perform.
learning_rate: Learning rate of ADAM optimization.
verbose: Logical - should running info be printed?
sce_assay: The assay from the input SingleCellExperiment to use: this assay should always represent raw counts.
return_SCE: Logical - should a SingleCellExperiment be returned with the cell type annotations added? See details.
num_runs: Number of EM runs to perform.
An object of class cellassign_fit. See details.
Input format
exprs_obj should be either a SummarizedExperiment (we recommend the SingleCellExperiment package) or a cell (row) by gene (column) matrix of raw RNA-seq counts (do not log-transform or otherwise normalize).
marker_gene_info should be one of the following:
A gene by cell type binary matrix, where 1 indicates that a gene is a marker for a cell type and 0 indicates that it is not.
A list with names corresponding to cell types, where each entry is a vector of marker gene names. These are converted to the above matrix using the marker_list_to_mat function.
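To make the binary matrix format concrete, the following base R sketch builds a gene by cell type matrix from a named list. The cell type names and gene names are hypothetical, and this is only an illustration of the target format, not the implementation of marker_list_to_mat, whose output may differ in detail.

```r
# Hypothetical marker lists: names are cell types, entries are marker genes
marker_list <- list(
  Bcell = c("CD19", "MS4A1"),
  Tcell = c("CD3E", "CD3D", "CD2")
)

# Rows are the union of all marker genes; columns are cell types
genes <- unique(unlist(marker_list))

# For each cell type, flag which genes are listed as its markers (1) or not (0)
rho <- sapply(marker_list, function(markers) as.integer(genes %in% markers))
rownames(rho) <- genes

rho
```

Each column then sums to the number of markers listed for that cell type.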
Cell size factors
If the cell size factors s are not provided, they are computed using the computeSumFactors function from the scran package.
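If you prefer to supply s yourself, one simple option is total counts per cell, which is what the example on this page passes. The sketch below assumes a hypothetical cell (row) by gene (column) count matrix; it is a plain library-size calculation and is not equivalent to what computeSumFactors computes.

```r
# Hypothetical cell (row) by gene (column) matrix of raw counts
counts <- matrix(
  c(10, 0, 5,
    20, 4, 6,
     3, 1, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:3))
)

# Total counts per cell; cells are rows here, so use rowSums
s <- rowSums(counts)

# Rescale so the size factors average to 1 (an optional convention,
# assumed here for illustration)
s <- s / mean(s)

s
```

For a SummarizedExperiment, where genes are rows and cells are columns, the equivalent per-cell total uses colSums on the counts assay.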
Covariates
If X is not NULL, it should be an N by P matrix of covariates for N cells and P covariates. Such a matrix would typically be returned by a call to model.matrix with no intercept. It is also highly recommended that any numerical (i.e. non-factor or one-hot-encoded) covariates be standardized to have mean 0 and standard deviation 1.
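As a minimal sketch of this preparation step, the code below standardizes two numeric covariates and builds a design matrix without an intercept. The data frame cell_info and its columns are hypothetical and stand in for whatever per-cell metadata you have.

```r
# Hypothetical per-cell metadata for six cells
cell_info <- data.frame(
  pct_mito = c(2.1, 3.4, 1.8, 5.0, 4.2, 2.9),
  n_genes  = c(1200, 950, 1800, 700, 1100, 1500)
)

# Standardize each numeric covariate to mean 0 and standard deviation 1
cell_info$pct_mito <- as.numeric(scale(cell_info$pct_mito))
cell_info$n_genes  <- as.numeric(scale(cell_info$n_genes))

# "~ 0 + ..." asks model.matrix to omit the intercept column
X <- model.matrix(~ 0 + pct_mito + n_genes, data = cell_info)

dim(X)
```

The result has one row per cell and one column per covariate, matching the N by P shape described above.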
cellassign_fit
A call to cellassign returns an object of class cellassign_fit. To access the maximum likelihood estimates (MLEs) of cell types, call fit$cell_type. To access all MLE parameter estimates, call fit$mle_params.
Returning a SingleCellExperiment
If return_SCE is TRUE, a call to cellassign will return the input SingleCellExperiment with the following added:
A column cellassign_celltype in colData(sce) with the MAP estimate of the cell type
A slot sce@metadata$cellassign_fit containing the cellassign fit
Note that a SingleCellExperiment must be provided as exprs_obj for this option to be valid.
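# NOT RUN {
library(cellassign)

# Load the example SingleCellExperiment and marker gene matrix
data(example_sce)
data(example_rho)

# Fit the model, using total counts per cell as size factors
fit <- em_result <- cellassign(example_sce,
                               marker_gene_info = example_rho,
                               s = colSums(SummarizedExperiment::assay(example_sce, "counts")),
                               learning_rate = 1e-2,
                               shrinkage = TRUE,
                               verbose = FALSE)
# }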