Deconvolution of bulk RNA-Seq using a vector projection method with adjustable compensation for spillover.
deconvolute(
mk,
test,
logged_bulk = FALSE,
count_space = TRUE,
comp_amount = 1,
group_comp_amount = 0,
weights = NULL,
weight_method = "equal",
adjust_comp = TRUE,
use_filter = TRUE,
arith_mean = FALSE,
convert_bulk = FALSE,
check_comp = FALSE,
npass = 1,
outlier_method = c("var.e", "cooks", "rstudent"),
outlier_cutoff = switch(outlier_method, var.e = 4, cooks = 1, rstudent = 10),
outlier_quantile = 0.9,
verbose = TRUE,
cores = 1L
)
A list object of S3 class 'deconv' containing:
the matched call
the original 'cellMarkers' class object
list object containing:
output, the amount of each subclass based purely on projected
gene expression
percent, the proportion of each subclass scaled as a percentage
so that the total amount across all subclasses adds to 100%
spillover, the spillover matrix
compensation, the mixed final compensation matrix which
incorporates comp_amount
rawcomp, the original unadjusted compensation matrix
comp_amount, the final values for the amount of compensation
across each cell subclass after adjustment to prevent negative values
residuals, the residuals, that is gene expression minus fitted
values
var.e, variance of weighted residuals for each gene
weights, vector of weights
resvar, \(s^2\), the estimate of the gene expression variance
for each sample
se, standard errors of cell counts
hat, diagonal elements of the hat matrix
removed, vector of outlying genes removed during successive
passes
similar list object to subclass, but with results for the
cell group analysis.
alternative matrix of cell output results for each subclass adjusted so that the cell outputs across subclasses are nested as a proportion of cell group outputs.
alternative matrix of cell proportion results for each subclass adjusted so that the percentages across subclasses are nested within cell group percentages. The total percentage still adds to 100%.
original argument comp_amount
optional list element returned when check_comp = TRUE
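The relationship between output, percent, spillover and compensation in the list above can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy signature matrix, the `project` helper and the rule used to blend comp_amount into the compensation matrix are all assumptions made for illustration.

```r
# Toy signature: 4 genes (rows) x 2 cell subclasses (columns); values are made up
sig <- matrix(c(10, 8, 1, 0,
                 2, 0, 9, 7), nrow = 4,
              dimnames = list(paste0("gene", 1:4), c("A", "B")))
# Noise-free toy bulk sample containing 3 parts A and 1 part B
bulk <- sig %*% c(3, 1)

# Hypothetical helper: scalar projection of each column of x onto each
# signature vector s_k, i.e. (s_k . x) / (s_k . s_k)
project <- function(x, sig) crossprod(sig, x) / colSums(sig^2)

raw <- project(bulk, sig)       # uncompensated projection, inflated by overlap
spillover <- project(sig, sig)  # how much each signature projects onto the others

# Assumed blending rule: comp_amount = 0 gives no compensation (identity),
# comp_amount = 1 gives full compensation (inverse of the spillover matrix)
comp_amount <- 1
K <- ncol(sig)
compensation <- solve(diag(K) + comp_amount * (spillover - diag(K)))

output <- compensation %*% raw                   # compensated amounts
percent <- 100 * t(t(output) / colSums(output))  # each sample sums to 100
```

In this noise-free toy case full compensation recovers the mixing amounts exactly, whereas the raw projections overestimate both subclasses because the two signatures overlap.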
mk, object of class 'cellMarkers'. See cellMarkers().
test, matrix of bulk RNA-Seq to be deconvoluted with genes in rows and
samples in columns. We recommend raw counts as input, but normalised data
can be provided, in which case set logged_bulk = TRUE.
logged_bulk, logical, whether log2 transformed bulk RNA-Seq data is
used as input in test.
count_space, logical, whether deconvolution is performed in count space (as opposed to log2 space). Signature and test are reverted to count scale by 2^ exponentiation during deconvolution.
comp_amount, either a single value from 0-1 for the amount of compensation or a numeric vector with the same length as the number of cell subclasses to deconvolute.
group_comp_amount, either a single value from 0-1 for the amount of compensation for cell group analysis or a numeric vector with the same length as the number of cell groups to deconvolute.
weights, optional vector of weights which controls how much each gene in the gene signature matrix contributes to the deconvolution.
weight_method, optional. Choices are "none" or "equal". With "equal",
gene weights are calculated so that each gene has equal weighting in the
vector projection; "equal" overrules any vector supplied by weights.
adjust_comp, logical, whether to optimise comp_amount to prevent
negative cell proportion projections.
use_filter, logical, whether to use the denoised signature matrix.
arith_mean, logical, whether to use arithmetic means (if available) for the signature matrix. Mainly useful with pseudo-bulk simulation.
convert_bulk, either "ref" to convert bulk RNA-Seq to scRNA-Seq scaling
using reference data or "qqmap" using quantile mapping of the bulk to
scRNA-Seq datasets, or "none" (or FALSE) for no conversion.
check_comp, logical, whether to analyse compensation values across
subclasses. See plot_comp().
npass, number of passes. Setting npass to 2 or more activates
removal of genes with excess variance of the residuals.
outlier_method, method for identifying outlying genes. Options are to use the variance of the residuals for each gene, Cook's distance or absolute Studentized residuals (see details).
outlier_cutoff, cutoff for removing genes which are outliers based on
the method selected by outlier_method.
outlier_quantile, controls the quantile for the cutoff for identifying
outliers for outlier_method = "cooks" or "rstudent".
verbose, logical, whether to show messages.
cores, number of cores for parallelisation via parallel::mclapply().
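The default for outlier_cutoff depends on outlier_method, and the pattern in the usage relies on R's lazy evaluation of default arguments. The sketch below uses a hypothetical function, `pick_cutoff`, that mirrors only those two defaults. It assumes that outlier_method is resolved with match.arg() before outlier_cutoff is first used, which the documentation does not state explicitly.

```r
# Hypothetical function mirroring only the two defaults shown in the usage
pick_cutoff <- function(outlier_method = c("var.e", "cooks", "rstudent"),
                        outlier_cutoff = switch(outlier_method,
                                                var.e = 4, cooks = 1,
                                                rstudent = 10)) {
  # Resolve the choice vector to a single string before the default for
  # outlier_cutoff is evaluated (default arguments are evaluated lazily)
  outlier_method <- match.arg(outlier_method)
  outlier_cutoff
}

cutoff_default <- pick_cutoff()                         # first choice, var.e
cutoff_cooks <- pick_cutoff("cooks")
cutoff_custom <- pick_cutoff("rstudent", outlier_cutoff = 5)  # explicit value wins
```

If a different cutoff is wanted, passing outlier_cutoff explicitly bypasses the switch() default altogether.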
Myles Lewis
Equal weighting of genes by setting weight_method = "equal" can help
deconvolution of subclusters whose signature genes have low expression. It is
enabled by default.
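As a rough illustration of why equal weighting helps lowly expressed signatures, the sketch below scales each gene of a toy signature matrix to unit length before solving for the mixture. The weighting rule (one over the length of each gene's row vector) and all object names are assumptions for illustration and may differ from what the package computes.

```r
# Toy signature: subclass B's markers (gene3, gene4) are lowly expressed
sig <- matrix(c(100, 80, 0.5, 0,
                  5,  0, 2.0, 1.5), nrow = 4,
              dimnames = list(paste0("gene", 1:4), c("A", "B")))
bulk <- sig %*% c(3, 1)   # noise-free mixture: 3 parts A, 1 part B

# Share of the total squared signature carried by each gene, unweighted:
# the highly expressed markers of A dominate
share_raw <- rowSums(sig^2) / sum(sig^2)

# Assumed weighting rule: scale each gene (row) to unit length
w <- 1 / sqrt(rowSums(sig^2))
sig_w <- sig * w                   # row g is multiplied by w[g]
bulk_w <- bulk * w                 # the bulk sample gets the same gene weights
share_w <- rowSums(sig_w^2) / sum(sig_w^2)   # now identical for every gene

# Weighted least-squares solution; a noise-free mixture is still recovered
est <- solve(crossprod(sig_w), crossprod(sig_w, bulk_w))
```

Without weights, gene1 and gene2 account for nearly all of the squared signature, so noise in those genes would swamp the information carried by the markers of B.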
If a normalised (i.e. logged) bulk matrix is provided instead of raw counts, it is important that zero expression is a true zero. For this reason we do not recommend VST (variance stabilising transformed) counts, which have a variable offset.
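The point about true zeros can be seen with a small round trip between count and log2 scale. The sketch assumes a fixed pseudocount of 1, i.e. log2(count + 1), which the documentation does not spell out. The shifted transform is a simple stand-in for a normalisation with a different offset and is not an implementation of VST.

```r
counts <- c(0, 1, 7, 1023)

# Assumed normalisation: log2 with a fixed pseudocount of 1
logged <- log2(counts + 1)          # zero counts map to exactly 0

# Back-transformation to count scale (the pseudocount of 1 is an assumption)
restored <- 2^logged - 1

# A transform with a different offset (0.5 chosen arbitrarily for illustration)
shifted <- log2(counts + 0.5)
mismatched <- 2^shifted - 1         # zero counts no longer come back as zero
```

When the offset used for normalisation does not match the one assumed in the back-transformation, unexpressed genes acquire a spurious non-zero value on the count scale.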
Multipass deconvolution can be activated by setting npass to 2 or higher.
This is designed to remove genes which behave inconsistently due to noise in
either the sc or bulk datasets, which is increasingly likely with a larger
signature gene set, i.e. if nsubclass is large. It is also worth considering
if you receive the warning message "Detected genes with extreme residuals".
Three methods are available for identifying outlier genes (i.e. genes whose
residuals are too noisy), controlled by outlier_method:
var.e, this calculates the variance of the residuals across samples for
each gene. Genes whose residual variance is an outlier based on Z-score
standardisation are removed during successive passes.
cooks, this considers the deconvolution as if it were a regression and
applies Cook's distance to the residuals and the hat matrix. This seems to be
the most stringent method (removes fewest genes).
rstudent, externally Studentized residuals are used.
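A single pass of the var.e approach described above might look like the following base R sketch. The residual matrix is made up, the residuals are unweighted for simplicity, and the object names are hypothetical; the package's own implementation may differ in detail.

```r
# Toy residual matrix: 50 genes (rows) x 6 samples (columns); values are made up
base <- c(-1, 0, 1, -1, 0, 1)
resid <- matrix(rep(base, each = 50), nrow = 50,
                dimnames = list(paste0("gene", 1:50), NULL))
resid["gene7", ] <- 20 * base            # one gene with much noisier residuals

var_e <- apply(resid, 1, var)            # variance of residuals for each gene
z <- (var_e - mean(var_e)) / sd(var_e)   # Z-score standardisation across genes

outlier_cutoff <- 4                      # default for var.e in the usage above
removed <- names(z)[z > outlier_cutoff]  # genes dropped before the next pass
keep <- setdiff(rownames(resid), removed)
```

In a multipass setting the deconvolution would then be repeated on the genes in `keep`, and the same check applied to the new residuals.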
The cutoff specified by outlier_cutoff, which is used to determine which
genes are outliers, is very sensitive to the outlier method. With var.e the
variances are Z-score scaled. With Cook's distance it is typical to consider
a value of >1 as a fairly strong indication of an outlier, while 0.5 is
considered a possible outlier. With Studentized residuals, these are expected
to be on a t distribution scale. However, since gene expression itself does
not derive from a normal distribution, the errors and residuals are not
normally distributed either, which probably explains the need for a very high
cut-off. In practice the choice of settings seems to be dataset dependent.
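To make the scales of the Cook's distance and Studentized residual cutoffs more concrete, the sketch below treats one simulated bulk sample as an ordinary regression on a toy signature matrix and computes both statistics from the residuals and the hat matrix. It uses stats::lm() purely as a stand-in for the deconvolution fit; the data and object names are made up, and rstudent.deconv() and cooks.distance.deconv() may handle weights and multiple samples differently.

```r
set.seed(1)
# Toy data: 20 genes, 2 cell subclasses, one bulk sample with Gaussian noise
sig <- matrix(runif(40, 1, 10), nrow = 20, dimnames = list(NULL, c("A", "B")))
y <- as.numeric(sig %*% c(3, 1)) + rnorm(20)
y[5] <- y[5] + 15                 # make gene 5 behave inconsistently

fit <- lm(y ~ 0 + sig)            # regression through the origin
e <- residuals(fit)               # gene expression minus fitted values
h <- hatvalues(fit)               # diagonal elements of the hat matrix
p <- length(coef(fit))            # number of cell subclasses
df <- df.residual(fit)
s2 <- sum(e^2) / df               # estimate of the residual variance

# Cook's distance from residuals and hat values
cooks_manual <- e^2 * h / (p * s2 * (1 - h)^2)

# Externally Studentized residuals: variance re-estimated without gene i
s2_del <- (df * s2 - e^2 / (1 - h)) / (df - 1)
rstud_manual <- e / sqrt(s2_del * (1 - h))

# Both should agree with the reference implementations in stats
same_cooks <- isTRUE(all.equal(cooks_manual, cooks.distance(fit)))
same_rstud <- isTRUE(all.equal(rstud_manual, rstudent(fit)))
```

Cook's distance combines the size of a residual with its leverage, whereas the Studentized residual only rescales the residual, which is one reason the two methods need very different cutoffs.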
cellMarkers() updateMarkers() rstudent.deconv()
cooks.distance.deconv()