cellGeometry (version 0.5.7)

deconvolute: Deconvolute bulk RNA-Seq using single-cell RNA-Seq signature

Description

Deconvolution of bulk RNA-Seq using vector projection method with adjustable compensation for spillover.

Usage

deconvolute(
  mk,
  test,
  logged_bulk = FALSE,
  count_space = TRUE,
  comp_amount = 1,
  group_comp_amount = 0,
  weights = NULL,
  weight_method = "equal",
  adjust_comp = TRUE,
  use_filter = TRUE,
  arith_mean = FALSE,
  convert_bulk = FALSE,
  check_comp = FALSE,
  npass = 1,
  outlier_method = c("var.e", "cooks", "rstudent"),
  outlier_cutoff = switch(outlier_method, var.e = 4, cooks = 1, rstudent = 10),
  outlier_quantile = 0.9,
  verbose = TRUE,
  cores = 1L
)

Value

A list object of S3 class 'deconv' containing:

call

the matched call

mk

the original 'cellMarkers' class object

subclass

list object containing:

  • output, the amount of each subclass based purely on projected gene expression

  • percent, the proportion of each subclass scaled as a percentage so that the total amount across all subclasses adds to 100%

  • spillover, the spillover matrix

  • compensation, the mixed final compensation matrix which incorporates comp_amount

  • rawcomp, the original unadjusted compensation matrix

  • comp_amount, the final values for the amount of compensation across each cell subclass after adjustment to prevent negative values

  • residuals, the residuals, i.e. gene expression minus fitted values

  • var.e, variance of weighted residuals for each gene

  • weights, vector of weights

  • resvar, \(s^2\) the estimate of the gene expression variance for each sample

  • se, standard errors of cell counts

  • hat, diagonal elements of the hat matrix

  • removed, vector of outlying genes removed during successive passes

group

similar list object to subclass, but with results for the cell group analysis.

nest_output

alternative matrix of cell output results for each subclass adjusted so that the cell outputs across subclasses are nested as a proportion of cell group outputs.

nest_percent

alternative matrix of cell proportion results for each subclass adjusted so that the percentages across subclasses are nested within cell group percentages. The total percentage still adds to 100%.

comp_amount

original argument comp_amount

comp_check

optional list element returned when check_comp = TRUE
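
As a rough illustration of how the subclass elements output, percent, spillover, rawcomp and compensation relate to one another, here is a minimal base R sketch of vector projection with spillover compensation on toy data. It is not the package's implementation: the signature values are made up, and the linear blend between the raw compensation matrix and the identity matrix is an assumption about how comp_amount might be incorporated.

```r
# Toy signature matrix: 6 genes x 3 cell subclasses (hypothetical values)
M <- matrix(c(10, 1, 0,
               8, 2, 1,
               1, 9, 0,
               0, 7, 2,
               1, 0, 9,
               0, 2, 6),
            ncol = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:6), c("A", "B", "C")))

# Noiseless bulk sample built from known amounts of each subclass
true_amount <- c(A = 5, B = 3, C = 2)
bulk <- drop(M %*% true_amount)

# Pairwise dot products between signature vectors
G <- crossprod(M)

# Uncompensated projection of the bulk vector onto each signature vector
proj <- drop(crossprod(M, bulk)) / diag(G)

# spillover[i, j]: projection of signature i onto signature j
spillover <- sweep(G, 2, diag(G), "/")

# Raw compensation matrix is the inverse of the spillover matrix
rawcomp <- solve(spillover)

# ASSUMPTION: partial compensation as a linear blend with the identity matrix
comp_amount <- 1
compensation <- comp_amount * rawcomp + (1 - comp_amount) * diag(ncol(M))

# Compensated output, then rescaled so that subclasses sum to 100%
output <- drop(proj %*% compensation)
percent <- output / sum(output) * 100

print(round(output, 6))
print(round(percent, 2))
```

With comp_amount = 1 in this sketch, the compensated projection is algebraically the ordinary least-squares solution, so the known amounts are recovered from noiseless data. With comp_amount = 0 the result falls back to the uncompensated projection.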

Arguments

mk

object of class 'cellMarkers'. See cellMarkers().

test

matrix of bulk RNA-Seq to be deconvoluted with genes in rows and samples in columns. We recommend raw counts as input, but normalised (i.e. log2 transformed) data can be provided, in which case set logged_bulk = TRUE.

logged_bulk

Logical, whether log2 transformed bulk RNA-Seq data is used as input in test.

count_space

Logical, whether deconvolution is performed in count space (as opposed to log2 space). Signature and test revert to count scale by 2^ exponentiation during deconvolution.

comp_amount

either a single value from 0-1 for the amount of compensation or a numeric vector with the same length as the number of cell subclasses to deconvolute.

group_comp_amount

either a single value from 0-1 for the amount of compensation for cell group analysis or a numeric vector with the same length as the number of cell groups to deconvolute.

weights

Optional vector of weights which affects how much each gene in the gene signature matrix affects the deconvolution.

weight_method

Optional. Choices are "none", or "equal", in which gene weights are calculated so that each gene has equal weighting in the vector projection. "equal" overrules any vector supplied by weights.

adjust_comp

logical, whether to optimise comp_amount to prevent negative cell proportion projections.

use_filter

logical, whether to use denoised signature matrix.

arith_mean

logical, whether to use arithmetic means (if available) for signature matrix. Mainly useful with pseudo-bulk simulation.

convert_bulk

either "ref" to convert bulk RNA-Seq to scRNA-Seq scaling using reference data or "qqmap" using quantile mapping of the bulk to scRNA-Seq datasets, or "none" (or FALSE) for no conversion.

check_comp

logical, whether to analyse compensation values across subclasses. See plot_comp().

npass

Number of passes. Setting npass to 2 or more activates removal of genes with excess variance of the residuals.

outlier_method

Method for identifying outlying genes. Options are to use the variance of the residuals for each gene, Cook's distance or absolute Studentized residuals (see details).

outlier_cutoff

Cutoff for removing genes which are outliers based on method selected by outlier_method.

outlier_quantile

Controls the quantile used for the cutoff when identifying outliers with outlier_method = "cooks" or "rstudent".

verbose

logical, whether to show messages.

cores

Number of cores for parallelisation via parallel::mclapply().

Author

Myles Lewis

Details

Equal weighting of genes by setting weight_method = "equal" can help deconvolution of subclusters whose signature genes have low expression. It is enabled by default.

If a normalised (i.e. logged) bulk matrix is provided instead of raw counts, then it is important that zero expression is true zero. For this reason we do not recommend the use of VST (variance stabilising transformation) counts, which have a variable offset.
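
The point about true zeros can be illustrated with a small base R sketch. It assumes the logged scale is log2(count + 1), which is an assumption made here for illustration rather than something stated on this page, and it uses a fixed shift of 0.5 as a simple stand-in for the kind of offset a variance-stabilising transformation introduces.

```r
counts <- c(0, 1, 7, 100)

# ASSUMPTION: logged data is log2(count + 1), so a zero count maps to exactly 0
logged <- log2(counts + 1)

# Reverting to count scale by 2^ exponentiation recovers the original counts
back <- 2^logged - 1

# Hypothetical offset: the same data shifted upwards on the log scale
shifted <- logged + 0.5
back_shifted <- 2^shifted - 1

print(back)
print(round(back_shifted, 3))
```

In the shifted version a gene with zero counts no longer reverts to zero, so it would contribute spurious expression to every projection in count space.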

Multipass deconvolution can be activated by setting npass to 2 or higher. This is designed to remove genes which behave inconsistently due to noise in either the single-cell or bulk datasets, which is increasingly likely if you have a larger signature gene set, i.e. if nsubclass is large. It is also worth considering if you receive the warning message "Detected genes with extreme residuals". Three methods are available for identifying outlier genes (i.e. genes whose residuals are too noisy), controlled by outlier_method:

  • var.e, this calculates the variance of the residuals across samples for each gene. Genes whose residual variance is an outlier based on Z-score standardisation are removed during successive passes.

  • cooks, this considers the deconvolution as if it were a regression and applies Cook's distance to the residuals and the hat matrix. This seems to be the most stringent method (removes fewest genes).

  • rstudent, externally Studentized residuals are used.

The appropriate value for outlier_cutoff, which determines which genes are treated as outliers, depends strongly on the outlier method. With var.e the variances are Z-score scaled. With Cook's distance it is typical to consider a value of >1 as a fairly strong indication of an outlier, while 0.5 is considered a possible outlier. Studentized residuals are expected to be on a t distribution scale. However, since gene expression itself does not derive from a normal distribution, the errors and residuals are not normally distributed either, which probably explains the need for a very high cutoff with this method. In practice the choice of settings seems to be dataset dependent.
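
To give a feel for the three outlier measures and their very different scales, here is a minimal base R sketch on simulated data. It uses ordinary least squares as a stand-in for the package's fit, treating genes as observations, and all names and values are hypothetical. It does not attempt to reproduce how outlier_quantile is applied across samples.

```r
set.seed(42)
n_genes <- 60
n_samples <- 8

# Simulated signature matrix (genes x subclasses) and bulk data (genes x samples)
M <- matrix(rexp(n_genes * 3, rate = 0.1), ncol = 3,
            dimnames = list(paste0("gene", seq_len(n_genes)), c("A", "B", "C")))
# Give gene 10 an unremarkable signature so only its noise stands out
M[10, ] <- c(10, 10, 10)
amounts <- matrix(runif(3 * n_samples, 1, 10), nrow = 3)
bulk <- M %*% amounts + matrix(rnorm(n_genes * n_samples), nrow = n_genes)

# Make gene 10 behave inconsistently across samples
bulk[10, ] <- bulk[10, ] + 200 * rep(c(1, -1), length.out = n_samples)

# Least-squares fit for all samples at once, then residuals (genes x samples)
coefs <- solve(crossprod(M), crossprod(M, bulk))
res <- bulk - M %*% coefs

# var.e style: variance of residuals across samples per gene, Z-score scaled
var_e <- apply(res, 1, var)
z <- (var_e - mean(var_e)) / sd(var_e)
print(which(z > 4))

# Regression diagnostics for a single sample, with genes as observations
fit <- lm(bulk[, 1] ~ 0 + M)
cd <- cooks.distance(fit)      # Cook's distance per gene
rs <- abs(rstudent(fit))       # absolute externally Studentized residuals
print(round(c(cooks = unname(cd[10]), rstudent = unname(rs[10])), 2))
```

Comparing the printed values for gene 10 shows why a single cutoff cannot be shared between methods: each measure lives on its own scale, and Cook's distance also depends on the leverage of the gene.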

See Also

cellMarkers() updateMarkers() rstudent.deconv() cooks.distance.deconv()