Perform a variance-stabilizing transformation on UMI counts using
sctransform::vst (https://github.com/satijalab/sctransform). This
replaces the NormalizeData → FindVariableFeatures →
ScaleData workflow. It fits a regularized negative binomial model
per gene and stores the results (corrected counts, log-transformed data, and Pearson residuals) in a new assay.
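The residuals that end up in scale.data (before clipping and centering) are Pearson residuals of each observed count under its gene's fitted negative binomial model, whose variance is \(\mu + \mu^2/\theta\). The base-R sketch below computes that quantity for toy values of mu and theta that are chosen for illustration only. In sctransform these parameters are estimated from the data and regularized across genes. The helper name nb_pearson_residual is hypothetical and is not part of Seurat or sctransform.

```r
# Hypothetical helper: Pearson residual of observed counts x under a
# negative binomial with mean mu and inverse dispersion theta,
# whose variance is mu + mu^2 / theta.
nb_pearson_residual <- function(x, mu, theta) {
  (x - mu) / sqrt(mu + mu^2 / theta)
}

# Toy values for one gene across four cells (not fitted to real data).
x  <- c(0, 2, 5, 10)
mu <- c(1, 2, 3, 4)
r  <- nb_pearson_residual(x, mu, theta = 10)
round(r, 3)

# As theta grows, the NB variance approaches the Poisson variance mu.
nb_pearson_residual(4, 2, theta = Inf)  # equals sqrt(2)
```

A count equal to its expected value gives a residual of 0. Larger theta means less overdispersion, so the same deviation from mu yields a larger residual.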
SCTransform(object, ...)
# S3 method for default
SCTransform(
object,
cell.attr,
reference.SCT.model = NULL,
do.correct.umi = TRUE,
ncells = 5000,
residual.features = NULL,
variable.features.n = 3000,
variable.features.rv.th = 1.3,
vars.to.regress = NULL,
latent.data = NULL,
do.scale = FALSE,
do.center = TRUE,
clip.range = c(-sqrt(x = ncol(x = umi)/30), sqrt(x = ncol(x = umi)/30)),
vst.flavor = "v2",
conserve.memory = FALSE,
return.only.var.genes = TRUE,
seed.use = 1448145,
verbose = TRUE,
...
)
# S3 method for Assay
SCTransform(
object,
cell.attr,
reference.SCT.model = NULL,
do.correct.umi = TRUE,
ncells = 5000,
residual.features = NULL,
variable.features.n = 3000,
variable.features.rv.th = 1.3,
vars.to.regress = NULL,
latent.data = NULL,
do.scale = FALSE,
do.center = TRUE,
clip.range = c(-sqrt(x = ncol(x = object)/30), sqrt(x = ncol(x = object)/30)),
vst.flavor = "v2",
conserve.memory = FALSE,
return.only.var.genes = TRUE,
seed.use = 1448145,
verbose = TRUE,
...
)
# S3 method for Seurat
SCTransform(
object,
assay = "RNA",
new.assay.name = "SCT",
reference.SCT.model = NULL,
do.correct.umi = TRUE,
ncells = 5000,
residual.features = NULL,
variable.features.n = 3000,
variable.features.rv.th = 1.3,
vars.to.regress = NULL,
do.scale = FALSE,
do.center = TRUE,
clip.range = c(-sqrt(x = ncol(x = object[[assay]])/30), sqrt(x = ncol(x = object[[assay]])/30)),
vst.flavor = "v2",
conserve.memory = FALSE,
return.only.var.genes = TRUE,
seed.use = 1448145,
verbose = TRUE,
...
)
# S3 method for IterableMatrix
SCTransform(
object,
cell.attr,
reference.SCT.model = NULL,
do.correct.umi = TRUE,
ncells = 5000,
residual.features = NULL,
variable.features.n = 3000,
variable.features.rv.th = 1.3,
vars.to.regress = NULL,
latent.data = NULL,
do.scale = FALSE,
do.center = TRUE,
clip.range = c(-sqrt(x = ncol(x = object)/30), sqrt(x = ncol(x = object)/30)),
vst.flavor = "v2",
conserve.memory = FALSE,
return.only.var.genes = TRUE,
seed.use = 1448145,
verbose = TRUE,
...
)
A Seurat object with a new assay (named "SCT" by default; see
new.assay.name) containing counts (corrected UMIs), data (log1p of the
corrected counts), and scale.data (Pearson residuals). Intermediate
sctransform::vst outputs are stored in misc.
- object: A Seurat object, Assay, IterableMatrix, or UMI count matrix, depending on the method.
- ...: Additional arguments passed to sctransform::vst.
- cell.attr: Optional data frame of cell-level metadata (cells × attributes); not available in the Seurat method.
- reference.SCT.model: A pre-fitted SCT model (supports only log_umi as the latent variable). If provided, residuals are computed with that model instead of fitting a new one. When residual.features is NULL, residuals are computed for the model's top variable.features.n features; otherwise, the assay's variable features are set to residual.features.
- do.correct.umi: Logical; if TRUE (default), store the corrected UMI counts in counts.
- ncells: Integer; number of cells to subsample when fitting the NB regression (default: 5000).
- residual.features: Character vector of genes for which to compute residuals. The default, NULL, uses all genes. If set, these genes become the assay's variable features.
- variable.features.n: Integer; when residual.features is NULL, the number of top features, ranked by residual variance, to select as variable features (default: 3000).
- variable.features.rv.th: Numeric; if variable.features.n is NULL, select the features whose residual variance exceeds this threshold (default: 1.3).
- vars.to.regress: Character vector of metadata columns (e.g., percent.mito) to regress out in a second, non-regularized linear regression.
- latent.data: Numeric matrix (cells × latent covariates) of additional variables to regress out; not available in the Seurat method.
- do.scale: Logical; if TRUE, scale residuals to unit variance (default: FALSE).
- do.center: Logical; if TRUE, center residuals to mean zero (default: TRUE).
- clip.range: Numeric vector of length 2 giving the range to which residuals are clipped (default: c(-sqrt(n/30), sqrt(n/30)), where n is the number of cells).
- vst.flavor: Character; if "v2" (default), uses method = "glmGamPoi_offset", n_cells = 2000, and exclude_poisson = TRUE. The model then learns only \(\theta\) and the intercept, and Poisson genes are excluded from learning and regularization.
- conserve.memory: Logical; if TRUE, the full residual matrix for all genes is never built. This is slower but more memory-efficient, and it forces return.only.var.genes = TRUE (default: FALSE).
- return.only.var.genes: Logical; if TRUE (default), scale.data is subset to the variable features only.
- seed.use: Integer; random seed for reproducibility (default: 1448145). Set to NULL to skip setting a seed.
- verbose: Logical; whether to print progress messages (default: TRUE).
- assay: Name of the assay to pull count data from (default: "RNA"); Seurat method only.
- new.assay.name: Name of the new assay that holds the normalized data (default: "SCT"); Seurat method only.
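The precedence among residual.features, variable.features.n, and variable.features.rv.th described in the argument list can be sketched in base R as follows. An explicit residual.features wins. Otherwise, the top variable.features.n features by residual variance are taken. If that is NULL, every feature above the threshold is taken. select_variable_features is a hypothetical helper, and the residual variances are made-up numbers. Seurat's own selection code may differ in details such as tie handling.

```r
# Hypothetical helper mirroring the documented precedence rules.
# residual_variance: named numeric vector, one entry per feature.
select_variable_features <- function(residual_variance,
                                     residual.features = NULL,
                                     variable.features.n = 3000,
                                     variable.features.rv.th = 1.3) {
  # 1. An explicit feature list becomes the variable features as-is.
  if (!is.null(residual.features)) {
    return(residual.features)
  }
  # Rank features from highest to lowest residual variance.
  ranked <- names(sort(residual_variance, decreasing = TRUE))
  # 2. Otherwise keep a fixed number of top-ranked features.
  if (!is.null(variable.features.n)) {
    return(head(ranked, variable.features.n))
  }
  # 3. Otherwise keep every feature whose residual variance exceeds the threshold.
  ranked[residual_variance[ranked] > variable.features.rv.th]
}

rv <- c(GeneA = 5.2, GeneB = 0.9, GeneC = 2.1, GeneD = 1.4)
select_variable_features(rv, variable.features.n = 2)
select_variable_features(rv, variable.features.n = NULL)
select_variable_features(rv, residual.features = "GeneB")
```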
- A new assay (default name "SCT") in which:
  - counts: depth-corrected UMI counts (as if each cell had uniform
    sequencing depth; controlled by do.correct.umi).
  - data: log1p of the corrected counts.
  - scale.data: Pearson residuals from the fitted NB model, clipped to
    clip.range and optionally centered and/or scaled.
  - misc: intermediate outputs from sctransform::vst.
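How clip.range, do.center, and do.scale act on a genes × cells residual matrix can be sketched in base R. This includes the default clip bound of sqrt(n/30) for n cells. postprocess_residuals is a hypothetical helper. Two points are assumptions rather than documented behavior: clipping is applied before centering and scaling, and scaling follows base R's scale() semantics. Seurat performs this step internally and may differ in order or detail.

```r
# Hypothetical helper: post-process a genes x cells matrix of Pearson
# residuals according to clip.range, do.center and do.scale.
# Assumption: clipping is applied before centering/scaling.
postprocess_residuals <- function(res,
                                  do.center = TRUE,
                                  do.scale = FALSE,
                                  clip.range = c(-sqrt(ncol(res) / 30),
                                                 sqrt(ncol(res) / 30))) {
  # Evaluate the default clip range (it depends on the number of cells).
  force(clip.range)
  # Clip extreme residuals into [clip.range[1], clip.range[2]].
  res <- pmin(pmax(res, clip.range[1]), clip.range[2])
  if (do.center || do.scale) {
    # scale() works column-wise, so transpose to standardize each gene (row).
    res <- t(scale(t(res), center = do.center, scale = do.scale))
  }
  res
}

set.seed(1)
res <- matrix(rnorm(2 * 300, sd = 2), nrow = 2,
              dimnames = list(c("GeneA", "GeneB"), NULL))
clipped_only <- postprocess_residuals(res, do.center = FALSE)
centered <- postprocess_residuals(res)
# With 300 cells the default bound is sqrt(300 / 30) = sqrt(10), about 3.16.
range(clipped_only)
rowMeans(centered)
```

Centering after clipping can push a few values slightly past the clip bound. This is one reason the order of these steps matters.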
When multiple counts layers exist (e.g. after split()),
each layer is modeled independently. A consensus variable‐feature set is
then defined by ranking features by how often they’re called “variable”
across different layers (ties broken by median rank).
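The consensus rule above (rank by how many layers call a feature variable, then break ties by median rank) can be sketched in base R. consensus_variable_features and the toy layer lists are hypothetical. The sketch follows the rule as described here, and Seurat's implementation may differ in details.

```r
# Hypothetical helper: combine per-layer variable-feature lists into a
# consensus set. Each element of `var_lists` is one layer's variable
# features, ordered from most to least variable.
consensus_variable_features <- function(var_lists, n) {
  all_feats <- unique(unlist(var_lists, use.names = FALSE))
  # Number of layers in which each feature is called variable.
  freq <- vapply(all_feats, function(f) {
    sum(vapply(var_lists, function(v) f %in% v, logical(1)))
  }, integer(1))
  # Median rank among the layers that call the feature variable.
  med_rank <- vapply(all_feats, function(f) {
    ranks <- vapply(var_lists, function(v) match(f, v), integer(1))
    median(ranks, na.rm = TRUE)
  }, numeric(1))
  # Most frequent first; ties broken by the smaller (better) median rank.
  ord <- order(-freq, med_rank)
  head(all_feats[ord], n)
}

layers <- list(
  layer1 = c("GeneA", "GeneB", "GeneC"),
  layer2 = c("GeneB", "GeneA", "GeneD"),
  layer3 = c("GeneB", "GeneC", "GeneE")
)
consensus_variable_features(layers, n = 3)
```

GeneB is variable in all three layers, so it ranks first. GeneA and GeneC are each variable in two layers. GeneA comes ahead of GeneC because its median rank (1.5) is better than GeneC's (2.5).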
By default, sctransform::vst drops features expressed in fewer
than five cells. In the multi-layer case, this can exclude consensus
variable features from the output's scale.data when a feature is
variable in many layers but sparsely expressed in at least one.
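The interaction between the five-cell filter and the consensus set can be illustrated with two toy layers. The sketch makes two assumptions of its own, which are not documented behavior. First, a feature counts as "expressed" in a cell when its count is above 0. Second, a feature appears in scale.data only if it has residuals in every layer. features_kept_in_layer is a hypothetical helper.

```r
# Hypothetical helper: features that survive a minimum-cells filter in one
# layer (assumption: "expressed" means count > 0; kept if >= min_cells).
features_kept_in_layer <- function(counts, min_cells = 5) {
  rownames(counts)[rowSums(counts > 0) >= min_cells]
}

genes <- c("GeneA", "GeneB", "GeneC")
# Toy layers of 20 cells each; every gene is detected in every cell of layer1.
layer1 <- matrix(1L, nrow = 3, ncol = 20, dimnames = list(genes, NULL))
layer2 <- layer1
layer2["GeneC", ] <- 0L
layer2["GeneC", 1:3] <- 1L  # GeneC detected in only 3 cells of layer2

consensus <- c("GeneA", "GeneC")  # suppose GeneC was variable in most layers
# Assumption: scale.data keeps only features with residuals in every layer.
with_residuals <- Reduce(intersect,
                         lapply(list(layer1, layer2), features_kept_in_layer))
dropped <- setdiff(consensus, with_residuals)
dropped
```

Here GeneC belongs to the consensus set, but layer2 filters it out, so it has no residuals there.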
sctransform::vst,
sctransform::get_residuals,
sctransform::correct_counts