monocle3 (version 0.1.3)

preprocess_cds: Preprocess a cds to prepare for trajectory inference

Description

Most analyses in Monocle3, including trajectory inference and clustering, require several normalization and preprocessing steps. preprocess_cds performs these steps and stores the results in the cell_data_set.

Depending on the options selected, preprocess_cds first normalizes the data to address differences in sequencing depth, either by size factor followed by a log transformation ("log") or by size factor only ("size_only"). It then computes a lower-dimensional space (PCA or LSI) that serves as the input for further dimensionality reduction, such as t-SNE or UMAP.
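The two stages described above can be sketched in base R. This is a minimal illustration on made-up counts, not the monocle3 implementation: the size factor definition (cell total divided by the geometric mean of all cell totals), the base-2 log, and the use of prcomp are assumptions chosen for clarity.

```r
# Toy counts matrix: genes in rows, cells in columns (made-up data).
set.seed(1)
counts <- matrix(rpois(200 * 30, lambda = 5), nrow = 200, ncol = 30,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:30)))

# Size factor per cell: total counts relative to the geometric mean total.
# (One common definition, assumed here for illustration.)
totals <- colSums(counts)
size_factors <- totals / exp(mean(log(totals)))

# Stage 1: divide each cell (column) by its size factor, then
# log-transform with a pseudocount of 1.
norm_counts <- sweep(counts, 2, size_factors, "/")
log_norm <- log2(norm_counts + 1)

# Stage 2: PCA over cells. prcomp expects observations in rows, so transpose.
num_dim <- 10
pca <- prcomp(t(log_norm), center = TRUE, scale. = TRUE, rank. = num_dim)
reduced <- pca$x  # cells x num_dim matrix of PCA coordinates
```

The resulting matrix has one row per cell and num_dim columns, which is the shape a downstream method such as UMAP or t-SNE would take as input.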

Usage

preprocess_cds(cds, method = c("PCA", "LSI"), num_dim = 50,
  norm_method = c("log", "size_only"), use_genes = NULL,
  residual_model_formula_str = NULL, pseudo_count = NULL,
  scaling = TRUE, verbose = FALSE, ...)
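As a usage sketch, the call below spells out the main arguments from the signature above instead of relying on the defaults. The wrapper name preprocess_with_pca is hypothetical and exists only for illustration; cds is assumed to be an existing cell_data_set, and monocle3 must be installed for the function to run.

```r
# Hypothetical helper for illustration only. `cds` is assumed to be an
# existing cell_data_set; monocle3 is needed only when this is called.
preprocess_with_pca <- function(cds, num_dim = 50) {
  monocle3::preprocess_cds(
    cds,
    method = "PCA",        # or "LSI"
    num_dim = num_dim,     # dimensionality of the reduced space
    norm_method = "log",   # or "size_only"
    scaling = TRUE,        # relevant for method = "PCA" only
    verbose = FALSE
  )
}
```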

Arguments

cds

the cell_data_set upon which to perform this operation

method

a string specifying the initial dimensionality reduction method to use, currently either "PCA" or "LSI". LSI (latent semantic indexing) converts the (sparse) expression matrix into a tf-idf matrix and then performs SVD to decompose the gene expression / cells into modules / topics. Default is "PCA".
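The LSI idea (tf-idf weighting followed by SVD) can be sketched in base R as follows. This is an illustration on made-up counts, not the monocle3 code path: the particular tf-idf variant (term frequency scaled by cell totals, idf = log(1 + n_cells / n_cells_expressing)) and the use of a dense svd are assumptions.

```r
# Toy sparse-like counts: genes in rows, cells in columns (made-up data).
set.seed(2)
counts <- matrix(rpois(100 * 20, lambda = 1), nrow = 100, ncol = 20)

# Drop genes detected in no cell so the idf term is well defined.
counts <- counts[rowSums(counts > 0) > 0, , drop = FALSE]

# Term frequency: each cell's counts divided by that cell's total.
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency: genes seen in fewer cells get more weight.
idf <- log(1 + ncol(counts) / rowSums(counts > 0))

# idf has one entry per gene (row), so it recycles down each column.
tfidf <- tf * idf

# Truncated SVD: keep the first num_dim right singular vectors (cells).
num_dim <- 5
sv <- svd(tfidf, nu = 0, nv = num_dim)
cell_embedding <- sv$v %*% diag(sv$d[seq_len(num_dim)])  # cells x num_dim
```

A dense svd is fine at this size; for real single-cell matrices a sparse, truncated solver would normally be used instead.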

num_dim

the dimensionality of the reduced space. Default is 50.

norm_method

Determines how to transform expression values prior to reducing dimensionality. Options are "log" (size factor normalization followed by a log transformation) and "size_only" (size factor normalization only). Default is "log".

use_genes

NULL or a list of gene IDs. If a list of gene IDs, only this subset of genes is used for dimensionality reduction. Default is NULL.

residual_model_formula_str

NULL or a string model formula specifying any effects to subtract from the data before dimensionality reduction. Default is NULL.
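To illustrate what subtracting an effect named in a formula string means, the sketch below regresses a made-up batch effect out of toy expression data using base R. It is not the monocle3 procedure: it uses lm.fit and keeps the full residuals (which also removes each gene's mean), and the variable name batch is hypothetical.

```r
# Toy normalized expression: genes in rows, cells in columns (made-up data).
set.seed(3)
n_cells <- 40
batch <- factor(rep(c("a", "b"), each = n_cells / 2))
expr <- matrix(rnorm(50 * n_cells), nrow = 50)

# Add an artificial shift to every gene in batch "b".
expr[, batch == "b"] <- expr[, batch == "b"] + 2

# Build the design matrix from a formula string, analogous to passing
# residual_model_formula_str = "~ batch".
design <- model.matrix(as.formula("~ batch"),
                       data = data.frame(batch = batch))

# Fit one linear model per gene. lm.fit accepts a multi-column response,
# so pass cells x genes and transpose the residuals back.
fit <- lm.fit(x = design, y = t(expr))
residual_expr <- t(fit$residuals)  # genes x cells, batch effect removed
```

After this step the per-gene mean within each batch is zero, so the batch shift no longer drives the leading components of a subsequent PCA.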

pseudo_count

NULL or the amount to increase expression values before normalization and dimensionality reduction. If NULL (default), a pseudo_count of 1 is added for log normalization and 0 is added for size factor only normalization.

scaling

When TRUE (default), each gene is scaled before PCA is run. Relevant for method = "PCA" only.
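The effect of per-gene scaling on PCA can be seen in a small base R sketch. The data are made up and prcomp stands in for whatever PCA routine is used internally, which is an assumption.

```r
# Toy data: cells in rows, genes in columns. Gene 1 has a much larger
# variance than the other four genes (made-up data).
set.seed(4)
x <- cbind(rnorm(50, sd = 10), matrix(rnorm(50 * 4), nrow = 50))

# Without scaling, the high-variance gene dominates the first component.
unscaled <- prcomp(x, center = TRUE, scale. = FALSE)
gene1_loading <- abs(unscaled$rotation[1, 1])

# With scaling, every gene is standardized to unit variance first,
# so the total variance equals the number of genes.
scaled <- prcomp(x, center = TRUE, scale. = TRUE)
total_variance <- sum(scaled$sdev^2)
```

Without scaling, a handful of highly variable genes can determine the leading components; scaling gives each gene equal weight.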

verbose

Whether to emit verbose output during dimensionality reduction. Default is FALSE.

...

additional arguments to pass to limma::lmFit if residual_model_formula_str is not NULL

Value

an updated cell_data_set object