scITD

Introduction

Single-Cell Interpretable Tensor Decomposition (scITD) is a computational method for extracting multicellular gene expression programs that vary across donors or samples. The approach is premised on the idea that higher-level biological processes often involve the coordinated actions and interactions of multiple cell types. Given single-cell expression data from multiple heterogeneous samples, scITD aims to detect these joint patterns of dysregulation affecting multiple cell types. The method has a wide range of potential applications, including the study of inter-individual variation at the population level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. The multicellular information it provides gives a deeper understanding of how cells might be interacting or responding to certain stimuli. To enable such insights, we also provide an integrated suite of downstream data processing tools that transform the scITD output into succinct yet informative summaries of the data.
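
As a rough illustration of the data structure involved, the following base R sketch builds a small donors x genes x cell types array from simulated counts by summing cells per donor within each cell type. It then unfolds the array along the donor dimension and applies a plain SVD. The simulated data, the variable names, and the use of SVD as a simple stand-in for the Tucker decomposition are all assumptions made for illustration; this is not the scITD implementation.

```r
# Toy illustration only: simulated counts, not the scITD implementation.
set.seed(1)

n_donors <- 6
n_genes <- 4
ctypes <- c("T", "B")
cells_per_group <- 5

# Simulated cell-level metadata: every donor contributes cells of every type.
meta <- expand.grid(cell = seq_len(cells_per_group),
                    donor = paste0("d", seq_len(n_donors)),
                    ctype = ctypes,
                    stringsAsFactors = FALSE)

# Simulated genes x cells count matrix.
counts <- matrix(rpois(n_genes * nrow(meta), lambda = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("g", seq_len(n_genes)), NULL))

# Pseudobulk: sum counts over the cells of each donor, separately per cell type,
# and stack the donors x genes matrices into a donors x genes x cell types array.
tensor <- array(0, dim = c(n_donors, n_genes, length(ctypes)),
                dimnames = list(unique(meta$donor), rownames(counts), ctypes))
for (ct in ctypes) {
  keep <- meta$ctype == ct
  # rowsum() sums rows by group, so transpose to cells x genes first.
  pb <- rowsum(t(counts[, keep, drop = FALSE]), group = meta$donor[keep])
  tensor[, , ct] <- pb[dimnames(tensor)[[1]], ]
}

# Unfold along the donor mode: one row per donor,
# one column per gene-cell type pair.
unfolded <- matrix(tensor, nrow = n_donors)

# SVD of the centered unfolding: donor scores plus loadings that span
# all gene-cell type pairs at once.
centered <- scale(unfolded, center = TRUE, scale = FALSE)
decomp <- svd(centered)
donor_scores <- decomp$u[, 1]
loadings <- matrix(decomp$v[, 1], nrow = n_genes, ncol = length(ctypes),
                   dimnames = list(rownames(counts), ctypes))
```

Each column of `loadings` holds the gene weights for one cell type within the same factor, which is what makes a single factor "multicellular" in this toy setting.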

Installation

The package has several dependencies from Bioconductor. To install these:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("ComplexHeatmap", "edgeR", "sva", "Biobase"))

Then, to install scITD from CRAN:

install.packages('scITD')

To install the latest version of scITD from GitHub (this requires the devtools package):

devtools::install_github("kharchenkolab/scITD")
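
After installation, it can be useful to confirm that everything is in place. The snippet below is a minimal sketch using only base R; `missing_packages()` is a hypothetical helper written for this example and is not part of scITD.

```r
# Hypothetical helper (not part of scITD): return the packages from `pkgs`
# that cannot be loaded in the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

bioc_deps <- c("ComplexHeatmap", "edgeR", "sva", "Biobase")
to_install <- missing_packages(c(bioc_deps, "scITD"))

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```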

Walkthrough

Follow the walkthrough to learn how to use scITD. The tutorial introduces the standard processing pipeline and applies it to a dataset of PBMCs from 45 healthy donors.

We also created a tutorial for running ligand-receptor analysis. This uses the same dataset as the main walkthrough.

Citation

If you find scITD useful for your publication, please cite:

Jonathan Mitchel, M. Grace Gordon, Richard K. Perez, Evan Biederstedt, Raymund Bueno, Chun Jimmie Ye, Peter V. Kharchenko (2022). Tensor decomposition reveals coordinated multicellular patterns of transcriptional variation that distinguish and stratify disease individuals. bioRxiv 2022.

Monthly Downloads: 197
Version: 1.0.4
License: GPL-3
Maintainer: Jonathan Mitchel
Last Published: September 8th, 2023

Functions in scITD (1.0.4)

determine_ranks_tucker

Run rank determination by SVD on the tensor unfolded along each mode

get_callouts_annot

Get gene callout annotations for a loadings heatmap

get_max_correlations

Computes the maximum correlation between each factor of the decomposition computed on the whole dataset and each factor computed on the subsampled/bootstrapped dataset

get_ctype_exp_var

Get explained variance of the reconstructed data using one cell type from one factor

get_meta_associations

Get metadata associations with factor donor scores

get_all_lds_factor_plots

Generate loadings heatmaps for all factors

get_factor_exp_var

Get the explained variance of the reconstructed data using one factor

get_fstats_pvals

Calculate adjusted p-values for gene_celltype fiber-donor score associations

Form the pseudobulk tensor as preparation for running the tensor decomposition.

Get the donor scores and loadings matrix for a single factor

get_gene_set_vectors

Get logical vectors indicating which genes are in which pathways

Compute gene-factor associations using univariate linear models

get_leading_edge_genes

Get the leading edge genes from GSEA results

Visualize the similarity matrix and the clustering. Adapted from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R

get_ctype_vargenes

Partition main gene by cell matrix into per cell type matrices with significantly variable genes only. Generally, this should be done through calling the form_tensor() wrapper function.

get_gene_modules

Compute WGCNA gene modules for each cell type

get_min_sig_genes

Evaluate the minimum number of significant genes in any factor for a given number of factors extracted by the decomposition

Identify gene sets that are enriched within specified gene co-regulatory modules. Uses a hypergeometric test for over-representation. Used in plot_multi_module_enr().

identify_sex_metadata

Extract metadata for sex information if not provided already

get_subclust_enr_fig

Get a figure showing cell subtype proportion associations with each factor. Combines this plot with subtype UMAPs and differential expression heatmaps. Note that this function runs better if n.cores in the conos object stored in container$embedding is set to a relatively small value (< 10).

get_subclust_enr_hmap

Get heatmap of subtype proportion associations for each celltype/subtype and each factor

normalize_pseudobulk

Normalize the pseudobulked counts matrices. Generally, this should be done through calling the form_tensor() wrapper function.

Get metadata matrix of dimensions donors by variables (not per cell)

get_indv_subtype_associations

Compute subtype proportion-factor association p-values for all subclusters of a given major cell type

Collapse data from cell-level to donor-level via summing counts. Generally, this should be done through calling the form_tensor() wrapper function.

get_normalized_variance

Get normalized variance for each gene, taking into account mean-variance trend

get_num_batch_ranks

Plot factor-batch associations for increasing number of donor factors

get_real_fstats

Get F-Statistics for the real (non-shuffled) gene_ctype fibers

get_subtype_prop_associations

Compute and plot associations between factor scores and cell subtype composition for various clustering resolution parameters

get_subclust_umap

Get a figure to display subclusterings at multiple resolutions

initialize_params

Initialize parameters to be used throughout scITD in various functions

instantiate_scMinimal

Create an scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.

get_subclusters

Perform Leiden subclustering to get cell subtypes

get_one_factor_gene_pvals

Get significant genes for a factor

plot_donor_matrix

Plot matrix of donor scores extracted from Tucker decomposition

plot_donor_props

Plot donor celltype/subtype proportions against each factor

Check if a character string is a GO ID

make_new_container

Create a container to store all data and results for the project. You must provide a params list as generated by initialize_params(). You also need to provide either a Seurat object or both a count_data matrix and a meta_data matrix.

get_significance_vectors

Get vectors indicating which genes are significant in which cell types for a factor of interest

get_reconstruct_errors_svd

Calculate reconstruction errors using the SVD approach

get_intersecting_pathways

Extract the intersection of gene sets which are enriched in two or more cell types for a factor

plot_donor_sig_genes

Generate a gene by donor heatmap showing scaled expression of top loading genes for a given factor

plot_dscore_enr

Compute enrichment of donor metadata categorical variables at high/low factor scores

Calculates factor-stratified sums for each column. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp

plot_scores_by_meta

Plot dotplots for each factor to compare donor scores between metadata groups

plot_subclust_associations

Plot association significances for varying clustering resolutions

get_subclust_de_hmaps

Get list of cell subtype differential expression heatmaps

Computes non-negative matrix factorization on the tensor unfolded along the donor dimension

merge_small_clusts

Merge small subclusters into larger ones

plot_stability_results

Generate a plot for either the donor scores or loadings stability test

parse_data_by_ctypes

Parse the main counts matrix into per-cell-type matrices. Generally, this should be done through calling the form_tensor() wrapper function.

plot_multi_module_enr

Generate gene set x ct_module heatmap showing co-expression module gene set enrichment results

reduce_dimensions

Gets a conos object of the data, aligning datasets across a specified variable such as batch or donors. This can be run independently or through get_subtype_prop_associations().

plot_mod_and_lig

Plot trio of associations between ligand expression, module eigengenes, and factor scores

reduce_to_vargenes

Reduce each cell type's expression matrix to just the significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.

Helper function from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/utils.R

Shuffle elements within the selected fibers

plot_gsea_hmap_w_similarity

Plot already computed enriched gene sets to show semantic similarity between sets

Plot enriched gene sets from all cell types in a heatmap

Look at enriched gene sets from a cluster of semantically similar gene sets. Uses the results from previous run of plot_gsea_hmap_w_similarity()

plot_select_sets

Plot enrichment results for hand picked gene sets

get_subclust_enr_dotplot

Get scatter plot for association of a cell subtype proportion with scores for a factor

Computes singular-value decomposition on the tensor unfolded along the donor dimension

normalize_counts

Helper function to normalize and log-transform count data

norm_var_helper

Calculates the normalized variance for each gene. This is adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/R/Pagoda2.R Generally, this should be done through calling the form_tensor() wrapper function.

Run fgsea for one cell type of one factor

plot_loadings_annot

Plot the gene by celltype loadings for a factor

run_hypergeometric_gsea

Compute enriched gene sets among significant genes in a cell type for a factor using hypergeometric test

project_new_data

Project multicellular patterns to get scores on new data

prep_LR_interact

Prepare data for LR analysis and get soft thresholds to use for gene modules

Run jackstraw to get genes that are significantly associated with donor scores for factors extracted by Tucker decomposition

plotDEheatmap_conos

Plot a heatmap of differential genes. Code is adapted from Conos package. https://github.com/kharchenkolab/conos/blob/master/R/plot.R

run_stability_analysis

Test stability of a decomposition by subsampling or bootstrapping donors. Note that running this function will replace the decomposition in the project container with one resulting from the tucker parameters entered here.

run_gsea_one_factor

Run GSEA separately for all cell types of one specified factor and plot results

subset_scMinimal

Subset an scMinimal object by specified genes, donors, cells, or cell types

Run the Tucker decomposition and rotate the factors

Update any of the experiment-wide parameters

Compute significantly variable genes via ANOVA. Generally, this should be done through calling the form_tensor() wrapper function.

Data container for testing tensor formation steps

tucker_ica_helper

Helper function for running the decomposition. Use the run_tucker_ica() wrapper function instead.

Scale font size. From simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R

Get a list of tensor fibers to shuffle

plot_rec_errors_line_svd

Plot reconstruction errors as line plot for svd method

reshape_loadings

Reshape loadings for a factor from linearized to matrix form

render_multi_plots

Create a single figure with all loadings plots arranged together

plot_rec_errors_bar_svd

Plot reconstruction errors as bar plot for svd method

seurat_to_scMinimal

Convert Seurat object to scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.

Scale variance across donors for each gene within each cell type. Generally, this should be done through calling the form_tensor() wrapper function.

Create the tensor object by stacking each pseudobulk cell type matrix. Generally, this should be done through calling the form_tensor() wrapper function.

compute_LR_interact

Compute and plot the LR interactions for one factor

calculate_fiber_fstats

Calculate F-statistics for the association between donor scores for each factor and donor values of shuffled gene_ctype fibers

compute_donor_props

Get donor proportions of each cell type or subtype

Helper function to check whether receptor is present in target cell type

Apply ComBat batch correction to pseudobulk matrices. Generally, this should be done through calling the form_tensor() wrapper function.

compute_associations

Compute associations between donor proportions and factor scores

compare_decompositions

Plot a pairwise comparison of factors from two separate decompositions

Calculates column mean and variance. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp

count_word. From older version of simplifyEnrichment package.

Clean data to remove genes only expressed in a few cells and donors with very few cells. Generally, this should be done through calling the form_tensor() wrapper function.

Convert gene identifiers to gene symbols

get_ctype_prop_associations

Compute and plot associations between donor factor scores and donor proportions of major cell types

get_ctype_subc_prop_associations

Compute and plot associations between donor factor scores and donor proportions of cell subtypes
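
Several entries above refer to a hypergeometric test for over-representation of gene sets. The following base R sketch shows the general form of such a test. `overrep_pval()` and the toy gene names are hypothetical and written for this example; the scITD functions may differ in their inputs, filtering, and multiple-testing correction.

```r
# Hypothetical helper (not the scITD implementation): one-sided hypergeometric
# test for over-representation of a gene set among a list of selected genes.
overrep_pval <- function(selected, gene_set, universe) {
  # Restrict both lists to the genes that were actually tested.
  selected <- intersect(selected, universe)
  gene_set <- intersect(gene_set, universe)
  overlap <- length(intersect(selected, gene_set))
  # P(X >= overlap) when drawing length(selected) genes without replacement
  # from a universe containing length(gene_set) "success" genes.
  phyper(overlap - 1,
         m = length(gene_set),
         n = length(universe) - length(gene_set),
         k = length(selected),
         lower.tail = FALSE)
}

# Toy example: 20 genes in total, a 5-gene set, and 5 selected genes,
# 4 of which fall in the set.
universe <- paste0("g", 1:20)
gene_set <- paste0("g", 1:5)
selected <- paste0("g", c(1, 2, 3, 4, 10))
p <- overrep_pval(selected, gene_set, universe)
```

When many gene sets are tested, the resulting p-values would normally be adjusted for multiple testing, for example with `p.adjust()`.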