scITD

Introduction

Single-Cell Interpretable Tensor Decomposition (scITD) is a computational method capable of extracting multicellular gene expression programs that vary across donors or samples. The approach is premised on the idea that higher-level biological processes often involve the coordinated actions and interactions of multiple cell types. Given single-cell expression data from multiple heterogeneous samples, scITD aims to detect these joint patterns of dysregulation impacting multiple cell types. This method has a wide range of potential applications, including the study of inter-individual variation at the population level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. The multicellular information provided by our method allows one to gain a deeper understanding of the ways that cells might be interacting or responding to certain stimuli. To enable such insights, we also provide an integrated suite of downstream data processing tools to transform the scITD output into succinct, yet informative summaries of the data.

Installation

The package has several dependencies from Bioconductor. To install these:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("ComplexHeatmap", "edgeR", "sva", "Biobase"))

Then, to install scITD from CRAN:

install.packages('scITD')

To install the latest version of scITD from GitHub (this requires the devtools package):

devtools::install_github("kharchenkolab/scITD")

Walkthrough

Follow the walkthrough to learn how to use scITD. The tutorial introduces the standard processing pipeline and applies it to a dataset of PBMCs from 45 healthy donors.

We also created a tutorial for running ligand-receptor analysis. This uses the same dataset as the main walkthrough.
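
To make the data structure behind the pipeline more concrete, the following is a minimal base-R sketch of the general idea: collapse cell-level counts to donor-level pseudobulk sums per cell type, stack them into a donors x genes x cell types array, and unfold that array along the donor mode for an SVD. It does not call scITD, it skips normalization, variable gene selection, and scaling, and every name and toy value in it is an assumption made for illustration. In the package itself these steps are handled by form_tensor() and the decomposition functions listed below.

```r
# Toy illustration in base R only; this is not scITD's implementation.
set.seed(1)

n_genes <- 6
n_cells <- 120
donors  <- paste0("donor", 1:4)
ctypes  <- c("T", "B")

# Simulated cell-level counts (genes x cells) with per-cell donor and cell type labels
counts <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", 1:n_genes),
                                 paste0("cell", 1:n_cells)))
meta <- data.frame(donor = rep(donors, length.out = n_cells),
                   ctype = rep(ctypes, each = n_cells / 2))

# Pseudobulk: sum counts over the cells of each donor within each cell type,
# storing the result in a donors x genes x cell types array
tensor <- array(0, dim = c(length(donors), n_genes, length(ctypes)),
                dimnames = list(donors, rownames(counts), ctypes))
for (ct in ctypes) {
  for (d in donors) {
    keep <- meta$donor == d & meta$ctype == ct
    tensor[d, , ct] <- rowSums(counts[, keep, drop = FALSE])
  }
}

# Unfold along the donor mode: one row per donor,
# one column per (gene, cell type) pair
unfolded <- matrix(tensor, nrow = length(donors))

# Center each column across donors and look at the singular values,
# which give a rough sense of how many donor-level patterns are present
centered <- scale(unfolded, center = TRUE, scale = FALSE)
sv <- svd(centered)$d
print(round(sv, 2))
```

With random toy counts like these, the singular values carry no biological meaning; the point is only the shape of the objects at each step.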

Citation

If you find scITD useful for your publication, please cite:

Jonathan Mitchel, M. Grace Gordon, Richard K. Perez, Evan Biederstedt, Raymund Bueno, Chun Jimmie Ye, Peter V. Kharchenko (2022). Tensor decomposition reveals coordinated multicellular patterns of transcriptional variation that distinguish and stratify disease individuals. bioRxiv 2022.

Version: 1.0.4

License: GPL-3

Maintainer: Jonathan Mitchel

Last Published: September 8th, 2023

Functions in scITD (1.0.4)

determine_ranks_tucker

Run rank determination by SVD on the tensor unfolded along each mode
get_callouts_annot

Get gene callout annotations for a loadings heatmap
get_max_correlations

Computes the max correlation between each factor of the decomposition done using the whole dataset and each factor computed using the subsampled/bootstrapped dataset
get_ctype_exp_var

Get explained variance of the reconstructed data using one cell type from one factor
get_meta_associations

Get metadata associations with factor donor scores
get_all_lds_factor_plots

Generate loadings heatmaps for all factors
get_factor_exp_var

Get the explained variance of the reconstructed data using one factor
get_fstats_pvals

Calculate adjusted p-values for gene_celltype fiber-donor score associations
form_tensor

Form the pseudobulk tensor as preparation for running the tensor decomposition.
get_one_factor

Get the donor scores and loadings matrix for a single factor
get_gene_set_vectors

Get logical vectors indicating which genes are in which pathways
get_lm_pvals

Compute gene-factor associations using univariate linear models
get_leading_edge_genes

Get the leading edge genes from GSEA results
ht_clusters

Visualize the similarity matrix and the clustering. Adapted from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R
get_ctype_vargenes

Partition main gene by cell matrix into per cell type matrices with significantly variable genes only. Generally, this should be done through calling the form_tensor() wrapper function.
get_gene_modules

Compute WGCNA gene modules for each cell type
get_min_sig_genes

Evaluate the minimum number of significant genes in any factor for a given number of factors extracted by the decomposition
get_module_enr

Identify gene sets that are enriched within specified gene co-regulatory modules. Uses a hypergeometric test for over-representation. Used in plot_multi_module_enr().
identify_sex_metadata

Extract metadata for sex information if not provided already
get_subclust_enr_fig

Get a figure showing cell subtype proportion associations with each factor. Combines this plot with subtype UMAPs and differential expression heatmaps. Note that this function runs better if n.cores in the conos object stored in container$embedding is set to a relatively small value (< 10).
get_subclust_enr_hmap

Get heatmap of subtype proportion associations for each celltype/subtype and each factor
normalize_pseudobulk

Normalize the pseudobulked counts matrices. Generally, this should be done through calling the form_tensor() wrapper function.
get_donor_meta

Get metadata matrix of dimensions donors by variables (not per cell)
get_indv_subtype_associations

Compute subtype proportion-factor association p-values for all subclusters of a given major cell type
get_pseudobulk

Collapse data from cell-level to donor-level via summing counts. Generally, this should be done through calling the form_tensor() wrapper function.
get_normalized_variance

Get normalized variance for each gene, taking into account mean-variance trend
get_num_batch_ranks

Plot factor-batch associations for increasing number of donor factors
get_real_fstats

Get F-Statistics for the real (non-shuffled) gene_ctype fibers
get_subtype_prop_associations

Compute and plot associations between factor scores and cell subtype composition for various clustering resolution parameters
get_subclust_umap

Get a figure to display subclusterings at multiple resolutions
initialize_params

Initialize parameters to be used throughout scITD in various functions
instantiate_scMinimal

Create an scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.
get_subclusters

Perform Leiden subclustering to get cell subtypes
get_one_factor_gene_pvals

Get significant genes for a factor
plot_donor_matrix

Plot matrix of donor scores extracted from Tucker decomposition
plot_donor_props

Plot donor celltype/subtype proportions against each factor
is_GO_id

Check if a character string is a GO ID
make_new_container

Create a container to store all data and results for the project. You must provide a params list as generated by initialize_params(). You also need to provide either a Seurat object or both a count_data matrix and a meta_data matrix.
get_significance_vectors

Get vectors indicating which genes are significant in which cell types for a factor of interest
get_reconstruct_errors_svd

Calculate reconstruction errors using the SVD approach
get_intersecting_pathways

Extract the intersection of gene sets which are enriched in two or more cell types for a factor
plot_donor_sig_genes

Generate a gene by donor heatmap showing scaled expression of top loading genes for a given factor
plot_dscore_enr

Compute enrichment of donor metadata categorical variables at high/low factor scores
get_sums

Calculates factor-stratified sums for each column. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp
plot_scores_by_meta

Plot dotplots for each factor to compare donor scores between metadata groups
plot_subclust_associations

Plot association significances for varying clustering resolutions
get_subclust_de_hmaps

Get list of cell subtype differential expression heatmaps
nmf_unfolded

Computes non-negative matrix factorization on the tensor unfolded along the donor dimension
merge_small_clusts

Merge small subclusters into larger ones
plot_stability_results

Generate a plot for either the donor scores or loadings stability test
parse_data_by_ctypes

Parse main counts matrix into per-celltype-matrices. Generally, this should be done through calling the form_tensor() wrapper function.
plot_multi_module_enr

Generate gene set x ct_module heatmap showing co-expression module gene set enrichment results
reduce_dimensions

Gets a conos object of the data, aligning datasets across a specified variable such as batch or donors. This can be run independently or through get_subtype_prop_associations().
plot_mod_and_lig

Plot trio of associations between ligand expression, module eigengenes, and factor scores
reduce_to_vargenes

Reduce each cell type's expression matrix to just the significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.
stop_wrap

Helper function from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/utils.R
shuffle_fibers

Shuffle elements within the selected fibers
plot_gsea_hmap_w_similarity

Plot already computed enriched gene sets to show semantic similarity between sets
plot_gsea_hmap

Plot enriched gene sets from all cell types in a heatmap
plot_gsea_sub

Look at enriched gene sets from a cluster of semantically similar gene sets. Uses the results from a previous run of plot_gsea_hmap_w_similarity()
plot_select_sets

Plot enrichment results for hand picked gene sets
get_subclust_enr_dotplot

Get scatter plot for association of a cell subtype proportion with scores for a factor
pca_unfolded

Computes singular-value decomposition on the tensor unfolded along the donor dimension
normalize_counts

Helper function to normalize and log-transform count data
norm_var_helper

Calculates the normalized variance for each gene. This is adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/R/Pagoda2.R Generally, this should be done through calling the form_tensor() wrapper function.
run_fgsea

Run fgsea for one cell type of one factor
plot_loadings_annot

Plot the gene by celltype loadings for a factor
run_hypergeometric_gsea

Compute enriched gene sets among significant genes in a cell type for a factor using hypergeometric test
project_new_data

Project multicellular patterns to get scores on new data
prep_LR_interact

Prepare data for LR analysis and get soft thresholds to use for gene modules
run_jackstraw

Run jackstraw to get genes that are significantly associated with donor scores for factors extracted by Tucker decomposition
plotDEheatmap_conos

Plot a heatmap of differential genes. Code is adapted from Conos package. https://github.com/kharchenkolab/conos/blob/master/R/plot.R
run_stability_analysis

Test stability of a decomposition by subsampling or bootstrapping donors. Note that running this function will replace the decomposition in the project container with one resulting from the tucker parameters entered here.
run_gsea_one_factor

Run GSEA separately for all cell types of one specified factor and plot results
subset_scMinimal

Subset an scMinimal object by specified genes, donors, cells, or cell types
run_tucker_ica

Run the Tucker decomposition and rotate the factors
update_params

Update any of the experiment-wide parameters
vargenes_anova

Compute significantly variable genes via ANOVA. Generally, this should be done through calling the form_tensor() wrapper function.
test_container

Data container for testing tensor formation steps
tucker_ica_helper

Helper function for running the decomposition. Use the run_tucker_ica() wrapper function instead.
scale_fontsize

Scale font size. From simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R
sample_fibers

Get a list of tensor fibers to shuffle
plot_rec_errors_line_svd

Plot reconstruction errors as a line plot for the SVD method
reshape_loadings

Reshape loadings for a factor from linearized to matrix form
render_multi_plots

Create a figure of all loadings plots arranged
plot_rec_errors_bar_svd

Plot reconstruction errors as a bar plot for the SVD method
seurat_to_scMinimal

Convert Seurat object to scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.
scale_variance

Scale variance across donors for each gene within each cell type. Generally, this should be done through calling the form_tensor() wrapper function.
stack_tensor

Create the tensor object by stacking each pseudobulk cell type matrix. Generally, this should be done through calling the form_tensor() wrapper function.
compute_LR_interact

Compute and plot the LR interactions for one factor
calculate_fiber_fstats

Calculate F-Statistics for the association between donor scores for each factor and donor values of shuffled gene_ctype fibers
compute_donor_props

Get donor proportions of each cell type or subtype
check_rec_pres

Helper function to check whether receptor is present in target cell type
apply_combat

Apply ComBat batch correction to pseudobulk matrices. Generally, this should be done through calling the form_tensor() wrapper function.
compute_associations

Compute associations between donor proportions and factor scores
compare_decompositions

Plot a pairwise comparison of factors from two separate decompositions
colMeanVars

Calculates column mean and variance. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp
count_word

count_word helper. From an older version of the simplifyEnrichment package.
clean_data

Clean data to remove genes only expressed in a few cells and donors with very few cells. Generally, this should be done through calling the form_tensor() wrapper function.
convert_gn

Convert gene identifiers to gene symbols
get_ctype_prop_associations

Compute and plot associations between donor factor scores and donor proportions of major cell types
get_ctype_subc_prop_associations

Compute and plot associations between donor factor scores and donor proportions of cell subtypes
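
Several entries above (get_module_enr, run_hypergeometric_gsea) mention a hypergeometric test for over-representation. As a rough illustration of that kind of test, here is a small base-R sketch using phyper(). It is not taken from the scITD source, and the gene universe, gene set, and module are all hypothetical toy inputs.

```r
# Hypothetical toy inputs; none of these objects come from scITD.
universe <- paste0("gene", 1:200)                 # all genes considered
gene_set <- universe[1:20]                        # pathway of interest (20 genes)
module   <- c(universe[1:8], universe[101:122])   # query genes (30 genes, 8 in the pathway)

overlap <- length(intersect(module, gene_set))

# Over-representation p-value: probability of seeing at least `overlap`
# pathway genes when drawing length(module) genes without replacement.
# phyper() with lower.tail = FALSE returns P(X > q), hence overlap - 1.
p_val <- phyper(overlap - 1,
                m = length(gene_set),                     # pathway genes in the universe
                n = length(universe) - length(gene_set),  # non-pathway genes
                k = length(module),                       # number of genes drawn
                lower.tail = FALSE)
print(p_val)
```

When many gene sets are tested at once, the resulting p-values would normally be corrected for multiple testing, for example with p.adjust(p, method = "BH").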