What is SignacX?
SignacX is software developed by the Savova lab at Sanofi with a focus on single cell genomics for clinical applications. SignacX classifies the cellular phenotype for each individual cell in single cell RNA-sequencing data using neural networks trained with sorted bulk gene expression data from the Human Primary Cell Atlas. In this R implementation, we provide functions and vignettes that demonstrate how to: integrate single cell data (mapping cells from one data set to another), classify non-human data, identify novel cell types, and classify single cell data across many tissues, diseases and technologies. To learn more, check out the pre-print here.
Data portal
Here, we provide interactive access to data from the pre-print with SPRING Viewer. Just click the "Explore" links below, and search your favorite gene:
| Links | Tissue | Disease | Number of cells | Number of samples | Source | Signac version | 
|---|---|---|---|---|---|---|
| Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 | 
| Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 | 
| Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 | 
| Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 | 
| Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 | 
| Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 | 
| Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 | 
| Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 | 
| Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 | 
| Explore | PBMCs | Healthy | 7,902 | 1 | 10X Genomics | v2.0.7 | 
| Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 | 
| Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 | 
| Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 | 
Note:
- Cell type annotations are provided at four levels (immune, celltypes, cellstates and novel celltypes).
- When available, we also provide sample covariates (e.g., disease, age, gender, FACS).
- Cell type annotations for all 13 data sets were generated with the Signac function using its default settings.
Special thanks to Allon Klein's lab (particularly Caleb Weinreb and Sam Wolock) for hosting the data.
Getting Started
Installation
To install SignacX in R, run:
install.packages("SignacX")
Quick start
The main functions in Signac are:
# load the library
library(SignacX)
# Generate initial labels
labels = Signac(E = your_data_here)
# Get cell type labels
celltypes = GenerateLabels(labels, E = your_data_here)
Sometimes we don't have time to run Signac, and need a quick solution. Although Signac scales fine with large data sets (>300,000 cells), we developed SignacFast to quickly classify single cell data:
# load the library
library(SignacX)
# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = your_data_here)
Usage
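The quick-start snippets above use `your_data_here` as a placeholder for the expression data. As a minimal sketch, assuming the input is a sparse genes-by-cells count matrix with gene symbols as row names (check `?Signac` for the input types that are actually accepted), a toy input could be built like this. The gene symbols, cell names and counts are made up for illustration:

```r
# Toy genes-by-cells count matrix; every value here is illustrative.
library(Matrix)

genes <- c("CD3E", "CD19", "CD14", "NKG7")
cells <- c("cell_1", "cell_2", "cell_3")

counts <- matrix(
  c(5, 0, 0, 2,   # cell_1
    0, 7, 0, 0,   # cell_2
    0, 0, 9, 1),  # cell_3
  nrow = length(genes),
  dimnames = list(genes, cells)
)

# Store as a sparse matrix: most entries in single cell data are zero.
E_example <- Matrix(counts, sparse = TRUE)

print(dim(E_example))
```

An object shaped like `E_example` would then take the place of `your_data_here`, assuming the input format described above holds.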
To make life easier, SignacX was integrated with Seurat (versions 3 and 4), and with SPRING. We provide a few vignettes:
SPRING
In the pre-print, we often used Signac integrated with SPRING. To reproduce our findings and to generate new results with SPRING, please visit the SPRING repository which has example notebooks and installation instructions, particularly for processing CITE-seq and scRNA-seq data from 10X Genomics. Briefly, Signac is integrated seamlessly with the output files of SPRING in R, requiring only a few functions:
# load the Signac library
library(SignacX)
# dir points to the "FullDataset_v1" directory generated by the SPRING Jupyter notebook
dir = "./FullDataset_v1" 
# load the expression data
E = CID.LoadData(dir)
# generate cellular phenotype labels
labels = Signac(E, spring.dir = dir)
celltypes = GenerateLabels(labels, E = E, spring.dir = dir)
# write cell types and Louvain clusters to SPRING
dat <- CID.writeJSON(celltypes, spring.dir = dir)
After running the above functions, cellular phenotypes and Louvain clusters are ready to be visualized with SPRING Viewer, which can be set up locally as described here.
Seurat
Another way to use Signac is with Seurat. In this vignette, we performed multi-modal analysis of CITE-seq PBMCs from 10X Genomics using Signac integrated with Seurat.
Note:
- This same data set was also processed with SPRING in this notebook and subsequently classified with Signac. The resulting SPRING layouts appear in the pre-print (Figures 2-4) and are available for interactive exploration here.
MASC
Sometimes, we have single cell genomics data with disease information, and we want to know which cellular phenotypes are enriched for disease. In this vignette, we applied Signac to classify cellular phenotypes in healthy and lupus nephritis kidney cells, and then we used MASC to identify which cellular phenotypes were disease-enriched.
Note:
- MASC typically requires equal numbers of cells and samples between case and control: an unequal number might skew the clustering of cells towards one sample (i.e., a "batch effect"), which could cause spurious disease enrichment in the mixed effect model. Since Signac classifies each cell independently (without using clusters), Signac annotations can be used with MASC without a priori balancing samples or cells, unlike cluster-based annotation methods.
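Because each cell carries its own label, it is straightforward to check how cell types are distributed across samples before testing for enrichment. The sketch below is not MASC itself; it only tabulates per-sample cell type proportions from a hypothetical per-cell annotation table, and every sample name, disease status and label in it is made up for illustration:

```r
# Toy per-cell annotations; all values are made up for illustration.
cells <- data.frame(
  sample   = c("S1", "S1", "S1", "S2", "S2", "S3", "S3", "S3", "S3"),
  disease  = c("lupus", "lupus", "lupus", "lupus", "lupus",
               "healthy", "healthy", "healthy", "healthy"),
  celltype = c("B", "TNK", "TNK", "MPh", "TNK", "B", "B", "MPh", "TNK"),
  stringsAsFactors = FALSE
)

# Count the cells of each type within each sample.
counts <- table(cells$sample, cells$celltype)

# Convert counts to within-sample proportions (each row sums to 1),
# so samples with different numbers of cells can be compared directly.
props <- prop.table(counts, margin = 1)

print(round(props, 2))
```

Working with within-sample proportions makes it easy to see unequal cell numbers between samples before fitting any model.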
Non-human data
In Supplemental Figure 8 of the pre-print, we classified single cell data from a model organism (cynomolgus monkey), for which flow-sorted data sets are generally lacking, without any additional species-specific training. Instead, we mapped homologous genes from the Macaca fascicularis genome to the human genome in the single cell data, and then performed cell type classification with Signac. We demonstrate how we mapped the gene symbols here.
Note:
- This code can be used to identify homologous genes between any two species.
- Monkey data used in Supplemental Figure 8 are available for interactive exploration in the table listed above.
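The general idea of relabeling genes by homology can be sketched in base R. This is not the code from the vignette: the homolog table, the `GENE_*` identifiers and the `map_to_human` helper are all hypothetical, and the sketch assumes a simple policy of keeping only one-to-one homolog pairs:

```r
# Hypothetical homolog table (source species symbol -> human symbol).
homologs <- data.frame(
  monkey = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  human  = c("CD3E", "CD19", "CD19", "CD14"),
  stringsAsFactors = FALSE
)

# Toy genes-by-cells matrix with source-species gene names.
E_monkey <- matrix(
  1:10, nrow = 5,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
                  c("cell_1", "cell_2"))
)

# Hypothetical helper: relabel rows with human symbols, keeping only
# genes that have an unambiguous one-to-one homolog.
map_to_human <- function(E, homologs) {
  dup_human  <- homologs$human[duplicated(homologs$human)]
  dup_monkey <- homologs$monkey[duplicated(homologs$monkey)]
  one_to_one <- homologs[!(homologs$human %in% dup_human) &
                         !(homologs$monkey %in% dup_monkey), ]
  idx  <- match(rownames(E), one_to_one$monkey)
  keep <- !is.na(idx)
  E_out <- E[keep, , drop = FALSE]
  rownames(E_out) <- one_to_one$human[idx[keep]]
  E_out
}

E_human <- map_to_human(E_monkey, homologs)
print(E_human)
```

In this toy example `GENE_B` and `GENE_C` are dropped because both map to the same human symbol, and `GENE_E` is dropped because it has no entry in the table. Other policies (for example, summing counts over many-to-one homologs) are possible.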
Genes of interest
In Figure 6 of the pre-print, we compiled data from three sources (CellPhoneDB, GWAS catalog and Fang et al. 2020) to find genes of immunological / pharmacological interest. These genes and their annotations can be accessed from within Signac:
# load the library
library(SignacX)
# See ?Genes_Of_Interest
data("Genes_Of_Interest")
Learning from single cell data
In Figure 4 of the pre-print, we demonstrated that Signac can map cell type labels from one single cell data set to another by learning CD56bright NK cells from CITE-seq data. Here, we provide a vignette for reproducing this analysis, which can be used to map cell populations (or clusters of cells) from one data set to another. We also provide interactive access to the single cell data that were annotated with the CD56bright NK cell model (note: the CD56bright NK cells appear in the "CellStates" annotation layer as red cells).
| Links | Tissue | Disease | Number of cells | Number of samples | Source | Signac version | 
|---|---|---|---|---|---|---|
| Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 + CD56bright NK | 
| Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 + CD56bright NK | 
| Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK | 
| Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 + CD56bright NK | 
| Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 + CD56bright NK | 
| Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK | 
| Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK | 
| Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK | 
| Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK | 
| Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 + CD56bright NK | 
| Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 + CD56bright NK | 
| Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 + CD56bright NK | 
Fast Signac
Sometimes we don't have time to run Signac and need a faster solution. Although Signac scales well to large data sets (>300,000 cells) and typically takes less than an hour even then, we developed SignacFast to classify single cell data more quickly:
# load the library
library(SignacX)
# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = your_data_here)
Unlike Signac, SignacFast uses a pre-trained ensemble of neural network models generated from the HPCA reference data, which speeds up classification roughly 5-10 fold. These models were generated from the HPCA training data like so:
# load the library
library(SignacX)
# load the HPCA training data
training_HPCA = GetTrainingData_HPCA()
# generate an ensemble of N = 100 models from the training data
Models_HPCA = ModelGenerator(R = training_HPCA, N = 100, num.cores = 4)
The "Models_HPCA" object can be accessed from within the R package:
# load the library
library(SignacX)
# load pre-trained neural network ensemble model
Models = GetModels_HPCA()
We demonstrate how to use SignacFast in this vignette, which shows that the results are broadly consistent with running Signac.
Note:
- SignacFast is a fine alternative to Signac when only the major cell types (such as TNK and MPh cells) are of concern.
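One simple way to check how consistent two sets of labels are on your own data is to cross-tabulate them. The sketch below uses two made-up label vectors standing in for the per-cell outputs of the two methods; the vectors and their values are illustrative only and are not results from the package:

```r
# Hypothetical per-cell labels from the two methods (same cell order).
labels_signac <- c("TNK", "TNK", "MPh", "B", "MPh")
labels_fast   <- c("TNK", "TNK", "MPh", "B", "TNK")

# Cross-tabulate the labels: off-diagonal entries are disagreements.
confusion <- table(Signac = labels_signac, SignacFast = labels_fast)

# Fraction of cells given the same label by both methods.
agreement <- mean(labels_signac == labels_fast)

print(confusion)
print(agreement)
```

The cross-tabulation shows which specific cell types account for any disagreement, which is more informative than the overall agreement fraction alone.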
Benchmarking
CITE-seq
In Figures 2-3 of the pre-print, we validated Signac with CITE-seq PBMCs. Here, we reproduced that analysis with SPRING (in this vignette, as was performed in the pre-print) and additionally with Seurat (in this vignette), and provide interactive access to the data here.
Flow-sorted synovial cells
In Figure 3 of the pre-print, we validated Signac with flow cytometry and compared Signac to SingleR. We reproduced that analysis using Seurat in this vignette, and provide interactive access to the data here.
PBMCs
In Table 1 of the pre-print, we benchmarked Signac across seven different technologies: CEL-seq, Drop-Seq, inDrop, 10X (v2), 10X (v3), Seq-Well and Smart-Seq2; this analysis was reproduced here.
Roadmap
See the open issues for a list of proposed features (and known issues).
Contributing
Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
You can also open a pull request to commit to the master branch.
License
Distributed under the GPL v3.0 License. See LICENSE for more information.
Contact
Mathew Chamberlain - chamberlainphd@gmail.com
Project Link: https://github.com/mathewchamberlain/SignacX