digitalDLSorteR

The digitalDLSorteR R package provides a set of tools to deconvolute cell type proportions of bulk RNA-seq data through the development of context-specific deconvolution models based on single-cell RNA-seq (scRNA-seq) data. These models are able to accurately estimate cell type proportions of bulk RNA-seq samples from specific biological environments. For more details about the algorithm and the functionalities implemented in this package, see Torroja and Sanchez-Cabo, 2019, Mañanes et al., 2024, and https://diegommcc.github.io/digitalDLSorteR/.

Installation

digitalDLSorteR is available on CRAN and can be installed as follows:

install.packages("digitalDLSorteR")

The version under development is available on GitHub:

if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_github("diegommcc/digitalDLSorteR")

The package depends on the tensorflow R package, so a working Python interpreter with the TensorFlow Python library installed is required. The installTFpython function provides an easy way to create a conda environment called digitaldlsorter-env with all the necessary dependencies. We recommend installing the TensorFlow Python library this way, although a custom installation is also possible. See the Keras/TensorFlow installation and configuration article on the package website for more details.

library("digitalDLSorteR")
installTFpython(install.conda = TRUE)
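
After the installation, it can be useful to confirm that the conda environment is visible from R. The following is a minimal sketch, not part of digitalDLSorteR: the helper has_conda_env is a hypothetical name introduced here, and it assumes that the reticulate R package is installed and can locate a conda binary. It returns FALSE when either assumption does not hold.

```r
# Hypothetical helper (not provided by digitalDLSorteR): returns TRUE if a
# conda environment with the given name is visible to reticulate.
has_conda_env <- function(env_name = "digitaldlsorter-env") {
  # Without reticulate there is nothing to query
  if (!requireNamespace("reticulate", quietly = TRUE)) return(FALSE)
  # conda_list() raises an error when no conda binary can be found
  envs <- tryCatch(reticulate::conda_list(), error = function(e) NULL)
  !is.null(envs) && env_name %in% envs$name
}

if (has_conda_env()) {
  message("Found conda environment 'digitaldlsorter-env'.")
} else {
  message("Environment not found: run installTFpython() or configure TensorFlow manually.")
}
```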

Rationale of digitalDLSorteR

The algorithm trains Deep Neural Network (DNN) models on simulated bulk RNA-seq samples whose cell composition is known. These pseudo-bulk RNA-seq samples are generated by aggregating pre-characterized scRNA-seq data from specific biological environments. The resulting models can accurately deconvolute new bulk RNA-seq samples from the same environment because they account for environment-dependent transcriptional changes in specific cells, such as immune cells in complex diseases (e.g., specific subtypes of cancer or atherosclerosis). This overcomes a limitation of other methods. For instance, in the case of immune cells, published methods often rely on purified transcriptional profiles from peripheral blood mononuclear cells, even though these cells vary greatly depending on environmental conditions. This feature, together with the fact that scRNA-seq datasets improve over time (the more cells, the more variability learnt by the models), should lead to more accurate and comprehensive models.
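
The idea of building pseudo-bulk samples with known composition can be illustrated with a toy example in base R. This is only a sketch under simplifying assumptions and is not the package's implementation: the cell type labels, the Poisson toy data, and the helper make_pseudo_bulk are all hypothetical, and no digitalDLSorteR function is used.

```r
# Toy illustration only: base R, hypothetical names, not the package's code.
set.seed(42)

n_genes <- 50
cell_types <- c("Tcell", "Bcell", "Tumor")  # hypothetical labels
cells_per_type <- 100

# Toy scRNA-seq count matrix (genes x cells): each cell type has its own
# mean expression per gene, and cells are Poisson draws around those means.
type_means <- matrix(
  rgamma(n_genes * length(cell_types), shape = 2, rate = 0.5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), cell_types)
)
labels <- rep(cell_types, each = cells_per_type)
sc_counts <- sapply(labels, function(ct) rpois(n_genes, type_means[, ct]))
rownames(sc_counts) <- rownames(type_means)
colnames(sc_counts) <- paste0("cell", seq_along(labels))

# One pseudo-bulk sample: draw random proportions, pick cells accordingly,
# and sum their counts. The realized proportions are the known target.
make_pseudo_bulk <- function(counts, labels, n_cells = 100) {
  types <- sort(unique(labels))
  props <- rgamma(length(types), shape = 1)
  props <- props / sum(props)
  n_per_type <- as.vector(rmultinom(1, size = n_cells, prob = props))
  picked <- unlist(lapply(seq_along(types), function(i) {
    idx <- which(labels == types[i])
    idx[sample.int(length(idx), n_per_type[i], replace = TRUE)]
  }))
  list(
    expression = rowSums(counts[, picked, drop = FALSE]),
    proportions = setNames(n_per_type / n_cells, types)
  )
}

samples <- replicate(200, make_pseudo_bulk(sc_counts, labels), simplify = FALSE)
X <- t(sapply(samples, `[[`, "expression"))   # samples x genes (model input)
Y <- t(sapply(samples, `[[`, "proportions"))  # samples x cell types (target)
```

Here X and Y play the role of training inputs and targets for a regression model: each row of Y sums to 1 and records the composition of the corresponding row of X.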

Usage

The package can be used in two main ways:

  1. Using pre-trained models included in the digitalDLSorteRmodels (https://github.com/diegommcc/digitalDLSorteRmodels) R package to deconvolute new bulk RNA-seq samples from the same environment. So far, the available models allow the deconvolution of samples from human breast cancer (GSE75688 data from Chung et al., 2017 used as reference) and colorectal cancer (GSE132465, GSE132257 and GSE144735 data from Lee, Hong, Etlioglu, Cho et al., 2020 used as reference). For more details about this workflow, please see the Using pre-trained context-specific deconvolution models article. Disclaimer: these models are intended as a quick option to deconvolute samples from the same biological environment, but we strongly recommend generating new models with data manually curated by the users.
  2. Building new deconvolution models from pre-characterized scRNA-seq datasets. This is the main way to use digitalDLSorteR. For more information on the workflow, see the article Building new deconvolution models.

To use pre-trained context-specific deconvolution models, digitalDLSorteR relies on the digitalDLSorteRmodels data R package. Therefore, it should be installed along with digitalDLSorteR from GitHub as follows:

remotes::install_github("diegommcc/digitalDLSorteRmodels")

Once digitalDLSorteRmodels is loaded, the pre-trained models are available. See the article Using pre-trained context-specific deconvolution models for an example.
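
A quick way to see what the data package provides is to list its data objects with base R. This is a minimal sketch that assumes the pre-trained models are shipped as data objects discoverable through utils::data(); the helper list_pretrained_objects is a hypothetical name introduced here, and it returns an empty vector when the package is not installed.

```r
# Hypothetical helper (not provided by digitalDLSorteR): lists the data
# objects registered by an installed package, using base R only.
list_pretrained_objects <- function(pkg = "digitalDLSorteRmodels") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed.")
    return(character(0))
  }
  # The "Item" column of the results matrix holds the data object names
  unname(utils::data(package = pkg)$results[, "Item"])
}

available <- list_pretrained_objects()
print(available)
```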

Final remarks

  • Regarding pre-trained models, if you generate new models and want to make them available to other users through the digitalDLSorteRmodels R package, contact us!
  • We provide some pre-trained models that take into account genes that seem to be relevant for these environmental conditions. However, as these genes might differ depending on the bulk RNA-seq data to be deconvoluted, we strongly recommend creating new models through the workflow explained in the Building new deconvolution models article.
  • Contributions and suggestions are welcome!

Citation

If you use digitalDLSorteR in your research, please cite Torroja and Sanchez-Cabo, 2019 (first description of the algorithm) and Mañanes et al., 2024 (version for spatial transcriptomics data, whose development has also served to improve digitalDLSorteR).


Version: 1.1.0
License: GPL-3
Maintainer: Diego Mañanes
Last Published: September 13th, 2024
Monthly Downloads: 55
Functions in digitalDLSorteR (1.1.0)

  • corrExpPredPlot: Generate correlation plots between predicted and expected cell type proportions from test data
  • cell.types: Get and set cell.types slot in a DigitalDLSorterDNN object
  • deconv.data: Get and set deconv.data slot in a DigitalDLSorter object
  • deconvDDLSPretrained: Deconvolute bulk RNA-Seq samples using a pre-trained DigitalDLSorter model
  • createDDLSobject: Create a DigitalDLSorter object from single-cell RNA-seq and bulk RNA-seq data
  • estimateZinbwaveParams: Estimate the parameters of the ZINB-WaVE model to simulate new single-cell RNA-Seq expression profiles
  • distErrorPlot: Generate box plots or violin plots to show how the errors are distributed
  • deconv.results: Get and set deconv.results slot in a DigitalDLSorter object
  • digitalDLSorteR: digitalDLSorteR: an R package to deconvolute bulk RNA-Seq samples using single-cell RNA-seq data and neural networks
  • deconvDDLSObj: Deconvolute bulk gene expression samples (bulk RNA-Seq)
  • getProbMatrix: Getter function for the cell composition matrix
  • features: Get and set features slot in a DigitalDLSorterDNN object
  • loadTrainedModelFromH5: Load from an HDF5 file a trained Deep Neural Network model into a DigitalDLSorter object
  • installTFpython: Install Python dependencies for digitalDLSorteR
  • listToDDLSDNN: Transform DigitalDLSorterDNN-like list into an actual DigitalDLSorterDNN object
  • method: Get and set method slot in a ProbMatrixCellTypes object
  • loadDeconvData: Load data to be deconvoluted into a DigitalDLSorter object
  • generateBulkCellMatrix: Generate training and test cell composition matrices
  • interGradientsDL: Calculate gradients of predicted cell types/loss function with respect to input features for interpreting trained deconvolution models
  • listToDDLS: Transform DigitalDLSorter-like list into an actual DigitalDLSorterDNN object
  • model: Get and set model slot in a DigitalDLSorterDNN object
  • plotHeatmapGradsAgg: Plot a heatmap of gradients of classes / loss function with respect to the input
  • saveTrainedModelAsH5: Save a trained DigitalDLSorter Deep Neural Network model to disk as an HDF5 file
  • saveRDS: Save DigitalDLSorter objects as RDS files
  • prob.matrix: Get and set prob.matrix slot in a ProbMatrixCellTypes object
  • project: Get and set project slot in a DigitalDLSorter object
  • prob.cell.types: Get and set prob.cell.types slot in a DigitalDLSorter object
  • preparingToSave: Prepare DigitalDLSorter object to be saved as an RDA file
  • simSCProfiles: Simulate new single-cell RNA-Seq expression profiles using the ZINB-WaVE model parameters
  • single.cell.real: Get and set single.cell.real slot in a DigitalDLSorter object
  • test.pred: Get and set test.pred slot in a DigitalDLSorterDNN object
  • test.metrics: Get and set test.metrics slot in a DigitalDLSorterDNN object
  • zinb.params: Get and set zinb.params slot in a DigitalDLSorter object
  • showProbPlot: Show distribution plots of the cell proportions generated by generateBulkCellMatrix
  • plots: Get and set plots slot in a ProbMatrixCellTypes object
  • plotTrainingHistory: Plot training history of a trained DigitalDLSorter Deep Neural Network model
  • simBulkProfiles: Simulate training and test pseudo-bulk RNA-Seq profiles
  • trainDDLSModel: Train Deep Neural Network model
  • training.history: Get and set training.history slot in a DigitalDLSorterDNN object
  • trained.model: Get and set trained.model slot in a DigitalDLSorter object
  • set.list: Get and set set.list slot in a ProbMatrixCellTypes object
  • set: Get and set set slot in a ProbMatrixCellTypes object
  • zinbwave.model: Get and set zinbwave.model slot in a ZinbParametersModel object
  • topGradientsCellType: Get top genes with largest/smallest gradients per cell type
  • test.deconv.metrics: Get and set test.deconv.metrics slot in a DigitalDLSorterDNN object
  • single.cell.simul: Get and set single.cell.simul slot in a DigitalDLSorter object
  • DigitalDLSorterDNN-class: The DigitalDLSorterDNN Class
  • calculateEvalMetrics: Calculate evaluation metrics for bulk RNA-Seq samples from test data
  • cell.names: Get and set cell.names slot in a ProbMatrixCellTypes object
  • DigitalDLSorter-class: The DigitalDLSorter Class
  • ProbMatrixCellTypes-class: The Class ProbMatrixCellTypes
  • blandAltmanLehPlot: Generate Bland-Altman agreement plots between predicted and expected cell type proportions from test data results
  • barPlotCellTypes: Bar plot of deconvoluted cell type proportions in bulk RNA-Seq samples
  • bulk.simul: Get and set bulk.simul slot in a DigitalDLSorter object
  • ZinbParametersModel-class: The Class ZinbParametersModel
  • barErrorPlot: Generate bar error plots