celda: CEllular Latent Dirichlet Allocation

"celda" stands for "CEllular Latent Dirichlet Allocation", which is a suite of Bayesian hierarchical models and supporting functions to perform gene and cell clustering for count data generated by single cell RNA-seq platforms. This algorithm is an extension of the Latent Dirichlet Allocation (LDA) topic modeling framework that has been popular in text mining applications. Celda has advantages over other clustering frameworks:

  1. Celda can simultaneously cluster genes into transcriptional states and cells into subpopulations.
  2. Celda uses count-based Dirichlet-multinomial distributions, so no additional normalization is required for 3' DGE single cell RNA-seq.
  3. These types of models have shown good performance with sparse data.
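
To illustrate what a count-based Dirichlet-multinomial model evaluates, here is a minimal base R sketch of the log marginal likelihood of one count vector under a symmetric Dirichlet prior. The function name `dirmult_loglik` and the default `alpha = 1` are assumptions made for this example; this is not celda's internal implementation.

```r
# Hypothetical helper (not part of celda): log marginal likelihood of a
# vector of raw counts under a symmetric Dirichlet(alpha) prior.
# The multinomial coefficient is omitted because it is constant for fixed counts.
dirmult_loglik <- function(counts, alpha = 1) {
  n_cat <- length(counts)
  lgamma(n_cat * alpha) - lgamma(sum(counts) + n_cat * alpha) +
    sum(lgamma(counts + alpha) - lgamma(alpha))
}

# Raw integer counts are used directly; zeros need no special handling.
raw <- c(5, 0, 2, 0, 1)
dirmult_loglik(raw)
```

Because the function takes integer counts as they are, sparse vectors with many zeros can be scored without any prior scaling or log transformation.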

Installation Instructions

To install the most recent beta release of celda via devtools:

library(devtools)
install_github("compbiomed/celda@v0.3")

The most up-to-date (but potentially less stable) version of celda can similarly be installed with:

install_github("compbiomed/celda")

Examples and vignettes

Vignettes are available in the package.

An analysis example using celda with RNA-seq data is available via vignette('celda-analysis').

New Features and announcements

The v0.2 release of celda represents a usable implementation of the various celda clustering models. Please submit any usability issues or bugs to the issue tracker at https://github.com/compbiomed/celda

You can discuss celda, or ask the developers usage questions, in our Google Group.

Version

0.0.0.9000

License

MIT

Maintainer

Joshua Campbell

Last Published

August 8th, 2018

Functions in celda (0.0.0.9000)

cluster_probability.celda_C

Calculates the conditional probability of each cell belonging to each cluster given all other cluster assignments
getK.celda_G

getK for celda Gene clustering model
completeClusterHistory

Get the complete history of gene / cell / gene & cell cluster assignments generated during a celda run, dependent on the model provided.
finalClusterAssignment.celda_CG

finalClusterAssignment for the celda Cell and Gene clustering model
distinct_colors

Generate a distinct palette for coloring different clusters
plot_dr_grid

Create a scatterplot given two dimensions from a data dimensionality reduction tool (e.g., tSNE)
celda_heatmap.celda_CG

celda_heatmap for celda Cell and Gene clustering model
plot_dr_gene

Create a scatterplot for each row of a normalized gene expression matrix where x and y axis are from a data dimensionality reduction tool.
getL

Get the L value used for each chain in a celda run.
celda_heatmap.celda_G

celda_heatmap for celda Gene clustering model
getK.celda_CG

getK for the celda Cell and Gene clustering model
completeLogLikelihood

Get the complete log likelihood for a given celda model.
getK.celda_C

getK for celda Cell clustering function
normalizeCounts

Normalize a counts matrix by a scalar factor
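
One common reading of scalar normalization is to rescale each cell so that its counts sum to a fixed value. The base R sketch below assumes that reading, with genes in rows and cells in columns; `scale_counts` and `scale_factor` are hypothetical names, and the actual behavior and arguments of normalizeCounts may differ.

```r
# Hypothetical helper (not celda's normalizeCounts): rescale every column
# (cell) of a genes x cells matrix so that it sums to `scale_factor`.
# Columns with a total of zero would produce NaN and should be filtered first.
scale_counts <- function(counts, scale_factor = 1e6) {
  lib_size <- colSums(counts)
  sweep(counts, 2, lib_size, "/") * scale_factor
}

counts <- matrix(c(1, 3, 2, 2), nrow = 2)
scaled <- scale_counts(counts, scale_factor = 100)
colSums(scaled)
```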
visualize_model_performance

Visualize various performance metrics as a function of K / L to aid parameter choice.
plot_dr_cluster

Create a scatterplot based on celda cluster labels.
visualize_model_performance.celda_C

visualize_model_performance for celda Cell clustering function
semi_pheatmap

A function to draw clustered heatmaps.
factorizeMatrix.celda_C

Generate factorized matrices showing each feature's influence on the celda_C model clustering
getL.celda_C

getL for celda Cell clustering function
getL.celda_CG

getL for the celda Cell and Gene clustering model
finalLogLikelihood

Get the log likelihood from the final iteration of Gibbs sampling for a given celda model.
render_celda_heatmap

Render a heatmap based on a matrix of counts where rows are genes and columns are cells.
finalClusterAssignment.celda_G

finalClusterAssignment for celda Gene clustering model
getK

Get the K value used for each chain in a celda run.
render_interactive_kl_plot

Render an interactive figure demonstrating the clustering performance of different K/L parameters
celda_heatmap.celda_C

celda_heatmap for celda Cell clustering function
factorizeMatrix.celda_CG

Generate factorized matrices showing each feature's influence on the celda_CG model clustering
simulateCells

Simulate count data from the celda generative models.
factorizeMatrix.celda_G

Generate factorized matrices showing each feature's influence on the celda_G model clustering
getL.celda_G

getL for celda Gene clustering model
render_iteration_likelihood_plot

Visualize log likelihood per iteration of Gibbs sampling
plot_dr_state

Create a scatterplot based off of a matrix containing the celda state probabilities per cell.
runParams

Get run parameters for a celda run.
getModel

Select the best model amongst a list of models generated in a celda run.
seed

Get the random seed for a given celda model.
sample.cells

sample.cells
recode_cluster_labels

Re-code cluster label by provided mapping scheme
simulateCells.celda_G

Simulate cells from the gene clustering generative model
simulateCells.celda_C

Simulate cells from the cell clustering generative model
topRank

Identify features with the highest influence on clustering
visualize_model_performance.celda_CG

Visualize the performance of celda_CG models grouped by L and K
simulateCells.celda_CG

Simulate cells from the cell/gene clustering generative model
visualize_model_performance.celda_G

visualize_model_performance for the celda Gene function
calculate_loglik_from_variables.celda_CG

Calculate the log likelihood for the celda Cell and Gene clustering model, given a set of cell / gene cluster assignments
celda_CG

celda Cell and Gene Clustering Model
available_models

available models
celda_C

celda Cell Clustering Model
calculate_marginal_likelihood

Calculate the marginal likelihood from a single celda model
calculate_perplexity

Calculate the perplexity from a single celda chain
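
In topic modeling, perplexity is commonly defined as the exponential of the negative log likelihood per observed count, so lower values indicate a better fit. The sketch below shows that general definition in base R; the name `perplexity` and its arguments are assumptions for illustration and are not the signature of calculate_perplexity.

```r
# Hypothetical helper: perplexity from a model log likelihood and the
# counts the model was fit to (standard topic-model definition).
perplexity <- function(log_lik, counts) {
  exp(-log_lik / sum(counts))
}

# Ten observed counts, each assigned probability 0.25 by the model.
toy_counts <- c(4, 6)
toy_loglik <- sum(toy_counts) * log(0.25)
perplexity(toy_loglik, toy_counts)
```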
calculate_loglik_from_variables.celda_G

Calculate the celda_G log likelihood for user-provided cluster assignments
clusterProbabilities

Get the probability of the cluster assignments generated during a celda run.
calculate_loglik_from_variables

Calculate a log-likelihood for a user-provided cluster assignment and count matrix, per the desired celda model.
calculate_loglik_from_variables.celda_C

Calculate the celda_C log likelihood for user-provided cluster assignments
celda

Run the celda Bayesian hierarchical model on a matrix of counts.
celda_G

celda Gene Clustering Model
chooseBestChain

Determine the best chain among a set of celda_* objects with otherwise uniform K/L choices.
clusterProbabilities.celda_C

clusterProbabilities for celda Cell clustering function
celda_heatmap

Render a stylable heatmap of count data based on celda clustering results.
compare_count_matrix

Check whether a count matrix was the one used in a given celda run
completeClusterHistory.celda_G

completeClusterHistory for celda Gene clustering model
completeClusterHistory.celda_C

completeClusterHistory for celda Cell clustering function
finalClusterAssignment.celda_C

finalClusterAssignment for celda Cell clustering function
completeClusterHistory.celda_CG

completeClusterHistory for the celda Cell and Gene clustering model
clusterProbabilities.celda_CG

clusterProbabilities for the celda Cell and Gene clustering model
clusterProbabilities.celda_G

clusterProbabilities for celda Gene clustering model
finalClusterAssignment

Get the final gene / cell / gene & cell cluster assignments generated during a celda run, dependent on the model provided.
celdaHeatmap

Render a stylable heatmap of count data based on celda clustering results.
calculateLoglikFromVariables.celda_C

Calculate the celda_C log likelihood for user-provided cluster assignments
calculateLoglikFromVariables.celda_CG

Calculate the log likelihood for the celda Cell and Gene clustering model, given a set of cell / gene cluster assignments
celdaHeatmap.celda_CG

celdaHeatmap for celda Cell and Gene clustering model
calculateLoglikFromVariables.celda_G

Calculate the celda_G log likelihood for user-provided cluster assignments
clusterProbability.celda_C

Calculates the conditional probability of each cell belonging to each cluster given all other cluster assignments
clusterProbability.celda_CG

Calculates the conditional probability of each cell belonging to each cluster given all other cluster assignments
calculatePerplexity

Calculate the perplexity from a single celda chain
clusterProbability.celda_G

Calculates the conditional probability of each cell belonging to each cluster given all other cluster assignments
celdaHeatmap.celda_G

celdaHeatmap for celda Gene clustering model
compareCountMatrix

Check whether a count matrix was the one used in a given celda run
celdaHeatmap.celda_C

celdaHeatmap for celda Cell clustering function
calculateLoglikFromVariables

Calculate a log-likelihood for a user-provided cluster assignment and count matrix, per the desired celda model.
celdaLogLik

Plots the log likelihood over iterations for each chain
plotDrGrid

Create a scatterplot given two dimensions from a data dimensionality reduction tool (e.g., tSNE)
factorizeMatrix

Generate factorized matrices showing each feature's influence on cell / gene clustering
renderCeldaHeatmap

Render a heatmap based on a matrix of counts where rows are genes and columns are cells.
recodeClusterZ

Re-code cell cluster labels by provided mapping scheme
plotDrGene

Create a scatterplot for each row of a normalized gene expression matrix where x and y axis are from a data dimensionality reduction tool.
selectModels

Select a set of models from a celda_list object based on rows in its run.params attribute.
visualizeModelPerformance

Visualize various performance metrics as a function of K / L to aid parameter choice.
pbmc_data

pbmc_data
plotDrState

Create a scatterplot based off of a matrix containing the celda state probabilities per cell.
visualizeModelPerformance.celda_C

visualizeModelPerformance for celda Cell clustering function
plotDrCluster

Create a scatterplot based on celda cluster labels.
recodeClusterY

Re-code gene cluster labels by provided mapping scheme
visualizeModelPerformance.celda_CG

Visualize the performance of celda_CG models grouped by L and K
renderInteractiveKLPlot

Render an interactive figure demonstrating the clustering performance of different K/L parameters
renderIterationLikelihoodPlot

Visualize log likelihood per iteration of Gibbs sampling
visualizeModelPerformance.celda_G

visualizeModelPerformance for the celda Gene function
cCG.decomposeCounts

Takes raw counts matrix and converts it to a series of matrices needed for log likelihood calculation
cC.decomposeCounts

Takes raw counts matrix and converts it to a series of matrices needed for log likelihood calculation
cG.decomposeCounts

Takes raw counts matrix and converts it to a series of matrices needed for log likelihood calculation
absoluteProbabilityHeatmap

Renders a heatmap based on a population matrix from the factorized counts matrix.
GiniPlot

Create a bar chart based on Gini coefficients of transcriptional states
celdaTsne

Runs tSNE via Rtsne based on the CELDA model and specified cell states.
eigenMatMultInt

Fast matrix multiplication for double x int
clusterProbability

Get the probability of the cluster assignments generated during a celda run.
calculatePerplexityWithResampling

Calculate and visualize perplexity of all models in a celda_list, with resampling
pbmc_res

pbmc_res
calc.perplexity

calc.perplexity
calc.perplexity2

calc.perplexity2
pbmc_select

pbmc_select
gettingClusters

Assists in selection of K/L parameter for downstream analysis.
lookupGeneModule

Obtain the gene module of a gene of interest
diffExpBetweenCellStates

Gene expression markers for cell clusters using MAST
relativeProbabilityHeatmap

Renders a heatmap based on a population matrix from the factorized counts matrix. The relative probability of each transcriptional state in each cell subpopulation is visualized.
stateHeatmap

Transcriptional state heatmap
getBestModel

Select the best model from a celda_list, as determined by final log-likelihood.
validateRunParams

Check whether the given celda_list's run.params attribute ordering matches its res.list models
visualizePerplexity

Visualize perplexity as a function of cell/gene clustering
cluster_probability.celda_CG

Calculates the conditional probability of each cell belonging to each cluster given all other cluster assignments
cluster_probability.celda_G

Calculates the conditional probability of each cell belonging to each cluster given all other cluster assignments
cmatp_data

cmatp_data