conos

Conos: Clustering On Network Of Samples

  • What is conos?

Conos is an R package to wire together large collections of single-cell RNA-seq datasets, which allows for both the identification of recurrent cell clusters and the propagation of information between datasets in multi-sample or atlas-scale collections. It focuses on the uniform mapping of homologous cell types across heterogeneous sample collections. For instance, users could investigate a collection of dozens of peripheral blood samples from cancer patients combined with dozens of controls, which perhaps includes samples of a related tissue such as lymph nodes.

  • How does it work?

Conos applies one of many error-prone methods to align each pair of samples in a collection, establishing weighted inter-sample cell-to-cell links. The resulting joint graph can then be analyzed to identify subpopulations across different samples. Cells of the same type will tend to map to each other across many such pairwise comparisons, forming cliques that can be recognized as clusters (graph communities).

Conos processing can be divided into three phases:

  • Phase 1: Filtering and normalization. Each individual dataset in the sample panel is filtered and normalized using standard packages for single-dataset processing: either pagoda2 or Seurat. Specifically, Conos relies on these methods to perform cell filtering, library size normalization, identification of overdispersed genes and, in the case of pagoda2, variance normalization. (Conos is robust to variations in the normalization procedures, but it is recommended that all of the datasets be processed uniformly.)
  • Phase 2: Identify multiple plausible inter-sample mappings. Conos performs pairwise comparisons of the datasets in the panel to establish an initial error-prone mapping between cells of different datasets.
  • Phase 3: Joint graph construction. These inter-sample edges from Phase 2 are then combined with lower-weight intra-sample edges during the joint graph construction. The joint graph is then used for downstream analysis, including community detection and label propagation.

For a comprehensive description of the algorithm, please refer to our publication.
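To make the idea of weighted inter-sample cell-to-cell links more concrete, here is a minimal base-R sketch of one simple way such links could be obtained: mutual nearest neighbors between two toy samples in a shared space. This is an illustration only, not the conos implementation; the data, the helper make_sample, the choice k = 2 and the weighting formula are all hypothetical.

```r
# Toy illustration only: mutual nearest neighbors (MNN) between two samples.
# NOT the conos implementation; all names and data here are hypothetical.
set.seed(1)

# Two "samples" with 6 cells each in a shared 2-D space (e.g. a common PCA).
# In both samples, cells 1-3 belong to one cell type and cells 4-6 to another.
centers <- rbind(c(0, 0), c(5, 5))
make_sample <- function(prefix) {
  x <- centers[rep(1:2, each = 3), ] + matrix(rnorm(12, sd = 0.3), ncol = 2)
  rownames(x) <- paste0(prefix, "_cell", 1:6)
  x
}
a <- make_sample("A")
b <- make_sample("B")

# Euclidean distances: rows are cells of A, columns are cells of B
d <- as.matrix(dist(rbind(a, b)))[rownames(a), rownames(b)]

k <- 2
nn_ab <- t(apply(d, 1, function(r) order(r)[1:k]))  # for each A cell: k nearest B cells
nn_ba <- t(apply(d, 2, function(r) order(r)[1:k]))  # for each B cell: k nearest A cells

# Keep a link only if it is mutual; closer pairs get a higher (arbitrary) weight
links <- do.call(rbind, lapply(seq_len(nrow(a)), function(i) {
  js <- nn_ab[i, ]
  js <- js[vapply(js, function(j) i %in% nn_ba[j, ], logical(1))]
  if (length(js) == 0) return(NULL)
  data.frame(from = rownames(a)[i], to = rownames(b)[js],
             weight = unname(1 / (1 + d[i, js])), stringsAsFactors = FALSE)
}))
print(links)
```

In this toy setting every retained link connects cells of the same simulated type, which is the behaviour the joint graph relies on when many such pairwise comparisons are combined.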

  • What does it produce?

In essence, conos takes a large, potentially heterogeneous panel of samples and produces a clustering that groups similar cell subpopulations together in a way that is robust to inter-sample variation.

  • What are the advantages over existing alignment methods?

Conos is robust to heterogeneity of samples within a collection, as well as noise. The ability to resolve finer subpopulation structure improves as the size of the panel increases.

Basics of using conos

Given a list of individual processed samples (pl), conos processing can be as simple as this:

# Construct Conos object, where pl is a list of pagoda2 objects 
con <- Conos$new(pl)

# Build joint graph
con$buildGraph()

# Find communities
con$findCommunities()

# Generate embedding
con$embedGraph()

# Plot joint graph
con$plotGraph()

# Plot panel with joint clustering results
con$plotPanel()

To see more documentation on the class Conos, run ?Conos.
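For a quick trial without your own data, the sketch below runs the same steps on small_panel.preprocessed, the small pre-processed pagoda2 panel that ships with conos. It assumes that both conos and pagoda2 are installed; the wrapper function run_small_panel_demo is hypothetical, the parameter values are illustrative choices for this tiny dataset, and reading the result from con$clusters$leiden$groups is an assumption about where the Leiden clustering is stored.

```r
# Sketch only: needs the conos and pagoda2 packages to be installed.
# run_small_panel_demo is a hypothetical helper, not part of conos.
run_small_panel_demo <- function(n.cores = 1) {
  library(pagoda2)  # the bundled samples are pagoda2 objects
  library(conos)

  # Construct the Conos object from the bundled two-sample panel
  con <- Conos$new(small_panel.preprocessed, n.cores = n.cores)

  # Small values because each sample has only 1000 genes and 100 cells
  con$buildGraph(k = 10, k.self = 5, space = "PCA", ncomps = 10, n.odgenes = 20)

  # Joint clustering with the Leiden method
  con$findCommunities(method = leiden.community, resolution = 1)

  # Assumed location of the per-cell cluster assignments
  table(con$clusters$leiden$groups)
}

# Run the demo only when both packages are available
if (requireNamespace("conos", quietly = TRUE) &&
    requireNamespace("pagoda2", quietly = TRUE)) {
  print(run_small_panel_demo())
}
```

The returned table gives the number of cells per joint cluster across both samples.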

Tutorials

Please see the following tutorials for detailed examples of how to use conos:

Conos walkthrough:

Adjustment of alignment strength with conos:

Integration with Scanpy:

Note that for integration with Scanpy, users need to save conos files to disk from an R session, and then load these files into Python.

Save conos for Scanpy:

Load conos files into Scanpy:

Integrating RNA-seq and ATAC-seq with conos:

Running RNA velocity on a Conos object

To obtain an RNA velocity plot from a Conos object, first use the dropEst pipeline to align and annotate your single-cell RNA-seq measurements. You can see this tutorial and this shell script for how this can be done. This example assumes that dropEst was run with the -V option, so that the estimates of unspliced/spliced counts come directly from dropEst. You also need the velocyto.R package for the actual velocity estimation and visualisation.

After running dropEst you should have 2 files for each of the samples:

  • sample.rds (matrix of counts)
  • sample.matrices.rds (3 matrices of exons, introns and spanning reads)

The .matrices.rds files are the velocity files. Load them into R as a list, in the same order as the samples you give to conos. Load, preprocess and integrate the count matrices (.rds) with conos as you normally would. Before running the velocity estimation, you must at least create an embedding and run the Leiden clustering. Finally, you can estimate the velocity as follows:

### Assuming con is your Conos object and cms.list is the list of your velocity files ###

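library(conos)       # provides velocityInfoConos
library(magrittr)    # provides the %$% exposition pipe used below
library(velocyto.R)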

# Preprocess the velocity files to match the Conos object
vi <- velocityInfoConos(cms.list = cms.list, con = con, 
                        n.odgenes = 2e3, verbose = TRUE)

# Estimate RNA velocity
vel.info <- vi %$%
  gene.relative.velocity.estimates(emat, nmat, cell.dist = cell.dist, 
                                   deltaT = 1, kCells = 25, fit.quantile = 0.05, n.cores = 4)

# Visualise the velocity on your Conos embedding 
# Takes a very long time! 
# Assign to a variable to speed up subsequent recalculations
cc.velo <- show.velocity.on.embedding.cor(vi$emb, vel.info, n = 200, scale = 'sqrt', 
                                          cell.colors = ac(vi$cell.colors, alpha = 0.5), 
                                          cex = 0.8, grid.n = 50, cell.border.alpha = 0,
                                          arrow.scale = 3, arrow.lwd = 0.6, n.cores = 4, 
                                          xlab = "UMAP1", ylab = "UMAP2")

# Use cc=cc.velo$cc when running again (skips the most time consuming delta projections step)
show.velocity.on.embedding.cor(vi$emb, vel.info, cc = cc.velo$cc, n = 200, scale = 'sqrt', 
                               cell.colors = ac(vi$cell.colors, alpha = 0.5), 
                               cex = 0.8, arrow.scale = 15, show.grid.flow = TRUE, 
                               min.grid.cell.mass = 0.5, grid.n = 40, arrow.lwd = 2,
                               do.par = FALSE, cell.border.alpha = 0.1, n.cores = 4,
                               xlab = "UMAP1", ylab = "UMAP2")
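
The loading step described above (reading the .matrices.rds files into a list ordered like the conos samples) could be sketched as follows. The sample names, the directory layout and the stand-in file contents are hypothetical and exist only so that the sketch runs end to end; in practice the files come from dropEst and the sample names would match the samples in your Conos object.

```r
# Hypothetical sample names and directory, for illustration only
sample.names <- c("sampleA", "sampleB")
data.dir <- file.path(tempdir(), "dropest_out")
dir.create(data.dir, showWarnings = FALSE)

# Stand-in velocity files (placeholder matrices); real ones are written by dropEst
for (s in sample.names) {
  saveRDS(list(exon = matrix(0, 2, 2),
               intron = matrix(0, 2, 2),
               spanning = matrix(0, 2, 2)),
          file.path(data.dir, paste0(s, ".matrices.rds")))
}

# Build the paths from the sample names so the order matches the conos samples
velocity.paths <- file.path(data.dir, paste0(sample.names, ".matrices.rds"))
stopifnot(all(file.exists(velocity.paths)))

# One list element per sample, named by sample for easier inspection
cms.list <- setNames(lapply(velocity.paths, readRDS), sample.names)
str(cms.list, max.level = 1)
```

Deriving the file paths from a single vector of sample names keeps the velocity list and the count matrices in the same order by construction.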

Installation

To install the stable version from CRAN, use:

install.packages('conos')

To install the latest version of conos, use:

install.packages('devtools')
devtools::install_github('kharchenkolab/conos')

System dependencies

The system dependencies are inherited from pagoda2. Note that this package also depends on igraph, which requires various system libraries to install correctly. Please see the installation instructions on the igraph page for more details, along with the GitHub README here.

Ubuntu dependencies

To install system dependencies using apt-get, use the following:

sudo apt-get update
sudo apt-get -y install libcurl4-openssl-dev libssl-dev libxml2-dev libgmp-dev libglpk-dev

Red Hat-based distribution dependencies

For Red Hat distributions using yum, use the following command:

sudo yum update
sudo yum install openssl-devel libcurl-devel libxml2-devel gmp-devel glpk-devel

Mac OS

Using the Mac OS package manager Homebrew, try the following command:

brew update
brew install openssl curl-openssl libxml2 glpk gmp

(You may need to run brew uninstall curl in order for brew install curl-openssl to be successful.)

As of version 1.3.1, conos should successfully install on Mac OS. However, if there are issues, please refer to the following wiki page for further instructions on installing conos with Mac OS: Installing conos for Mac OS

Running conos via Docker

If your system configuration is making it difficult to install conos natively, an alternative way to get conos running is through a docker container.

Note: On Mac OS X, Docker Machine has memory and CPU limits. To adjust them, please check the instructions for either the CLI or Docker Desktop.

Ready-to-run Docker image

The docker distribution has the latest version and also includes the pagoda2 package. To start a docker container, first install docker on your platform and then start the conos container with the following command in the shell:

docker run -p 8787:8787 -e PASSWORD=pass pkharchenkolab/conos:latest

The first time you run this command, it will download several large images, so make sure that you have a fast internet connection. You can then point your browser to http://localhost:8787/ to get an RStudio environment with pagoda2 and conos installed (please log in using the credentials username=rstudio, password=pass). Explore the docker --mount option to give the docker container access to your local files.

Note: If you already downloaded the docker image and want to update it, please pull the latest image with:

docker pull pkharchenkolab/conos:latest

Building Docker image from the Dockerfile

If you want to build the image on your own, download the Dockerfile (available in this repo under /docker) and run the following command to build it:

docker build -t conos .

This will create a "conos" docker image on your system (please be patient, as the build could take approximately 30-50 minutes to finish). You can then run it using the following command:

docker run -d -p 8787:8787 -e PASSWORD=pass --name conos -it conos

References

If you find this software useful for your research, please cite the corresponding paper:

Barkas N., Petukhov V., Nikolaeva D., Lozinsky Y., Demharter S., Khodosevich K., & Kharchenko P.V. 
Joint analysis of heterogeneous single-cell RNA-seq dataset collections. 
Nature Methods, (2019). doi:10.1038/s41592-019-0466-z

The R package can be cited as:

Viktor Petukhov, Nikolas Barkas, Peter Kharchenko, and Evan
Biederstedt (2021). conos: Clustering on Network of Samples. R
package version 1.5.2.

Version: 1.5.2

License: GPL-3

Maintainer: Evan Biederstedt

Last Published: February 26th, 2024

Monthly Downloads: 492

Functions in conos (1.5.2)

Conos

Conos R6 class
getClusterRelationshipConsistency

Evaluate consistency of cluster relationships. Using the clustering, per-sample dendrograms are generated and their similarity between different samples is examined. More information about the similarity measures: <https://www.rdocumentation.org/packages/dendextend/versions/1.8.0/topics/cor_cophenetic> <https://www.rdocumentation.org/packages/dendextend/versions/1.8.0/topics/cor_bakers_gamma>
getBetweenCellTypeDE

Compare two cell types across the entire panel
findSubcommunities

Increase resolution for a specific set of clusters
getBetweenCellTypeCorrectedDE

Compare two cell types across the entire panel
getCountMatrix

Access count matrix from sample
getCellNames

Access cell names from sample
getClustering

Access clustering from sample
getClusteringGroups

Extract specified clustering from list of conos clusterings
filter.genes.by.cluster.expression

Filter genes by requiring minimum average expression within at least one of the provided cell clusters
getPca

Access PCA from sample
getOverdispersedGenes

Access overdispersed genes from sample
getEmbedding

Access embedding from sample
getGlobalClusterMarkers

Deprecated; Get markers for global clusters
getNeighborMatrix

Establish rough neighbor matching between samples given their projections in a common space
getOdGenesUniformly

Get top overdispersed genes across samples
plotClusterBoxPlotsByAppType

Generate boxplot per cluster of the proportion of cells in each celltype
plotComponentVariance

Plot fraction of variance explained by the successive reduced space components (PCA, CPCA)
getSampleNamePerCell

Retrieve sample names per cell
getGeneExpression

Access gene expression from sample
namedLevels

Get a vector with the levels of a factor named with their own name. Useful for lapply loops over factor levels
greedyModularityCut

Performs a greedy top-down selective cut to optimize modularity
getGenes

Access genes from sample
getRawCountMatrix

Access raw count matrix from sample
getPercentGlobalClusters

Evaluate how many clusters are global
multimulti.community

Constructs a two-step clustering, first running multilevel.communities, and then walktrap.communities within each. These are combined into an overall hierarchy
plotDEheatmap

Plot a heatmap of differential genes
quickPlainPCA

Use space of combined sample-specific PCAs as a space
namedNames

Get a vector of the names of an object named by the names themselves. This is useful with lapply when passing names of objects as it ensures that the output list is also named
plotSamples

Plot panel of specified embeddings, extracting them from pagoda2 objects
multitrap.community

Constructs a two-step clustering, first running multilevel.communities, and then walktrap.communities within each. These are combined into an overall hierarchy
quickCPCA

Perform cpca on two samples
propagateLabelsDiffusion

Estimate labeling distribution for each vertex, based on provided labels using Random Walk
sgdBatches

Calculate the default number of batches for a given number of vertices and edges. The formula used is the one used by the 'largeVis' reference implementation. This is substantially less than the recommendation \(E * 10000\) in the original paper.
small_panel.preprocessed

Small pre-processed data from Pagoda2, two samples, each dimension (1000, 100)
plotEmbeddings

Plot panel of specified embeddings
getPerCellTypeDE

Do differential expression for each cell type in a conos object between the specified subsets of apps
p2app4conos

Utility function to generate a pagoda2 app from a conos object
saveConosForScanPy

Save Conos object on disk to read it from ScanPy
projectKNNs

Project a distance matrix into a lower-dimensional space.
saveDEasCSV

Save differential expression as a table in CSV format
quickCCA

Perform CCA (using PMA package or otherwise) on two samples
plotClusterBarplots

Plots barplots per sample of composition of each pagoda2 application based on selected clustering
rawMatricesWithCommonGenes

Get raw matrices with common genes
saveDEasJSON

Save differential expression results as JSON
stableTreeClusters

Determine number of detectable clusters given a reference walktrap and a bunch of permuted walktraps
reexports

Objects exported from other packages
scanKModularity

Scan joint graph modularity for a range of k (or k.self) values. Builds the graph with different values of k (or k.self if scan.k.self=TRUE), evaluating the modularity of the resulting multilevel clustering. NOTE: will run evaluations in parallel using con$n.cores (temporarily setting con$n.cores to 1 in the process)
velocityInfoConos

RNA velocity analysis on samples integrated with conos. Creates a list of objects to pass into the gene.relative.velocity.estimates function from the velocyto.R package
checkCountsWholeNumbers

Check that the count data contain only integer counts
bestClusterTreeThresholds

Find threshold of cluster detectability in trees of clusters
estimateWeightEntropyPerCell

Estimate entropy of edge weights per cell according to the specified factor. Can be used to visualize alignment quality according to this factor.
basicSeuratProc

Create and preprocess a Seurat object
bestClusterThresholds

Find threshold of cluster detectability
buildWijMatrix

Rescale the weights in an edge matrix to match a given perplexity.
convertToPagoda2

Convert Conos object to Pagoda2 object
edgeMat<-

Set edge matrix edgeMat with certain values on sample
armaCor

A slightly faster way of calculating column correlation matrix