Pagoda2

Pagoda2: Rapid Processing and Interactive Analysis of Large Datasets

Pagoda2 is an R package for analyzing and interactively exploring large-scale single-cell RNA-seq datasets. The methods were optimized to rapidly process modern scRNAseq datasets, which are both large (approximately 1e6 cells or greater) and sparse. The package provides methods for quality control, filtering, clustering, visualization, differential expression, cross-cutting aspects/states, and geneset/pathway overdispersion analysis. The companion frontend application allows users to figure out which gene expression patterns give rise to different subpopulations within the data. The application allows users to inspect the gene expression patterns of subpopulations through annotated gene sets and pathways, including Gene Ontology (GO) categories. Users may also highlight certain clusters and perform differential expression from their browsers via the frontend application.

Note that pagoda2 is an R package developed for analyzing standalone scRNAseq datasets. For joint analysis of multiple datasets, please see the package Conos. (The package pagoda2 is primarily used to preprocess input datasets for Conos.)

Several methods within this package were developed based on the originals implemented within SCDE and PAGODA1.
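The processing steps named in the overview (filtering, variance adjustment, PCA, clustering, embedding) can be sketched with two functions from the index further down, gene.vs.molecule.cell.filter() and basicP2proc(). This is a minimal sketch rather than the authors' reference workflow: the wrapper name run_basic_sketch is hypothetical, the example dataset p2data::sample_BM1 and the argument values are assumptions taken from memory of the package walkthrough, and argument names may differ between versions. The code skips itself when pagoda2 or p2data is not installed.

```r
# Sketch only: runs a quick processing pass if pagoda2 and p2data are installed.
# run_basic_sketch is a hypothetical helper name, not part of pagoda2.
run_basic_sketch <- function(n.cores = 1) {
  needed <- c("pagoda2", "p2data")
  ok <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
  if (!all(ok)) {
    message("Skipping, not installed: ", paste(needed[!ok], collapse = ", "))
    return(invisible(NULL))
  }
  # Assumed example count matrix shipped with p2data (genes x cells)
  counts <- p2data::sample_BM1
  # Quality control: filter cells based on the gene/molecule dependency
  counts <- pagoda2::gene.vs.molecule.cell.filter(counts, min.cell.size = 500,
                                                  plot = FALSE)
  rownames(counts) <- make.unique(rownames(counts))
  # Variance adjustment, PCA, kNN graph, clustering, and tSNE in one call
  pagoda2::basicP2proc(counts, n.cores = n.cores, min.cells.per.gene = 10,
                       n.odgenes = 2e3, get.largevis = FALSE,
                       make.geneknn = FALSE)
}

p2 <- run_basic_sketch()
```

If both packages are present, p2 should hold a processed 'Pagoda2' object; otherwise it is NULL and a message names the missing packages.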

Tutorials

Basic Walkthrough

PCA-based Basic Walkthrough

Web Demo of Application

10X PBMC Dataset

Installation

To install the latest version of pagoda2, use:

install.packages('devtools')
devtools::install_github('kharchenkolab/pagoda2', build_vignettes = TRUE)

The package pagoda2 depends on data in a data package (p2data) that is available through a drat repository on GitHub. To use the pagoda2 package, you will need to install p2data. There are two equally valid options to install this package:

A) Add the drat archive to the list of repositories that R queries when installing and updating packages. You can then install p2data with install.packages():

library(drat)
addRepo("kharchenkolab")
install.packages("p2data")

The following command is also a valid approach:

install.packages('p2data', repos='https://kharchenkolab.github.io/drat/', type='source')

Please see the drat documentation for more comprehensive explanations and vignettes.

B) Another way to install the package p2data is to use devtools::install_github():

library(devtools)
install_github("kharchenkolab/p2data")
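Whichever option you choose, it may help to confirm that both packages are visible to R before continuing. The following is a small sketch using only base R; the variable names pkgs and installed are illustrative.

```r
# Report which of the two packages discussed above can be loaded
# from the current library paths.
pkgs <- c("pagoda2", "p2data")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

if (!all(installed)) {
  message("Still missing: ", paste(pkgs[!installed], collapse = ", "))
}
```

The result is a named logical vector, so a missing package can be identified by name.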

Installing Linux dependencies

Installation for Debian-based distributions (e.g. Ubuntu):

sudo apt-get update
sudo apt-get -y install libcurl4-openssl-dev libssl-dev

Installation for Red-Hat-based distributions (e.g. CentOS or Fedora):

yum install openssl-devel libcurl-devel

Installing with Mac OS

We recommend the Homebrew package manager for installing the required dependencies on Mac OS. Please run the following commands in the terminal:

brew update
brew install curl openssl wget

As of version 0.1.3, pagoda2 should successfully install on Mac OS. However, if there are issues, please refer to the following wiki page for further instructions on installing pagoda2 with Mac OS: Installing pagoda2 for Mac OS

Pagoda2 via Docker

If you are having trouble setting up pagoda2 on your system, an alternative approach to work with pagoda2 is via a Docker container. To use the Docker container, first install docker on your platform and then run the pagoda2 image with the following command in the shell:

docker run -p 8787:8787 pkharchenkolab/pagoda2:latest

The first time you run this command, it will pull (download) several images, so please make sure that you have reliable internet access. You can then point your browser to http://localhost:8787/ to access an RStudio environment with pagoda2 installed (log in using credentials rstudio/pass). Explore the Docker --mount option to allow the Docker container to access your local files.

Citation

If you find pagoda2 useful for your publication, please cite:

Nikolas Barkas, Viktor Petukhov, Peter Kharchenko and Evan
Biederstedt (2020). pagoda2: Single Cell Analysis and Differential
Expression. R package version 1.0.0.
https://github.com/kharchenkolab/pagoda2

Install

install.packages('pagoda2')

Monthly Downloads

611

Version

1.0.0

License

GPL-3

Maintainer

Evan Biederstedt

Last Published

January 28th, 2021

Functions in pagoda2 (1.0.0)

basicP2web

Generate a 'pagoda2' web application from a 'Pagoda2' object
cldend2array

Translate cell cluster dendrogram to an array, one row per node with 1/0 cluster membership
calcMulticlassified

Returns a list vector with the number of cells that are present in more than one selection in the provided p2 selection object
cellsPerSelectionGroup

Get the number of cells in each selection group
armaCor

Matrix column correlations. Allows faster matrix correlations with Armadillo. Similar to a cor() call; calculates the correlation between matrix columns
basicP2proc

Perform basic 'pagoda2' processing, i.e. adjust variance, calculate pca reduction, make knn graph, identify clusters with multilevel, and generate largeVis and tSNE embeddings.
buildWijMatrix

Rescale the weights in an edge matrix to match a given perplexity. From 'largeVis', <https://github.com/elbamos/largeVis>
areColors

Quick utility to check whether the given character vector contains valid colors. Thanks to Stack Overflow: http://stackoverflow.com/questions/13289009/check-if-character-string-is-a-valid-color-representation
Mode

Return the mode of a vector
factorFromP2Selection

Returns a factor of cell membership from a p2 selection object. The factor only includes cells present in the selection. If the selection contains multiclassified cells, an error is raised
.onUnload

Correct unloading of the library
multi2dend

Translate multilevel segmentation into a dendrogram, with the lowest level of the dendrogram listing the cells
namedNames

Get a vector of the names of an object named by the names themselves. This is useful with lapply when passing names of objects as it ensures that the output list is also named.
extendedP2proc

Perform extended 'Pagoda2' processing. Generate organism specific GO environment and calculate pathway overdispersion.
generateClassificationAnnotation

Given a cell clustering (partitioning) and a set of user provided selections generate a cleaned up annotation of cluster groups that can be used for classification
Pagoda2

Pagoda2 R6 class
factorToP2selection

Converts a named factor to a p2 selection object. If colors are provided, it assigns those; otherwise it uses a rainbow palette
collapse.aspect.clusters

Collapse aspect patterns into clusters
compareClusterings

Compare two different clusterings provided as factors by plotting a normalised heatmap
p2.generate.mouse.go.web

Generates mouse (Mus musculus) GO annotation for the web object
factorListToMetadata

Converts a list of factors into 'pagoda2' metadata optionally filtering down to the cells present in the provided 'pagoda2' app.
diffExprOnP2FromWebSelection

Perform differential expression on a p2 object given a set of web selections and two groups to compare
p2.generate.mouse.go

Generate a GO environment for mouse for overdispersion analysis for the back end
p2.generate.human.go.web

Generates human GO annotation for the web object
get.control.geneset

Get a control geneset for cell scoring using the method described in Puram, Bernstein (Cell, 2018)
make.p2.app

Generate a Rook Server app from a 'Pagoda2' object. This generates a 'pagoda2' web object from a 'Pagoda2' object by automating steps that most users will want to run. This function is a wrapper around the 'pagoda2' web constructor. (Advanced users may wish to use that constructor directly.)
diffExprOnP2FromWebSelectionOneGroup

Perform differential expression on a p2 object given a set of web selections and one group to compare against everything else
getClusterLabelsFromSelection

Assign names to the clusters, given a clustering vector and a set of selections. This function uses a set of pagoda2 cell selections to identify the clusters in a named factor. It is meant to import user-defined annotations, defined as selections, into a more formal categorization of cells defined by cluster. To help with this, the function allows a percentage of cells to have been classified into multiple groups in the selections, which may result from users making wrong selections. The percentage of cells allowed to be multiselected in any given group is defined by multiClassCutoff. Furthermore, the method assigns each cluster to a selection only if the ratio of the most popular cluster to the next most popular, in terms of cell numbers, exceeds ambiguous.ratio. If a cluster does not satisfy this condition, it is not assigned.
getColorsFromP2Selection

Retrieves the colors of each selection from a p2 selection object as a named vector of strings
p2.make.pagoda1.app

Create a 'PAGODA1' web application from a 'Pagoda2' object. 'PAGODA1' is available with 'SCDE': <https://www.bioconductor.org/packages/release/bioc/html/scde.html>
p2.metadata.from.factor

Generate a list metadata structure that can be passed to a 'pagoda2' web object constructor as additional metadata given a named factor
gene.vs.molecule.cell.filter

Filter cells based on gene/molecule dependency
getCellsInSelections

Returns all the cells that are in the designated selections. Given a pagoda2 selections object and the names of some selections in it, returns the names of the cells that are in these selections, with any duplicates removed
get.de.geneset

Generate differential expression genesets for the web app given a cell grouping by calculating DE sets between each cell set and everything else
p2.generate.go.web.fromGOEnv

Generates GO annotation for the web object from the GO environment used for enrichment analysis
pagoda2WebApp_packCompressInt32Array

pagoda2WebApp_packCompressInt32Array
pagoda2WebApp_packCompressFloat64Array

pagoda2WebApp_packCompressFloat64Array
p2.toweb.hdea

Generate a 'pagoda2' web object from a 'Pagoda2' object using hierarchical differential expression
p2.generate.human.go

Generate a GO environment for human for overdispersion analysis for the back end
pagoda2WebApp_readStaticFile

pagoda2WebApp_readStaticFile
p2.generate.go

Generate a GO environment for the organism specified
p2.generate.go.web

Generates GO annotation for the web object for any species
pagoda2WebApp_generateGeneKnnJSON

pagoda2WebApp_generateGeneKnnJSON
pagoda2WebApp_sparseMatList

pagoda2WebApp_sparseMatList
minMaxScale

Scale the designated values between the range of 0 and 1
pagoda.reduce.loading.redundancy

Collapse aspects driven by the same combinations of genes. (Aspects are some pattern across cells, e.g. sequencing depth, or a PC corresponding to an undesired process such as ribosomal pathway variation.) Examines PC loading vectors underlying the identified aspects and clusters aspects based on a product of loading and score correlation (raised to corr.power). Clusters of aspects driven by the same genes are determined based on the parameter "distance.threshold".
pagoda.reduce.redundancy

Collapse aspects driven by similar patterns (i.e. separate the same sets of cells). Examines PC loading vectors underlying the identified aspects and clusters aspects based on score correlation. Clusters of aspects driven by the same patterns are determined based on the distance.threshold.
p2ViewPagodaApp

p2ViewPagodaApp R6 class
pagoda2WebApp_call

pagoda2WebApp_call
pagoda2WebApp_reducedDendrogramJSON

pagoda2WebApp_reducedDendrogramJSON
score.cells.puram

Score cells as described in Puram, Bernstein (Cell, 2018)
getIntExtNamesP2Selection

Get a mapping from internal to external names for the specified selection object
sgdBatches

Calculate the default number of batches for a given number of vertices and edges. The formula used is the one used by the 'largeVis' reference implementation. This is substantially less than the recommendation E * 10000 in the original paper.
subsetSignatureToData

Subset a gene signature to the genes in the given matrix with optional warning if genes are missing
papply

Parallel, optionally verbose lapply. See ?parallel::mclapply for more info.
score.cells.nb0

Score cells by getting mean expression of genes in signatures
tp2c.view.pathways

View pathway or gene-weighted PCA. 'Pagoda2' version of the function pagoda.show.pathways(). Takes in a list of pathways (or a list of genes), runs weighted PCA, optionally showing the result.
score.cells.nb1

Score cells after standardising the expression of each gene removing outliers
p2.generate.dr.go

Generate a GO environment for zebrafish (Danio rerio) for overdispersion analysis for the back end
hierDiffToGenesets

Converts the output of hierarchical differential expression aspects into genesets that can be loaded into a 'pagoda2' web app to retrieve the genes that make up the geneset interactively
pagoda2WebApp_arrayToJSON

pagoda2WebApp_arrayToJSON
p2.generate.dr.go.web

Generates zebrafish (Danio rerio) GO annotation for the web object
pagoda2WebApp_getCompressedEmbedding

pagoda2WebApp_getCompressedEmbedding
pagoda2WebApp_cellOrderJSON

pagoda2WebApp_cellOrderJSON
pagoda2WebApp-class

pagoda2WebApp class to create 'pagoda2' web applications via a Rook server
pagoda2WebApp_cellmetadataJSON

pagoda2WebApp_cellmetadataJSON
pagoda2WebApp_geneInformationJSON

pagoda2WebApp_geneInformationJSON
pagoda2WebApp_serializeToStaticFast

pagoda2WebApp_serializeToStaticFast
pagoda2WebApp_availableAspectsJSON

pagoda2WebApp_availableAspectsJSON
pathway.pc.correlation.distance

Calculate correlation distance between PC magnitudes given a number of target dimensions
pagoda2WebApp_generateDendrogramOfGroups

Generate a dendrogram of groups
show.app

Directly open the 'pagoda2' web application and view the 'p2web' application object from our R session
readPagoda2SelectionFile

Reads a cell selection file exported by the 'pagoda2' web app as a list of list objects that contain the name of the selection, the color (as a hex string), and the identifiers of the individual cells
plotMulticlassified

Plot multiclassified cells per selection as a percent barplot
removeSelectionOverlaps

Remove cells that are present in more than one selection from all the selections they are in
sn

Set names equal to values, a stats::setNames wrapper function
pagoda2WebApp_serverLog

pagoda2WebApp_serverLog
plotOneWithValues

Plot the embedding of a 'Pagoda2' object with the given values
pagoda2WebApp_generateEmbeddingStructure

pagoda2WebApp_generateEmbeddingStructure
projectKNNs

Project a distance matrix into a lower-dimensional space. (from elbamos/largeVis)
writePagoda2SelectionFile

Writes a pagoda2 selection object as a p2 selection file that can be loaded into the web interface
read.10x.matrices

Quick loading of 10X CellRanger count matrices
writeGenesAsPagoda2Selection

Writes a list of genes as a gene selection that can be loaded in the web interface
readPagoda2SelectionAsFactor

Read a pagoda2 cell selection file and return it as a factor while removing any multiclassified cells
read10xMatrix

This function reads a matrix generated by the 10x processing pipeline from the specified directory and returns it. It aborts if one of the required files in the specified directory does not exist.
plotSelectionOverlaps

Get a dataframe and plot summarising the overlaps between the selections of a pagoda2 selection object, ignoring self-overlaps
webP2proc

Generate a 'pagoda2' web object
validateSelectionsObject

Validates a pagoda2 selection object
winsorize.matrix

Sets the ncol(mat)*trim top outliers in each row to the next lowest value; the lowest outliers are treated the same way
view.aspects

Internal function to visualize aspects of transcriptional heterogeneity as a heatmap.
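
Several of the small utilities listed above (sn, minMaxScale, Mode) are described in one line each. The base-R sketch below illustrates the behaviour those descriptions suggest. These are illustrative re-implementations written for this page, not the package source; the names ending in _sketch are hypothetical, and the real functions may take additional arguments or handle edge cases differently.

```r
# Set names equal to values, as described for sn()
sn_sketch <- function(x) stats::setNames(x, x)

# Rescale values to the range [0, 1], as described for minMaxScale()
min_max_scale_sketch <- function(x) (x - min(x)) / (max(x) - min(x))

# Most frequent value of a vector, as described for Mode()
mode_sketch <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Naming the input means lapply() returns a list named by gene
genes <- c("CD3E", "CD19", "LYZ")
res <- lapply(sn_sketch(genes), nchar)
print(names(res))

print(min_max_scale_sketch(c(2, 4, 6)))
print(mode_sketch(c("a", "b", "b", "c")))
```

The naming idiom is the main reason such a wrapper is convenient: results of lapply() over gene or cluster names stay addressable by name.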