Public methods
Method new()
Initialize Pagoda2 class
Usage
Pagoda2$new(
  x,
  modelType = "plain",
  n.cores = parallel::detectCores(logical = FALSE),
  verbose = TRUE,
  min.cells.per.gene = 0,
  trim = round(min.cells.per.gene/2),
  min.transcripts.per.cell = 10,
  lib.sizes = NULL,
  log.scale = TRUE,
  keep.genes = NULL
)
Arguments
- x
- input count matrix 
modelTypeModel used to normalize count matrices (default='plain'). Only supported values are 'raw', 'plain', and 'linearObs'.
n.coresnumeric Number of cores to use (default=1)
verboseboolean Whether to give verbose output (default=TRUE)
min.cells.per.geneinteger Minimum number of cells per gene, used to subset counts for coverage (default=0)
trimnumeric Parameter used for winsorizing count data (default=round(min.cells.per.gene/2)). If value>0, will winsorize counts in normalized space in the hopes of getting a more stable depth estimates. If value<=0, ignored.
min.transcripts.per.cellinteger Minimum number of transcripts per cells, used to subset counts for coverage (default=10)
lib.sizescharacter vector of library sizes (default=NULL)
log.scaleboolean If TRUE, scale counts by log() (default=TRUE)
keep.geneslist of genes to keep in count matrix after filtering out by coverage but before normalization (default=NULL)
Returns
new Pagoda2 object
Examples
\donttest{ 
## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts, log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
}Method setCountMatrix()
Provide the initial count matrix, and estimate deviance residual matrix (correcting for depth and batch)
Usage
Pagoda2$setCountMatrix(
  countMatrix,
  depthScale = 1000,
  min.cells.per.gene = 0,
  trim = round(min.cells.per.gene/2),
  min.transcripts.per.cell = 10,
  lib.sizes = NULL,
  log.scale = FALSE,
  keep.genes = NULL,
  verbose = TRUE
)
Arguments
- countMatrix
- input count matrix 
depthScalenumeric Scaling factor for normalizing counts (defaul=1e3). If 'plain', counts are scaled by counts = counts/as.numeric(depth/depthScale).
min.cells.per.geneinteger Minimum number of cells per gene, used to subset counts for coverage (default=0)
trimnumeric Parameter used for winsorizing count data (default=round(min.cells.per.gene/2)). If value>0, will winsorize counts in normalized space in the hopes of getting a more stable depth estimates. If value<=0, ignored.
min.transcripts.per.cellinteger Minimum number of transcripts per cells, used to subset counts for coverage (default=10)
lib.sizescharacter vector of library sizes (default=NULL)
log.scaleboolean If TRUE, scale counts by log() (default=TRUE)
keep.geneslist of genes to keep in count matrix after filtering out by coverage but before normalization (default=NULL)
verboseboolean Whether to give verbose output (default=TRUE)
Returns
normalized count matrix (or if modelTye='raw', the unnormalized count matrix)
Method adjustVariance()
Adjust variance of the residual matrix, determine overdispersed sites
This is done to normalize the extent to which genes with (very) different expression magnitudes will contribute to the downstream anlaysis.
Usage
Pagoda2$adjustVariance(
  gam.k = 5,
  alpha = 0.05,
  plot = FALSE,
  use.raw.variance = FALSE,
  use.unadjusted.pvals = FALSE,
  do.par = TRUE,
  max.adjusted.variance = 1000,
  min.adjusted.variance = 0.001,
  cells = NULL,
  verbose = TRUE,
  min.gene.cells = 0,
  persist = is.null(cells),
  n.cores = self$n.cores
)
Arguments
- gam.k
- integer The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=5). If gam.k<2, linear regression is used 'lm(v ~ m)'. 
alphanumeric The Type I error probability or the significance level (default=5e-2). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.
plotboolean Whether to output the plot (default=FALSE)
use.raw.variance(default=FALSE). If modelType='raw', then this conditional will be used as TRUE.
use.unadjusted.pvalsboolean Whether to use Benjamini-Hochberg adjusted p-values (default=FALSE).
do.parboolean Whether to put multiple graphs into a signle plot with par() (default=TRUE)
max.adjusted.variancenumeric Maximum adjusted variance (default=1e3). The gene scale factor is defined as sqrt(pmax(min.adjusted.variance,pmin(max.adjusted.variance,df$qv))/exp(df$v))
min.adjusted.variancenumeric Minimum adjusted variance (default=1e-3). The gene scale factor is defined as sqrt(pmax(min.adjusted.variance,pmin(max.adjusted.variance,df$qv))/exp(df$v))
cellscharacter vector Subset of cells upon which to perform variance normalization with adjustVariance() (default=NULL)
verboseboolean Whether to give verbose output (default=TRUE)
min.gene.cellsinteger Minimum number of genes per cells (default=0). This parameter is used to filter counts.
persistboolean Whether to save results (default=TRUE, i.e. is.null(cells)).
n.coresnumeric Number of cores to use (default=1)
Returns
residual matrix with adjusted variance
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
}Method makeKnnGraph()
Create k-nearest neighbor graph
Usage
Pagoda2$makeKnnGraph(
  k = 30,
  nrand = 1000,
  type = "counts",
  weight.type = "1m",
  odgenes = NULL,
  n.cores = self$n.cores,
  distance = "cosine",
  center = TRUE,
  x = NULL,
  p = NULL,
  var.scale = (type == "counts"),
  verbose = TRUE
)
Arguments
- k
- integer Number of k clusters for k-NN (default=30) 
nrandnumeric Number of randomizations i.e. the gene sets (of the same size) to be evaluated in parallel with each gene set (default=1e3)
typestring Data type of the reduction (default='counts'). If type='counts', this will access the raw counts. Otherwise, 'type' must be name of the reductions.
weight.typestring 'cauchy', 'normal', 'constant', '1m' (default='1m')
odgenescharacter vector Overdispersed genes to retrieve (default=NULL)
n.coresnumeric Number of cores to use (default=1)
distancestring Distance metric used: 'cosine', 'L2', 'L1', 'cauchy', 'euclidean' (default='cosine')
centerboolean Whether to use centering when distance='cosine' (default=TRUE). The parameter is ignored otherwise.
xcounts or reduction to use (default=NULL). If NULL, uses counts. Otherwise, checks for the reduction in self$reductions[[type]]
p(default=NULL)
var.scaleboolean Apply scaling if using raw counts (default=TRUE). If type="counts", var.scale is TRUE by default.
verboseboolean Whether to give verbose output (default=TRUE)
Returns
kNN graph, stored in self$graphs
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=300)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object   
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=20, center=FALSE, distance='L2')
}Method getKnnClusters()
Calculate clusters based on the kNN graph
Usage
Pagoda2$getKnnClusters(
  type = "counts",
  method = igraph::multilevel.community,
  name = "community",
  n.cores = self$n.cores,
  g = NULL,
  min.cluster.size = 1,
  persist = TRUE,
  ...
)
Arguments
- type
- string Data type (default='counts'). Currently only 'counts' supported. 
methodMethod to use (default=igraph::multilevel.community). Accepted methods are either 'igraph::infomap.community' or 'igraph::multilevel.community'. 
If NULL, if the number of vertices of the graph is greater than or equal to 2000, 'igraph::multilevel.community' will be used. Otherwise, 'igraph::infomap.community' will be used.
namestring Name of the community structure calculated from 'method' (default='community')
n.coresnumeric Number of cores to use (default=1)
gInput graph (default=NULL). If NULL, access graph from self$graphs[[type]].
min.cluster.sizeMinimum size of clusters (default=1). This parameter is primarily used to remove very small clusters.
persistboolean Whether to save the clusters and community structure (default=TRUE)
...Additional parameters to pass to 'method'
Returns
the community structure calculated from 'method'
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=900)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a KNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=20, center=FALSE, distance='L2')
## Call clusters based on KNN
p2_object$getKnnClusters(method=infomap.community, type='counts')
}Method geneKnnbyPCA()
Deprecated function. Use makeGeneKnnGraph() instead.
Usage
Pagoda2$geneKnnbyPCA()
Method getHierarchicalDiffExpressionAspects()
Take a given clustering and generate a hierarchical clustering
Usage
Pagoda2$getHierarchicalDiffExpressionAspects(
  type = "counts",
  groups = NULL,
  clusterName = NULL,
  method = "ward.D",
  dist = "pearson",
  persist = TRUE,
  z.threshold = 2,
  n.cores = self$n.cores,
  min.set.size = 5,
  verbose = TRUE
)
Arguments
- type
- string Data type of the reduction (default='counts'). If type='counts', this will access the raw counts. Otherwise, 'type' must be name of the reductions. 
groupsfactor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
clusterNamestring Cluster name to access (default=NULL)
methodstring The agglomeration method to be used in stats::hcust(method=method) (default='ward.D'). Accepted values are: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). For more information, see stats::hclust().
diststring 'pearson', 'spearman', 'euclidean', 'L2', 'JS' (default='pearson')
persistboolean Whether to save the clusters and community structure (default=TRUE)
z.thresholdnumeric Threshold of z-scores to filter, >=z.threshold are kept (default=2)
n.coresnumeric Number of cores to use (default=1)
min.set.sizeinteger Minimum threshold of sets to keep (default=5)
verboseboolean Whether to give verbose output (default=TRUE)
Returns
hierarchical clustering
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=400)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a KNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=20, type='PCA', center=FALSE, distance='cosine')
## Call clusters based on KNN
p2_object$getKnnClusters(method=walktrap.community,type='PCA',name='walktrap')
## Generate embedding of the data
p2_object$getEmbedding(type='PCA', embeddingType = 'largeVis', M=30, perplexity=30, gamma=1/30)
## Perform differential expression
p2_object$getDifferentialGenes(type='PCA', verbose=TRUE, clusterType='walktrap')
## Perform differential expression
hdea <- p2_object$getHierarchicalDiffExpressionAspects(type='PCA', 
            clusterName='walktrap', z.threshold=3)
}Method makeGeneKnnGraph()
Calculates gene Knn network for gene similarity
Usage
Pagoda2$makeGeneKnnGraph(
  nPcs = 100,
  center = TRUE,
  fastpath = TRUE,
  maxit = 1000,
  k = 30,
  n.cores = self$n.cores,
  verbose = TRUE
)
Arguments
- nPcs
- integer Number of principal components (default=100). This is the parameter 'nv' in irlba::irlba(), the number of right singular vectors to estimate. 
centerboolean Whether to center the PCA (default=TRUE)
fastpathboolean Whether to try a (fast) C algorithm implementation if possible (default=TRUE). This parameter is equivalent to 'fastpath' in irlba::irlba().
maxitinteger Maximum number of iterations (default=1000). This parameter is equivalent to 'maxit' in irlba::irlba().
kinteger Number of k clusters for calculating k-NN on the resulting principal components (default=30).
n.coresnumeric Number of cores to use (default=1)
verboseboolean Whether to give verbose output (default=TRUE)
Returns
graph with gene similarity
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1)
## Normalize gene expression variance 
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
p2_object$makeGeneKnnGraph(nPcs=50, k=20, center=TRUE)
}Method getDensityClusters()
Calculate density-based clusters
Usage
Pagoda2$getDensityClusters(
  type = "counts",
  embeddingType = NULL,
  name = "density",
  eps = 0.5,
  v = 0.7,
  s = 1,
  verbose = TRUE,
  ...
)
Arguments
- type
- string Data type (default='counts'). Currently only 'counts' supported. 
embeddingTypeThe type of embedding used when calculating with `getEmbedding()` (default=NULL). Accepted values are: 'largeVis', 'tSNE', 'FR', 'UMAP', 'UMAP_graph'
namestring Name fo the clustering (default='density').
epsnumeric value of the eps parameter, fed into dbscan::dbscan(x=emb, eps=eps, ...)
vnumeric The <U+201C>value<U+201D> to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().
snumeric The <U+201C>saturation<U+201D> to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().
verboseboolean Whether to give verbose output (default=TRUE)
...Additional parameters passed to dbscan::dbscan(emb, ...)
Returns
density-based clusters
Examples
 
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix 
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance 
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a KNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=50, type='PCA', center=TRUE, distance='cosine')
## Generate embedding of the data
p2_object$getEmbedding(type='PCA', embeddingType = 'UMAP', M=20, perplexity=30, gamma=1/20)
p2_object$getDensityClusters(type='PCA')
}Method getDifferentialGenes()
Determine differentially expressed genes, comparing each group against all others using Wilcoxon rank sum test
Usage
Pagoda2$getDifferentialGenes(
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  name = "customClustering",
  z.threshold = 3,
  upregulated.only = FALSE,
  verbose = FALSE,
  append.specificity.metrics = TRUE,
  append.auc = FALSE
)
Arguments
- type
- string Data type (default='counts'). Currently only 'counts' supported. 
clusterTypeOptional cluster type to use as a group-defining factor (default=NULL)
groupsfactor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
namestring Slot to store the results in (default='customClustering')
z.thresholdnumeric Minimal absolute Z score (adjusted) to report (default=3)
upregulated.onlyboolean Whether to report only genes that are expressed significantly higher in each group (default=FALSE)
verboseboolean Whether to give verbose output (default=FALSE)
append.specificity.metricsboolean Whether to append specifity metrics (default=TRUE). Uses the function sccore::appendSpecificityMetricsToDE().
append.aucboolean If TRUE, append AUC values (default=FALSE). Parameter ignored if append.specificity.metrics is FALSE.
Returns
List with each element of the list corresponding to a cell group in the provided/used factor (i.e. factor levels) 
    Each element of a list is a data frame listing the differentially epxressed genes (row names), with the following columns: 
    Z - adjusted Z score, with positive values indicating higher expression in a given group compare to the rest
    M - log2 fold change
    highest- a boolean flag indicating whether the expression of a given gene in a given vcell group was on average higher than in every other cell group
    fe - fraction of cells in a given group having non-zero expression level of a given gene
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a KNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=50, type='PCA', center=TRUE, distance='cosine')
## Call clusters based on KNN
p2_object$getKnnClusters(method=multilevel.community,type='PCA',name='multilevel')
## Generate embedding of the data
p2_object$getEmbedding(type='PCA', embeddingType = 'UMAP', M=20, perplexity=30, gamma=1/20)
## Perform differential expression
p2_object$getDifferentialGenes(type='PCA',verbose=TRUE,clusterType='multilevel')
}Method plotDiffGeneHeatmap()
Plot heatmap of DE results
Usage
Pagoda2$plotDiffGeneHeatmap(
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  n.genes = 100,
  z.score = 2,
  gradient.range.quantile = 0.95,
  inner.clustering = FALSE,
  gradientPalette = NULL,
  v = 0.8,
  s = 1,
  box = TRUE,
  drawGroupNames = FALSE,
  ...
)
Arguments
- type
- string Data type (default='counts'). Currently only 'counts' supported. 
clusterTypeOptional cluster type to use as a group-defining factor (default=NULL)
groupsfactor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
n.genesinteger Number of genes to plot (default=100)
z.scorenumeric Threshold of z-scores to filter (default=2). Only greater than or equal to this value are kept.
gradient.range.quantilenumeric Trimming quantile (default=0.95)
inner.clusteringboolean Whether to cluster cells within each cluster (default=FALSE)
gradientPalettepalette of colors to use (default=NULL). If NULL, uses 'colorRampPalette(c('gray90','red'), space = "Lab")(1024)'
vnumeric The <U+201C>value<U+201D> to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().
snumeric The <U+201C>saturation<U+201D> to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().
boxboolean Whether to draw a box around the current plot in the given color and linetype (default=TRUE)
drawGroupNamesboolean Whether to draw group names (default=FALSE)
...Additional parameters passed to internal function used for heatmap plotting, my.heatmap2()
Returns
heatmap of DE results
Method getRefinedLibSizes()
Recalculate library sizes using robust regression within clusters
Usage
Pagoda2$getRefinedLibSizes(
  clusterType = NULL,
  groups = NULL,
  type = "counts",
  n.cores = self$n.cores
)
Arguments
- clusterType
- Optional cluster type to use as a group-defining factor (default=NULL) 
groupsfactor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
typestring Data type (default='counts'). Currently only 'counts' supported.
n.coresnumeric Number of cores to use (default=1)
Returns
recalculated library sizes
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=50, type='PCA', center=TRUE, distance='cosine')
## Call clusters based on kNN
p2_object$getKnnClusters(method=infomap.community, type='PCA')
p2_object$getRefinedLibSizes(type='PCA')
lib.sizes <- p2_object$getRefinedLibSizes(type="PCA")
}Method plotGeneHeatmap()
Plot heatmap for a given set of genes
Usage
Pagoda2$plotGeneHeatmap(
  genes,
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  gradient.range.quantile = 0.95,
  cluster.genes = FALSE,
  inner.clustering = FALSE,
  gradientPalette = NULL,
  v = 0.8,
  s = 1,
  box = TRUE,
  drawGroupNames = FALSE,
  useRaster = TRUE,
  smooth.span = max(1, round(nrow(self$counts)/1024)),
  ...
)
Arguments
- genes
- character vector Gene names 
typestring Data type (default='counts'). Currently only 'counts' supported.
clusterTypeOptional cluster type to use as a group-defining factor (default=NULL)
groupsfactor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
gradient.range.quantilenumeric Trimming quantile (default=0.95)
cluster.genesboolean Whether to cluster genes within each cluster using stats::hclust() (default=FALSE)
inner.clusteringboolean Whether to cluster cells within each cluster (default=FALSE)
gradientPalettepalette of colors to use (default=NULL). If NULL, uses 'colorRampPalette(c('gray90','red'), space = "Lab")(1024)'
vnumeric The <U+201C>value<U+201D> to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().
snumeric The <U+201C>saturation<U+201D> to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().
boxboolean Whether to draw a box around the current plot in the given color and linetype (default=TRUE)
drawGroupNamesboolean Whether to draw group names (default=FALSE)
useRasterboolean If TRUE a bitmap raster is used to plot the image instead of polygons (default=TRUE). The grid must be regular in that case, otherwise an error is raised. For more information, see graphics::image().
smooth.span(default=max(1,round(nrow(self$counts)/1024)))
...Additional parameters passed to internal function used for heatmap plotting, my.heatmap2()
Returns
plot of gene heatmap
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance 
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=50, type='PCA', center=TRUE, distance='cosine')
## Call clusters based on kNN
p2_object$getKnnClusters(method=multilevel.community,type='PCA',name='multilevel')
## Generate embedding of the data
p2_object$getEmbedding(type='PCA', embeddingType = 'UMAP', M=20, perplexity=30, gamma=1/20)
## Perform differential expression
p2_object$getDifferentialGenes(type='PCA',verbose=TRUE,clusterType='multilevel')
de <- p2_object$diffgenes$PCA[[1]][['2']]
p2_object$plotGeneHeatmap(genes=rownames(de)[1:15], 
    groups=p2_object$clusters$PCA[[1]], cluster.genes=TRUE)
}Method plotEmbedding()
Show embedding
Usage
Pagoda2$plotEmbedding(
  type = NULL,
  embeddingType = NULL,
  clusterType = NULL,
  groups = NULL,
  colors = NULL,
  gene = NULL,
  plot.theme = ggplot2::theme_bw(),
  ...
)
Arguments
- type
- string Either 'counts' or the name of a stored embedding, names(self$embeddings) (default=NULL) 
embeddingTypestring Embedding type (default=NULL). If NULL, takes the most recently generated embedding.
clusterTypeName of cluster to access (default=NULL). If NULL, takes the most recently generated clustering. Parameter ignored if groups is not NULL.
groupsfactor named with cell names specifying the clusters of cells (default=NULL)
colorscharacter vector List of gene names (default=NULL)
gene(default=NULL)
plot.theme(default=ggplot2::theme_bw())
...Additional parameters passed to sccore::embeddingPlot()
Returns
plot of the embedding
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance 
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=50, type='PCA', center=TRUE, distance='cosine')
## Call clusters based on kNN
p2_object$getKnnClusters(method=multilevel.community,type='PCA',name='multilevel')
## Generate embedding of the data
p2_object$getEmbedding(type='PCA', embeddingType = 'UMAP', M=20, perplexity=30, gamma=1/20)
library(ggplot2)
p2_object$plotEmbedding(type='PCA', show.legend=FALSE, mark.groups=TRUE, min.cluster.size=50,
     shuffle.colors=FALSE, font.size=1, alpha=0.1, title='clusters (UMAP)', 
     plot.theme=theme(plot.title = element_text(hjust = 0.5)))
}Method getOdGenes()
Get overdispersed genes
Usage
Pagoda2$getOdGenes(
  n.odgenes = NULL,
  alpha = 0.05,
  use.unadjusted.pvals = FALSE
)
Arguments
- n.odgenes
- integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all. 
alphanumeric The Type I error probability or the significance level (default=5e-2). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.
use.unadjusted.pvalsboolean Whether to use Benjamini-Hochberg adjusted p-values (default=FALSE).
Returns
vector of overdispersed genes
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance 
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=50, type='PCA', center=TRUE, distance='cosine')
## Call clusters based on kNN
p2_object$getKnnClusters(method=infomap.community, type='PCA')
## Generate embedding of the data
p2_object$getEmbedding(type='PCA', M=20, perplexity=30, gamma=1/20)
## Perform differential expression
p2_object$getDifferentialGenes(type='PCA',verbose=TRUE)
odGenes <- p2_object$getOdGenes(use.unadjusted.pvals=FALSE)
}Method getNormalizedExpressionMatrix()
Return variance-normalized matrix for specified genes or a number of OD genes
Usage
Pagoda2$getNormalizedExpressionMatrix(genes = NULL, n.odgenes = NULL)
Arguments
- genes
- vector of gene names to explicitly return (default=NULL) 
n.odgenesinteger Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
Returns
cell by gene matrix
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
p2_object$getNormalizedExpressionMatrix()
}Method calculatePcaReduction()
Calculate PCA reduction of the data
Usage
Pagoda2$calculatePcaReduction(
  nPcs = 20,
  type = "counts",
  name = "PCA",
  use.odgenes = TRUE,
  n.odgenes = NULL,
  odgenes = NULL,
  center = TRUE,
  cells = NULL,
  fastpath = TRUE,
  maxit = 100,
  verbose = TRUE,
  var.scale = (type == "counts"),
  ...
)
Arguments
- nPcs
- numeric Number of principal components (PCs) (default=20) 
typestring Dataset view to reduce (counts by default, but can specify a name of an existing reduction) (default='counts')
namestring Name for the PCA reduction to be created (default='PCA')
use.odgenesboolean Whether pre-calculated set of overdispersed genes should be used (default=TRUE)
n.odgenesinteger Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
odgenesExplicitly specify a set of overdispersed genes to use for the reduction (default=NULL)
centerboolean Whether data should be centered prior to PCA (default=TRUE)
cellsoptional subset of cells on which PCA should be run (default=NULL)
fastpathboolean Use C implementation for speedup (default=TRUE)
maxitnumeric Maximum number of iterations (default=100). For more information, see 'maxit' parameter in irlba::irlba().
verboseboolean Whether to give verbose output (default=TRUE)
var.scaleboolean Apply scaling if using raw counts (default=TRUE). If type="counts", var.scale is TRUE by default.
...additional arguments forwarded to irlba::irlba
Returns
Invisible PCA result (the reduction itself is saved in self$reductions[[name]])"
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=600)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts, log.scale=FALSE, min.cells.per.gene=30, n.cores=1)
## Normalize gene expression variance 
p2_object$adjustVariance(plot=TRUE, gam.k=15)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=2e3)
}Method expandOdGenes()
Reset overdispersed genes 'odgenes' to be a superset of the standard odgene selection (guided by n.odgenes or alpha), 
    and a set of recursively determined odgenes based on a given group (or a cluster info)
Usage
Pagoda2$expandOdGenes(
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  min.group.size = 30,
  od.alpha = 0.1,
  use.odgenes = FALSE,
  n.odgenes = NULL,
  odgenes = NULL,
  n.odgene.multiplier = 1,
  gam.k = 10,
  verbose = FALSE,
  n.cores = self$n.cores,
  min.odgenes = 10,
  max.odgenes = Inf,
  recursive = TRUE
)
Arguments
- type
- string Data type (default='counts'). Currently only 'counts' supported. 
clusterTypeOptional cluster type to use as a group-defining factor (default=NULL)
groupsfactor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
min.group.sizeinteger Number of minimum cells for filtering out group size (default=30)
od.alphanumeric The Type I error probability or the significance level for calculating overdispersed genes (default=1e-1). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.
use.odgenesboolean Whether pre-calculated set of overdispersed genes should be used (default=FALSE)
n.odgenesinteger Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
odgenesExplicitly specify a set of overdispersed genes to use for the reduction (default=NULL) #' @param odgenes (default=NULL)
n.odgene.multipliernumeric (default=1)
gam.kinteger The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=10). If gam.k<2, linear regression is used 'lm(v ~ m)'.
verboseboolean Whether to give verbose output (default=TRUE)
n.coresnumeric Number of cores to use (default=1)
min.odgenesinteger Minimum number of overdispersed genes to use (default=10)
max.odgenesinteger Maximum number of overdispersed genes to use (default=Inf)
recursiveboolean Whether to determine groups for which variance normalization will be rerun (default=TRUE)
Returns
List of overdispersed genes
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=50, type='PCA', center=TRUE, distance='cosine')
## Call clusters based on kNN
p2_object$getKnnClusters(method=infomap.community, type='PCA')
## Generate embedding of the data
p2_object$getEmbedding(type='PCA', M=20, perplexity=30, gamma=1/20)
## Perform differential expression
p2_object$getDifferentialGenes(type='PCA',verbose=TRUE)
p2_object$expandOdGenes(type='PCA')
}Method localPcaKnn()
local PCA implementation
Usage
Pagoda2$localPcaKnn(
  nPcs = 5,
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  k = 30,
  b = 1,
  a = 1,
  min.group.size = 30,
  name = "localPCA",
  od.alpha = 0.1,
  n.odgenes = NULL,
  gam.k = 10,
  verbose = FALSE,
  n.cores = self$n.cores,
  min.odgenes = 5,
  take.top.odgenes = FALSE,
  recursive = TRUE,
  euclidean = FALSE,
  perplexity = k,
  return.pca = FALSE,
  skip.pca = FALSE
)
Arguments
- nPcs
- integer Number of principal components (default=5) 
typestring Data type (default='counts'). Currently only 'counts' supported.
clusterTypeOptional cluster type to use as a group-defining factor (default=NULL)
groupsfactor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
kinteger Number of components for kNN graph (default=30)
bnumeric Constant within exp(-b*(ncid/cldsd)^2), used for calculating cell relevance per cluster (default=1)
anumeric Constant within "(1-exp(-a*(dsq)/(p$pcs$trsd^2)))*(pk /outerproduct pk)" (default=1)
min.group.sizeinteger Number of minimum cells for filtering out group size (default=30)
namestring Title (default='localPCA')
od.alphanumeric Significance level for calculating overdispersed genes (default=1e-1). P-values will be filtered by <log(od.alpha).
n.odgenesinteger Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
gam.kinteger The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=10). If gam.k<2, linear regression is used 'lm(v ~ m)'.
verboseboolean Whether to give verbose output (default=TRUE)
n.coresnumeric Number of cores to use (default=1)
min.odgenesinteger Minimum number of overdispersed genes to use (default=5)
take.top.odgenesboolean Take top overdispersed genes in decreasing order (default=FALSE)
recursiveboolean Whether to recursively determine groups for which variance normalization will be rerun (default=FALSE)
euclideanboolean Whether to applied euclidean-based distance similarity during variance normalization (default=FALSE)
perplexityinteger Perplexity parameter within Rtsne::Rtsne() (default=k). Please see Rtsne for more details.
return.pcaboolean Whether to return the PCs (default=FALSE)
skip.pcaboolean If TRUE and return.pca=TRUE, will return a list of scale factors, cells, and overdispersed genes, i.e. list(sf=sf, cells=cells, odgenes=odgenes) (default=FALSE). Otherwise, ignored.
Returns
localPcaKnn return here
Method testPathwayOverdispersion()
Test pathway overdispersion
Note: this is a compressed version of the PAGODA1 approach in SCDE <https://hms-dbmi.github.io/scde/>
Usage
Pagoda2$testPathwayOverdispersion(
  setenv,
  type = "counts",
  max.pathway.size = 1000,
  min.pathway.size = 10,
  n.randomizations = 5,
  verbose = FALSE,
  n.cores = self$n.cores,
  score.alpha = 0.05,
  plot = FALSE,
  cells = NULL,
  adjusted.pvalues = TRUE,
  z.score = qnorm(0.05/2, lower.tail = FALSE),
  use.oe.scale = FALSE,
  return.table = FALSE,
  name = "pathwayPCA",
  correlation.distance.threshold = 0.2,
  loading.distance.threshold = 0.01,
  top.aspects = Inf,
  recalculate.pca = FALSE,
  save.pca = TRUE
)
Arguments
- setenv
- Specific environment for pathway analysis 
typestring Data type (default='counts'). Currently only 'counts' supported.
max.pathway.sizeinteger Maximum number of observed genes in a valid gene set (default=1e3)
min.pathway.sizeinteger Minimum number of observed genes that should be contained in a valid gene set (default=10)
n.randomizationsnumeric Number of random gene sets (of the same size) to be evaluated in parallel with each gene set (default=5). (This can be kept at 5 or 10, but should be increased to 50-100 if the significance of pathway overdispersion will be determined relative to random gene set models.)
verboseboolean Whether to give verbose output (default=TRUE)
n.coresnumeric Number of cores to use (default=1)
score.alphanumeric Significance level of the confidence interval for determining upper/lower bounds (default=0.05)
plotboolean Whether to output the plot (default=FALSE)
cellscharacter vector Specific cells to investigate (default=NULL)
adjusted.pvaluesboolean Whether to use adjusted p-values (default=TRUE)
z.scorenumeric Z-score to be used as a cutoff for statistically significant patterns (default=qnorm(0.05/2, lower.tail = FALSE))
use.oe.scaleboolean Whether the variance of the returned aspect patterns should be normalized using observed/expected value instead of the default chi-squared derived variance corresponding to overdispersion Z-score (default=FALSE)
return.tableboolean Whether to return a text table with results (default=FALSE)
namestring Title (default='pathwayPCA')
correlation.distance.thresholdnumeric Similarity threshold for grouping interdependent aspects in pagoda.reduce.redundancy() (default=0.2)
loading.distance.thresholdnumeric Similarity threshold for grouping interdependent aspects in pagoda.reduce.loading.redundancy() (default=0.2)
top.aspectsRestrict output to the top N aspects of heterogeneity (default=Inf)
recalculate.pcaboolean Whether to recalculate PCA (default=FALSE)
save.pcaboolean Whether to save the PCA results (default=TRUE). If TRUE, caches them in self$misc[['pwpca']].
Returns
pathway output
Method getEmbedding()
Return embedding
Usage
Pagoda2$getEmbedding(
  type = "counts",
  embeddingType = "largeVis",
  name = NULL,
  dims = 2,
  M = 1,
  gamma = 1/M,
  perplexity = 50,
  verbose = TRUE,
  sgd_batches = NULL,
  diffusion.steps = 0,
  diffusion.power = 0.5,
  distance = "pearson",
  n.cores = self$n.cores,
  n.sgd.cores = n.cores,
  ...
)
Arguments
- type
- string Data type (default='counts'). Currently only 'counts' supported. 
embeddingTypestring Type of embedding to construct (default='largeVis'). Possible values are: 'largeVis', 'tSNE', 'FR' (Fruchterman<U+2013>Reingold), 'UMAP', 'UMAP_graph'
namestring Name of the embedding (default=NULL). If NULL, the name = embeddingType.
dimsinteger Parameter 'dims' Matrix::sparseMatrix(); a non-negative, integer, dimensions vector of length 2 (default=2). See Matrix package documentation for more details.
Mnumeric (largeVis) The number of negative edges to sample for each positive edge (default=5). Parameter only used if embeddingType is 'largeVis'.
gammanumeric (largeVis) The strength of the force pushing non-neighbor nodes apart (default=7). Parameter only used if embeddingType is 'largeVis'.
perplexitynumeric Parameter 'perplexity' within largeVis::buildWijMatrix() (default=50). Please see the largeVis documentation for more details.
verboseboolean Whether to give verbose output (default=TRUE)
sgd_batchesnumeric The number of edges to process during SGD (default=NULL). Passed to projectKNNs(). Defaults to a value set based on the size of the dataset. If the parameter given is
between 0 and 1, the default value will be multiplied by the parameter.
diffusion.stepsinteger Iteration steps to use. If 0, no steps are run. (default=0)
diffusion.powernumeric Factor to be used when calculating diffusion, (default=0.5)
distancestring 'pearson', 'spearman', 'euclidean', 'L2', 'JS' (default='pearson')
n.coresnumeric Number of cores to use (default=1)
n.sgd.coresnumeric Number of cores to use (default=n.cores)
...Additional parameters passed to embedding functions, Rtsne::Rtsne() if 'L2', uwot::umap() if 'UMAP', embedKnnGraphUmap() if 'UMAP_graph'
Returns
embedding stored in self$embedding
Examples
\donttest{
## Load pre-generated a dataset of 3000 bone marrow cells as matrix
cm <- p2data::sample_BM1
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm,min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance 
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Reduce the dataset dimensions by running PCA
p2_object$calculatePcaReduction(nPcs=50, n.odgenes=3e3)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=40, type='PCA', center=TRUE, distance='cosine')
## Call clusters based on kNN
p2_object$getKnnClusters(method=infomap.community, type='PCA')
## Generate embedding of the data
p2_object$getEmbedding(type='PCA', embeddingType = 'UMAP', M=30, perplexity=30, gamma=1/30)
}Method clone()
The objects of this class are cloneable with this method.
Usage
Pagoda2$clone(deep = FALSE)
Arguments
- deep
- Whether to make a deep clone.