
scde (version 2.0.1)

pagoda.pathway.wPCA: Run weighted PCA analysis on pre-annotated gene sets

Description

For each valid gene set in the provided environment (setenv), that is, each set containing an appropriate number of observed genes (see min.pathway.size and max.pathway.size), the method runs a weighted PCA. It runs analogous analyses on random gene sets of the same size, or on shuffled expression magnitudes for the same gene set.
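To illustrate what "weighted" means here, the base R sketch below computes a single weighted principal component by alternating weighted least squares. Each gene-by-cell entry contributes to the fit in proportion to its weight. This is an assumption-laden stand-in, not scde's method: scde uses its own EM-based procedure, which is why the function has an n.starts argument. The function name weighted.pc1 and the simulated data are hypothetical.

```r
# Hypothetical sketch: first weighted principal component via alternating
# weighted least squares. NOT scde's EM implementation; illustration only.
weighted.pc1 <- function(x, w, n.iter = 200, tol = 1e-10) {
  # x: genes x cells matrix; w: non-negative weights of the same shape,
  # assumed to give every gene and every cell some positive weight
  stopifnot(all(dim(x) == dim(w)), all(w >= 0))
  # weighted centering of each gene (row)
  xc <- x - rowSums(w * x) / rowSums(w)
  # start from the unweighted leading left singular vector
  u <- svd(xc, nu = 1, nv = 0)$u[, 1]
  for (it in seq_len(n.iter)) {
    u.old <- u
    # cell scores: minimise sum_i w_ij (xc_ij - u_i v_j)^2 for each cell j
    v <- colSums(w * xc * u) / colSums(w * u^2)
    # gene loadings: minimise the same loss for each gene i, given v
    u <- rowSums(sweep(w * xc, 2, v, "*")) / rowSums(sweep(w, 2, v^2, "*"))
    # remove the scale ambiguity by keeping unit-length loadings
    u <- u / sqrt(sum(u^2))
    if (max(abs(u - u.old)) < tol) break
  }
  # final cell scores for the converged loadings
  v <- colSums(w * xc * u) / colSums(w * u^2)
  list(loadings = u, scores = v, iterations = it)
}

# Example with simulated data (hypothetical): 30 genes x 12 cells
set.seed(42)
x <- outer(rnorm(30), rnorm(12)) + matrix(rnorm(360, sd = 0.01), 30, 12)
w <- matrix(runif(360, 0.1, 1), 30, 12)
res <- weighted.pc1(x, w)
```

With all weights equal, the sketch reduces to ordinary PCA of the row-centered matrix. Down-weighting an entry (for example, a likely dropout) reduces its influence on both the loadings and the scores.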

Usage

pagoda.pathway.wPCA(varinfo, setenv, n.components = 2,
  n.cores = detectCores(), min.pathway.size = 10, max.pathway.size = 1000,
  n.randomizations = 10, n.internal.shuffles = 0, n.starts = 10,
  center = TRUE, batch.center = TRUE, proper.gene.names = NULL,
  verbose = 0)

Arguments

varinfo
adjusted variance info from pagoda.varnorm() (or pagoda.subtract.aspect())
setenv
environment listing the gene sets: each variable is named after a gene set, and its value is the vector of gene names in that set
n.components
number of principal components to determine for each gene set
n.cores
number of cores to use
min.pathway.size
minimum number of observed genes that should be contained in a valid gene set
max.pathway.size
maximum number of observed genes in a valid gene set
n.randomizations
number of random gene sets (of the same size) to be evaluated in parallel with each gene set (can be kept at 5 or 10, but should be increased to 50-100 if the significance of pathway overdispersion will be determined relative to random gene set models)
n.internal.shuffles
number of internal randomizations (independent row shuffles of the expression data) to evaluate for each gene set. These are needed only if gene set coherence P values are of interest. Disabled by default (0); set to 10-30 to estimate them
n.starts
number of random starts for the EM method in each evaluation
center
whether the expression matrix should be recentered
batch.center
whether batch-specific centering should be used
proper.gene.names
alternative vector of gene names (replacing rownames(varinfo$mat)), used when the provided setenv uses different gene names
verbose
verbosity level
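The base R sketch below does three things. It builds a setenv environment with list2env(). It applies min.pathway.size and max.pathway.size to decide which sets are valid. It then generates the two kinds of null data controlled by n.randomizations and n.internal.shuffles. The gene names and set names are hypothetical, as is the exact validity rule: distinct observed genes, inclusive bounds. scde's internal filtering and randomization code may differ.

```r
# Hypothetical gene sets and gene names, for illustration only.
gene.sets <- list(
  setA = paste0("g", 1:12),
  setB = paste0("g", 1:5),
  setC = c(paste0("g", 20:40), "notObserved")
)
# setenv: one variable per gene set, each holding a vector of gene names
setenv <- list2env(gene.sets)
observed <- paste0("g", 1:100)  # stand-in for rownames(varinfo$mat)

# Assumed validity rule: count distinct observed genes, inclusive bounds
min.pathway.size <- 10
max.pathway.size <- 1000
set.sizes <- vapply(ls(setenv), function(nm) {
  sum(unique(get(nm, envir = setenv)) %in% observed)
}, integer(1))
valid <- names(set.sizes)[set.sizes >= min.pathway.size &
                            set.sizes <= max.pathway.size]

# Simulated expression matrix (genes x cells), hypothetical
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, dimnames = list(observed, NULL))
genes.A <- intersect(get("setA", envir = setenv), observed)

# Null model 1 (cf. n.randomizations): random gene sets of the same size
random.sets <- replicate(10, sample(observed, length(genes.A)),
                         simplify = FALSE)

# Null model 2 (cf. n.internal.shuffles): shuffle each gene's values across
# cells independently, keeping per-gene magnitudes but breaking the
# gene-gene coherence of the set
shuffled <- t(apply(x[genes.A, ], 1, sample))
```

Here setB (5 observed genes) falls below min.pathway.size and is skipped. setA and setC would be analysed.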

Value

  • a list containing the weighted PCA results for each valid gene set

Examples

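library(scde)
data(pollen)
# filter out low-coverage cells and rarely detected genes
cd <- clean.counts(pollen)
# fit error models for each cell, using its k most similar cells
knn <- knn.error.models(cd, k = ncol(cd)/4, n.cores = 10, min.count.threshold = 2, min.nonfailed = 5, max.model.plots = 10)
# normalize the variance of each gene
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# create GO environment
library(org.Hs.eg.db)
# translate gene names to Entrez gene ids
ids <- unlist(lapply(mget(rownames(cd), org.Hs.egALIAS2EG, ifnotfound = NA), function(x) x[1]))
# reverse map: Entrez gene id -> gene name
rids <- names(ids); names(rids) <- ids
# list of GO terms, each holding the names of the genes annotated to it
go.env <- lapply(mget(ls(org.Hs.egGO2ALLEGS), org.Hs.egGO2ALLEGS), function(x) as.character(na.omit(rids[x])))
# clean GOs
go.env <- clean.gos(go.env)
# convert to an environment
go.env <- list2env(go.env)
# weighted PCA of each GO gene set, with 50 internal shuffles for coherence P values
pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 10, n.internal.shuffles = 50)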
