CoGAPS (version 2.6.0)

CoGAPS: CoGAPS calls the C++ MCMC code through gapsRun to perform Bayesian matrix factorization, returning the two matrices that reconstruct the data matrix, and then calls calcCoGAPSStat to estimate gene set activity, with nPerm set to 500 by default

Description

CoGAPS calls the C++ MCMC code through gapsRun to perform Bayesian matrix factorization, returning the two matrices that reconstruct the data matrix. It then calls calcCoGAPSStat to estimate gene set activity, with nPerm set to 500 by default.

Usage

CoGAPS(data, unc, ABins = data.frame(), PBins = data.frame(), GStoGenes,
  nFactor = 7, simulation_id = "simulation", nEquil = 1000,
  nSample = 1000, nOutR = 1000, output_atomic = FALSE,
  fixedBinProbs = FALSE, fixedDomain = "N", sampleSnapshots = TRUE,
  numSnapshots = 100, plot = TRUE, nPerm = 500, alphaA = 0.01,
  nMaxA = 1e+05, max_gibbmass_paraA = 100, alphaP = 0.01, nMaxP = 1e+05,
  max_gibbmass_paraP = 100)

Arguments

data
data matrix
unc
uncertainty matrix (std devs for chi-squared of Log Likelihood)
ABins
a matrix of same size as A which gives relative probability of that element being non-zero
PBins
a matrix of same size as P which gives relative probability of that element being non-zero
GStoGenes
data.frame or list with gene sets
nFactor
number of patterns (basis vectors, metagenes)
simulation_id
name to attach to atoms files if created
nEquil
number of iterations for burn-in
nSample
number of iterations for sampling
nOutR
how often to print status into R by iterations
output_atomic
whether to write atom files (large)
fixedBinProbs
Boolean for using relative probabilities given in ABins and PBins
fixedDomain
character to indicate whether A or P is domain for relative probabilities
sampleSnapshots
Boolean to indicate whether to capture individual samples from Markov chain during sampling
numSnapshots
the number of individual samples to capture
plot
Boolean to indicate whether to produce output graphics
nPerm
number of permutations in gene set test
alphaA
sparsity parameter for A domain
nMaxA
PRESENTLY UNUSED, future = limit number of atoms
max_gibbmass_paraA
limit truncated normal to max size
alphaP
sparsity parameter for P domain
nMaxP
PRESENTLY UNUSED, future = limit number of atoms
max_gibbmass_paraP
limit truncated normal to max size
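
As a rough illustration of how the three main inputs fit together, the sketch below builds toy versions of data, unc, and GStoGenes in base R. All object names (toy_data, toy_unc, toy_sets) are hypothetical, and the uncertainty rule (10% of the signal with a floor of 0.1) is an assumption made for this example, not something this page prescribes. It assumes genes are in rows and samples in columns.

```r
# Toy inputs for illustration only; nothing here calls the CoGAPS package.
set.seed(1)
n_genes <- 20
n_samples <- 6

# data: non-negative matrix, genes in rows and samples in columns (assumed layout)
toy_data <- matrix(rgamma(n_genes * n_samples, shape = 2, rate = 1),
                   nrow = n_genes, ncol = n_samples,
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   paste0("sample", seq_len(n_samples))))

# unc: standard deviations, same dimensions as data.
# Assumed noise model: 10% of the signal, floored at 0.1.
toy_unc <- pmax(0.1 * toy_data, 0.1)

# GStoGenes: here a named list, one character vector of gene names per set
toy_sets <- list(setA = paste0("gene", 1:5),
                 setB = paste0("gene", 6:12))

# Basic consistency checks before handing the objects to a factorization
stopifnot(identical(dim(toy_data), dim(toy_unc)))
stopifnot(all(unlist(toy_sets) %in% rownames(toy_data)))
```

Checking that unc matches data in shape and that every gene in a set appears among the row names catches the most common input mistakes before a long MCMC run.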

Value

A list containing:

  • meanChi2: Value of $\chi^2$ for Amean and Pmean.
  • D: Data matrix ${\bf{D}}$ input to factorization.
  • Sigma: Uncertainty matrix (std devs for chi-squared of Log Likelihood).
  • Amean: Sampled mean value of the amplitude matrix ${\bf{A}}$.
  • Asd: Sampled standard deviation of the amplitude matrix ${\bf{A}}$.
  • Pmean: Sampled mean value of the pattern matrix ${\bf{P}}$.
  • Psd: Sampled standard deviation of the pattern matrix ${\bf{P}}$.
  • GSUpreg: p-values for upregulation of each gene set in each pattern.
  • GSDownreg: p-values for downregulation of each gene set in each pattern.
  • GSActEst: p-values for activity of each gene set in each pattern.
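
To show how the returned matrices relate to one another, the sketch below builds a stand-in list with the same element names (D, Sigma, Amean, Pmean) from simulated values, reconstructs the data as the product of Amean and Pmean, and computes a standard chi-squared misfit. The list fit is hypothetical and is not produced by CoGAPS, and the chi-squared formula is the textbook form; the exact computation behind meanChi2 is not stated on this page.

```r
# Stand-in for a result list; values are simulated, not output from CoGAPS.
set.seed(42)
n_genes <- 8
n_patterns <- 3
n_samples <- 5

Amean <- matrix(rexp(n_genes * n_patterns), n_genes, n_patterns)
Pmean <- matrix(rexp(n_patterns * n_samples), n_patterns, n_samples)
Sigma <- matrix(0.1, n_genes, n_samples)
noise <- matrix(rnorm(n_genes * n_samples, sd = 0.1), n_genes, n_samples)

fit <- list(D = Amean %*% Pmean + noise, Sigma = Sigma,
            Amean = Amean, Pmean = Pmean)

# Reconstruct the data matrix from the two factor matrices
D_hat <- fit$Amean %*% fit$Pmean

# Textbook chi-squared: squared residuals scaled by the uncertainties
chi2 <- sum(((fit$D - D_hat) / fit$Sigma)^2)
chi2
```

With noise drawn at the same scale as Sigma, chi2 should be on the order of the number of matrix elements, which is a quick sanity check on a fit.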

Details

CoGAPS first decomposes the data matrix, ${\bf{D}}$, using GAPS into a basis of underlying patterns and then determines the gene set activity in each of these patterns. The GAPS decomposition is achieved by finding amplitude and pattern matrices (${\bf{A}}$ and ${\bf{P}}$, respectively) for which $${\bf{D}} = {\bf{A}}{\bf{P}} + \Sigma,$$ where $\Sigma$ is the matrix of uncertainties given by unc. The matrices ${\bf{A}}$ and ${\bf{P}}$ are assumed to have the atomic prior described in Sibisi and Skilling (1997) and are found with MCMC sampling.

Then, the patterns identified in the columns of ${\bf{P}}$ are linked to activity in each of the gene sets specified in GStoGenes using a novel z-score based statistic developed in Ochs et al. (2009). Specifically, the z-score for pattern $p$ and gene set $\mathcal{G}_{i}$ containing $G$ total genes is given by $$Z_{i,p} = \frac{1}{G} \sum_{g \in \mathcal{G}_{i}} \frac{{\bf{A}}_{gp}}{Asd_{gp}},$$ where $g$ indexes the genes in the set and $Asd_{gp}$ is the standard deviation of ${\bf{A}}_{gp}$ obtained from MCMC sampling. CoGAPS then uses nPerm random sample tests to compute a consistent p-value estimate from that z-score.

Note that the data from Ochs et al. (2009) are provided with this package in GIST_TS_20084.RData; TFGSList.RData is also provided with this package for further validation.
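
The z-score above can be sketched in a few lines of base R. The example below uses simulated Amean and Asd matrices and hypothetical helper names (z_score, perm_p_value). The permutation scheme shown, which draws random gene sets of the same size and reports the fraction of null z-scores at or above the observed one, is an assumption for illustration; this page does not describe how calcCoGAPSStat draws its random samples.

```r
# Simulated amplitude mean and standard deviation; not output from CoGAPS.
set.seed(7)
n_genes <- 50
n_patterns <- 3
genes <- paste0("g", seq_len(n_genes))

Amean <- matrix(rexp(n_genes * n_patterns), n_genes, n_patterns,
                dimnames = list(genes, NULL))
Asd <- matrix(runif(n_genes * n_patterns, min = 0.5, max = 1.5),
              n_genes, n_patterns, dimnames = list(genes, NULL))

# Make one gene set clearly active in pattern 1
gene_set <- genes[1:10]
Amean[gene_set, 1] <- Amean[gene_set, 1] + 5

# Z_{i,p}: mean over the set's genes of A_gp / Asd_gp
z_score <- function(Amean, Asd, set_genes, p) {
  mean(Amean[set_genes, p] / Asd[set_genes, p])
}

# Assumed permutation test: compare against random gene sets of equal size
perm_p_value <- function(Amean, Asd, set_genes, p, n_perm = 500) {
  z_obs <- z_score(Amean, Asd, set_genes, p)
  z_null <- replicate(n_perm, {
    random_genes <- sample(rownames(Amean), length(set_genes))
    z_score(Amean, Asd, random_genes, p)
  })
  mean(z_null >= z_obs)  # upper-tail fraction
}

z1 <- z_score(Amean, Asd, gene_set, 1)
z2 <- z_score(Amean, Asd, gene_set, 2)
p1 <- perm_p_value(Amean, Asd, gene_set, 1)
p2 <- perm_p_value(Amean, Asd, gene_set, 2)
```

Dividing each amplitude by its standard deviation down-weights genes whose membership in a pattern is poorly determined by the MCMC samples, so a set scores highly only when its genes are both strongly and consistently assigned to the pattern.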

See Also

gapsRun, calcCoGAPSStat

Examples

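## Load the package; the example objects SimpSim.D, SimpSim.S, and GSets
## are assumed to be available from the package's example data
library(CoGAPS)

## Number of iterations used for both burn-in and sampling
nIter <- 5000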

## Run GAPS matrix decomposition with gene set statistic
results <- CoGAPS(data=SimpSim.D, unc=SimpSim.S,
                  GStoGenes=GSets,
                  nFactor=3,
                  nEquil=nIter, nSample=nIter,
                  plot=FALSE)


## Plot the results
plotGAPS(results$Amean, results$Pmean, 'GSFigs')
