computeGeneGSProb: CoGAPS gene membership statistic

Description

Computes the p-value for gene set membership using the CoGAPS-based statistics developed in Fertig et al. (2012). This statistic refines set membership for each candidate gene in a set specified in GSGenes by comparing the inferred activity of that gene to the average activity of the set. Specifically, we compute the following summary statistic for each gene $g$ that is a candidate member of gene set $G$: $$S_{g,G} = (\sum_{p} -log(Pr_{G,p})Pw[p](A_{gp}/\sigma_{gp})) / \sum_{p}-log(Pr_{G,p})Pw[p],$$ where $p$ indexes each of the patterns, $Pr_{G,p}$ is the probability that gene set $G$ is upregulated computed with calcCoGAPSStat, $A_{gp}$ is the mean amplitude matrix from the GAPS matrix factorization, Pw[p] is a prior weighting for each pattern based upon the context to which that pattern relates, and $\sigma_{gp}$ is the standard deviation of the amplitude matrix. P-values are formulated from a permutation test comparing the value of $S_{g,G}$ for genes in GSGenes relative to the value of $S_{g,G}$ numPerm random gene sets with the same number of targets.

Usage

computeGeneGSProb(Amean, Asd, GSGenes, Pw=rep(1,ncol(Amean)),numPerm=500,PwNull=F)

Arguments

Amean

Sampled mean value of the amplitude matrix ${\bf{A}}$. row.names(Amean) must correspond to the gene names contained in GSGenes.

Asd

Sampled standard deviation of the amplitude matrix ${\bf{A}}$.

GSGenes

Vector containing the prior estimate of members of the gene set of interest.

Vector containing the weight to assign each pattern in the gene statistic assumed to be computed from the association of the pattern with samples in a given context (optional: default=1 giving all patterns equal weight).

numPerm

Number of permuations used for the null distribution in the gene set statistic. (optional; default=500)

PwNull

Logical value. If TRUE, use pattern weighting in Pw when computing the null distribution for the statistic. If FALSE, do not use the pattern weighting so that the null is context independent. (optional; default=F)

Value

A vector of length GSGenes containing the p-values of set membership for each gene containined in the set specified in GSGenes.

References

E.J. Fertig, A.V. Favorov, and M.F. Ochs (2013) Identifying context-specific transcription factor targets from prior knowledge and gene expression data. 2012 IEEE Nanobiosciences.

Examples

Run this code

#################################################
# Results for GIST data in Fertig et al. (2012) #
#################################################

# load the data
data('GIST_TS_20084')
data('TFGSList')

# define transcription factors of interest based on Ochs et al. (2009)
TFs <- c("c.Jun", 'NF.kappaB', 'Smad4', "STAT3", "Elk.1", "c.Myc", "E2F.1", 
         "AP.1", "CREB", "FOXO", "p53", "Sp1")

# run the GAPS matrix factorization
nIter <- 10000
results <- CoGAPS(GIST.D, GIST.S, tf2ugFC,
                  nFactor=5,
                  nEquil=nIter, nSample=nIter,
                  plot=FALSE)

# set membership statistics
permTFStats <- list()
for (tf in TFs) {
     genes <- levels(tf2ugFC[,tf])
     genes <- genes[2:length(genes)]
     permTFStats[[tf]] <- computeGeneTFProb(Amean = GISTResults$Amean, 
                                            Asd = GistResults$Asd, genes) 
}