calculatePEnrichment: Compute P_enrichment

Description

Performs a pre-ranked gene set enrichment analysis (GSEA) to evaluate the degree to which a candidate gene set is overrepresented at the top or bottom extremes of a ranked list of concordance indices. It is normally called by saps.
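To make the idea of enrichment at the extremes of a ranked list concrete, the following is a minimal sketch of an unweighted running-sum enrichment score in base R. It is a simplified illustration and not the computation performed by piano's runGSA; the function name enrichment_score and the toy gene names are hypothetical.

```r
# Simplified, unweighted running-sum enrichment score.
# Illustration only: this is NOT the piano/runGSA implementation.
enrichment_score <- function(stats, gene_set) {
  # order gene names from highest to lowest statistic
  ranked <- names(sort(stats, decreasing = TRUE))
  in_set <- ranked %in% gene_set
  n_hit  <- sum(in_set)
  n_miss <- length(ranked) - n_hit
  # step up at each set member, step down at every other gene
  steps <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
  running <- cumsum(steps)
  # the score is the running sum's largest deviation from zero
  running[which.max(abs(running))]
}

# hypothetical toy data: 10 genes with strictly decreasing z-scores
z <- setNames(seq(5, -4, by = -1), paste0("g", 1:10))

es_top    <- enrichment_score(z, c("g1", "g2", "g3"))    # set at the top
es_bottom <- enrichment_score(z, c("g8", "g9", "g10"))   # set at the bottom
```

In this sketch a set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom yields a negative one, which is the kind of information the direction column summarizes.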

Usage

calculatePEnrichment(rankedGenes, candidateGeneSet, cpus, gsea.perm = 1000)

Arguments

rankedGenes

An nx1 matrix of concordance indices for n genes. Generally this will be the z-score returned by rankConcordance. The row names should contain gene identifiers.

candidateGeneSet

A 1xp matrix of p gene identifiers. The row name should contain a name for the gene set.
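As a rough guide to the shapes described above, this sketch builds both inputs by hand in base R. The gene identifiers, z-scores, and the set name "mySet" are made up for illustration.

```r
# hypothetical gene identifiers and z-scores, for illustration only
genes <- paste0("gene", 1:6)
z <- c(2.1, -0.4, 1.3, -1.8, 0.2, 0.9)

# rankedGenes: n x 1 matrix with gene identifiers as row names
rankedGenes <- matrix(z, ncol = 1, dimnames = list(genes, "z"))

# candidateGeneSet: 1 x p matrix with the gene set name as the row name
candidateGeneSet <- matrix(c("gene1", "gene3", "gene6"), nrow = 1,
                           dimnames = list("mySet", NULL))

# sanity check: every set member should appear among the ranked genes
all_present <- all(candidateGeneSet %in% rownames(rankedGenes))
```

Using drop=FALSE when extracting a single row from a larger gene set matrix, as in the Examples section, preserves this 1xp shape and the row name.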

cpus

This value is passed to the runGSA function in the piano package. On multi-core machines, set it to the number of cores, which can substantially reduce computation time.
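One way to choose a value for cpus is to query the machine with the parallel package that ships with R. This is only a sketch; the fallback to a single core is an assumption made here, not something saps does for you.

```r
library(parallel)  # distributed with base R

n_cores <- detectCores()

# detectCores() can return NA on some platforms; fall back to one core
cpus <- if (is.na(n_cores)) 1L else n_cores
```

The resulting cpus value can then be supplied as the cpus argument.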

gsea.perm

The number of permutations to be used in the GSEA. This value is passed to runGSA.
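The number of permutations limits how small an estimated p-value can be. The sketch below uses a deliberately simple permutation test on the mean statistic of a gene set, with an add-one correction; it is not the procedure runGSA uses, and perm_p_value and the toy data are hypothetical.

```r
set.seed(42)

# Simplified permutation test on the set's mean statistic.
# Illustration only: this is NOT the runGSA permutation scheme.
perm_p_value <- function(stats, in_set, n_perm = 1000) {
  observed <- mean(stats[in_set])
  # null distribution: mean of randomly drawn gene sets of the same size
  null <- replicate(n_perm, mean(sample(stats, sum(in_set))))
  # add-one correction so the estimate is never exactly zero
  (sum(abs(null) >= abs(observed)) + 1) / (n_perm + 1)
}

# hypothetical data: 5 strongly shifted genes among 100
stats  <- c(rep(10, 5), seq(-1, 1, length.out = 95))
in_set <- c(rep(TRUE, 5), rep(FALSE, 95))

p_100  <- perm_p_value(stats, in_set, n_perm = 100)
p_1000 <- perm_p_value(stats, in_set, n_perm = 1000)
```

Under this scheme the estimate cannot fall below 1/(n_perm + 1), so a larger permutation count gives finer resolution at the cost of longer run time.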

Value

The function returns a matrix with the following columns:

P_enrichment

the enrichment score

direction

either 1 or -1 depending on the direction of association

References

Beck AH, Knoblauch NW, Hefti MM, Kaplan J, Schnitt SJ, et al. (2013) Significance Analysis of Prognostic Signatures. PLoS Comput Biol 9(1): e1002875. doi:10.1371/journal.pcbi.1002875

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102: 15545-15550.

Examples

library(saps)

# fix the random seed so the example is reproducible
set.seed(1)

# 25 patients, none lost to followup
followup <- rep(1, 25)

# first 5 patients have good survival (in days)
time <- c(25, 27, 24, 21, 26, sample(1:3, 20, TRUE))*365

# create data for 100 genes, 25 patients
dat <- matrix(rnorm(25*100), nrow=25, ncol=100)
colnames(dat) <- as.character(1:100)

# create two random genesets of 5 genes each
set1 <- sample(colnames(dat), 5)
set2 <- sample(colnames(dat), 5)

genesets <- rbind(set1, set2)

# tweak data for first 5 patients for set1
dat[1:5, set1] <- dat[1:5, set1]+10

# rank all genes by concordance index
ci <- rankConcordance(dat, time, followup)[,"z"]

# set1 should achieve significance
p_enrich <- calculatePEnrichment(ci, genesets["set1",,drop=FALSE], cpus=1)
p_enrich

# set2 should not
p_enrich <- calculatePEnrichment(ci, genesets["set2",,drop=FALSE], cpus=1)
p_enrich