calculatePRandom: Compute P_random

Description

This function randomly samples gene sets and calculates P_pure (via calculatePPure) for each one. P_random is the proportion of randomly sampled gene sets whose P_pure is at least as significant as (that is, less than or equal to) the provided p_pure. This function is normally called by saps rather than directly.
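The resampling procedure described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the saps implementation: `sketch_p_random` and `score_fn` are hypothetical names, and `score_fn` stands in for calculatePPure (which in saps also uses survivalTimes and followup). "At least as significant" is taken to mean a P_pure less than or equal to the candidate value.

```r
# Hypothetical sketch of the P_random resampling (not the saps source).
# ASSUMPTION: score_fn is a stand-in for calculatePPure(); it receives an
# expression sub-matrix (patients x sampled genes) and returns one P_pure.
sketch_p_random <- function(dataSet, sampleSize, p_pure, score_fn,
                            random.samples = 10000) {
  genes <- colnames(dataSet)
  p_pures <- numeric(random.samples)
  for (i in seq_len(random.samples)) {
    # draw a random gene set of the requested size, without replacement
    chosen <- sample(genes, sampleSize)
    p_pures[i] <- score_fn(dataSet[, chosen, drop = FALSE])
  }
  # ASSUMPTION: "at least as significant" means P_pure <= candidate p_pure
  list(p_random = mean(p_pures <= p_pure), p_pures = p_pures)
}

set.seed(1)
dat <- matrix(rnorm(25 * 100), nrow = 25, ncol = 100)
colnames(dat) <- as.character(1:100)

# toy scorer that ignores the data: a uniform "P_pure", so p_random
# should land close to the candidate p_pure
toy_score <- function(m) runif(1)
res <- sketch_p_random(dat, 5, 0.05, toy_score, random.samples = 1000)
res$p_random
```

With a scorer that carries no signal, p_random behaves like an ordinary p-value; a real gene-set score is what makes P_random informative.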

Usage

calculatePRandom(dataSet, sampleSize, p_pure, survivalTimes, followup,
  random.samples = 10000)

Arguments

dataSet

A matrix where the column names are gene identifiers and the values are gene expression levels. Each row should contain data for a single patient.

sampleSize

The desired size for the randomly sampled gene sets.

p_pure

The candidate P_pure against which to compare the P_pure values for the randomly generated gene sets.

survivalTimes

A vector of survival times. The length must equal the number of rows in dataSet.

followup

A vector of 0 or 1 values indicating whether the patient was lost to follow-up (0) or not (1). The length must equal the number of rows (i.e. patients) in dataSet.

random.samples

The number of random gene sets to sample.

Value

A list with the following elements:
p_random: The proportion of randomly sampled gene sets with a calculated P_pure at least as significant as the provided p_pure.
p_pures: A vector of the calculated P_pure values, one for each randomly sampled gene set.
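Because p_random is estimated from a finite number of random gene sets, it carries Monte Carlo error, and with random.samples = N its smallest non-zero value is 1/N. The sketch below recomputes p_random from the returned p_pures and attaches an exact binomial confidence interval via `stats::binom.test`. The helper `p_random_ci` and the toy input vector are hypothetical, and the `<=` comparison is an assumption about how "at least as significant" is defined.

```r
# Hypothetical helper (not part of saps): recompute p_random from the
# returned p_pures and attach an exact (Clopper-Pearson) binomial CI.
p_random_ci <- function(p_pures, p_pure, conf.level = 0.95) {
  hits <- sum(p_pures <= p_pure)  # ASSUMPTION: "as significant" means <=
  n <- length(p_pures)
  ci <- stats::binom.test(hits, n, conf.level = conf.level)$conf.int
  list(p_random = hits / n, lower = ci[1], upper = ci[2],
       resolution = 1 / n)  # smallest non-zero p_random attainable
}

# toy vector standing in for calculatePRandom(...)$p_pures
toy_p_pures <- c(0.2, 0.01, 0.5, 0.03, 0.9, 0.04, 0.6, 0.7, 0.8, 0.3)
res_ci <- p_random_ci(toy_p_pures, p_pure = 0.05)
res_ci
```

With only 10 samples the interval is very wide, which is one reason the default random.samples is 10000.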

References

Beck AH, Knoblauch NW, Hefti MM, Kaplan J, Schnitt SJ, et al. (2013) Significance Analysis of Prognostic Signatures. PLoS Comput Biol 9(1): e1002875. doi:10.1371/journal.pcbi.1002875

Examples


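# load the package and fix the seed so the random sampling is reproducible
library(saps)
set.seed(1)

# 25 patients, none lost to follow-up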
followup <- rep(1, 25)

# first 5 patients have good survival (in days)
time <- c(25, 27, 24, 21, 26, sample(1:3, 20, TRUE))*365

# create data for 100 genes, 25 patients
dat <- matrix(rnorm(25*100), nrow=25, ncol=100)
colnames(dat) <- as.character(1:100)

# relatively lenient threshold
p_pure <- 0.05

p_random <- calculatePRandom(dat, 5, p_pure, time, followup, random.samples=100)
p_random$p_random
hist(p_random$p_pures)
# number of random gene sets at least as significant as p_pure
sum(p_random$p_pures <= p_pure)

# set a more stringent threshold
p_pure <- 0.001

p_random <- calculatePRandom(dat, 5, p_pure, time, followup, random.samples=100)
# fewer random gene sets should pass the stricter threshold
sum(p_random$p_pures <= p_pure)