
CancerMutationAnalysis (version 1.14.0)

cma.set.sim: Simulates data and performs gene-set analysis methods on the simulated datasets.

Description

This function simulates mutation data under either the passenger null or the permutation null. The simulated datasets can be generated purely under the null or can include spiked-in gene-sets. The function then calculates the p-values and q-values for each of the selected gene-set analysis methods.
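
As a rough illustration of what "q-values" means here, the sketch below applies the Benjamini-Hochberg adjustment (the choice selected by BH = TRUE, see Arguments) to a made-up vector of p-values using base R's p.adjust. The p-values are hypothetical, and whether the package calls p.adjust internally is an assumption; the Storey alternative from the qvalue package is not shown.

```r
# Hypothetical p-values for five gene-sets (made up for illustration).
p <- c(0.001, 0.008, 0.039, 0.041, 0.60)

# Benjamini-Hochberg adjusted p-values, used here as q-values.
q_bh <- p.adjust(p, method = "BH")

# Each adjusted value is at least as large as the raw p-value.
data.frame(p = p, q = q_bh)
```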

Usage

cma.set.sim(cma.alter, cma.cov, cma.samp, GeneSets,
            passenger.rates = t(data.frame(0.55*rep(1.0e-6,25))),
            ID2name = NULL, BH = TRUE, nr.iter,
            pass.null = FALSE,
            perc.samples = NULL, spiked.set.sizes = NULL,
            gene.method = FALSE,
            perm.null.method = TRUE, perm.null.het.method = FALSE,
            pass.null.method = FALSE, pass.null.het.method = FALSE,
            show.iter,
            KnownMountains = c("EGFR","SMAD4","KRAS",
                               "TP53","CDKN2A","MYC","MYCN","PTEN","RB1"),
            exclude.mountains = TRUE, verbose = TRUE)
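
The default for passenger.rates in the usage above can be evaluated on its own to see its shape. This is a minimal sketch in base R; the names default_rates and custom_rates are hypothetical, and the tenfold change to the first type is purely illustrative. Note that t() applied to a data frame returns a matrix.

```r
# Default value of passenger.rates, copied from the usage.
default_rates <- t(data.frame(0.55 * rep(1.0e-6, 25)))

dim(default_rates)                 # 1 row, 25 columns
unique(as.vector(default_rates))   # one rate shared by all 25 mutation types

# Hypothetical custom rates: same shape, first type set 10 times higher.
custom_rates <- default_rates
custom_rates[1, 1] <- 10 * custom_rates[1, 1]
```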

Arguments

cma.alter
Data frame with somatic mutation information, broken down by gene, sample, screen, and mutation type. See GeneAlterBreast for an example.
cma.cov
Data frame with the total number of nucleotides "at risk" ("coverage"), broken down by gene, screen, and mutation type. See GeneCovBreast for an example.
cma.samp
Data frame with the number of samples analyzed, broken down by gene and screen. See GeneSampBreast for an example.
GeneSets
An object which annotates genes to gene-sets; it can either be a list with each component representing a set, or an object of the class AnnDbBimap.
passenger.rates
Data frame with 1 row and 25 columns, of passenger mutation rates per nucleotide, by type, or "context". Columns denote types and must be in the same order as the first 25 columns in the MutationsBrain objects.
ID2name
Vector mapping the gene identifiers used in the GeneSets object to the gene names used in the other objects; if they are the same, this parameter is not needed. See EntrezID2Name for an example.
BH
If set to TRUE, uses the Benjamini-Hochberg method to get q-values; if set to FALSE, uses the Storey method from the qvalue package.
nr.iter
The number of iterations to be simulated.
pass.null
If set to TRUE, implements the passenger null hypothesis, using the rates from passenger.rates; otherwise, implements the permutation null, permuting mutational events.
perc.samples
Vector representing the probabilities of the spiked-in gene-sets being altered in any given sample, as percentages; for example perc.samples = c(75, 90) means that these probabilities are 0.75 and 0.90.
spiked.set.sizes
Vector representing the sizes, in genes, of the spiked-in gene-sets; for example, if perc.samples = c(75, 90) and spiked.set.sizes = c(50, 100), there would be 4 spiked-in sets, one with 50 genes and probability of being altered of 0.75 in each sample, one with 50 genes and probability of being altered of 0.90 in each sample, one with 100 genes and probability of being altered of 0.75 in each sample, and one with 100 genes and probability of being altered of 0.90 in each sample.
gene.method
If set to TRUE, implements the gene-oriented method.
perm.null.method
If set to TRUE, implements the patient-oriented method with the permutation null and no heterogeneity.
perm.null.het.method
If set to TRUE, implements the patient-oriented method with the permutation null and heterogeneity.
pass.null.method
If set to TRUE, implements the patient-oriented method with the passenger null and no heterogeneity.
pass.null.het.method
If set to TRUE, implements the patient-oriented method with the passenger null and heterogeneity.
show.iter
If set to TRUE and verbose is also set to TRUE, shows what simulation is currently running.
KnownMountains
Vector of genes to be excluded from the permutation null simulations if exclude.mountains = TRUE.
exclude.mountains
If set to TRUE, excludes the genes in KnownMountains.
verbose
If TRUE, prints intermediate messages.
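
The way perc.samples and spiked.set.sizes combine, as described in the arguments above, can be sketched with base R's expand.grid. This only enumerates the combinations from the example values in the text; the data frame spiked and its column prob.altered are hypothetical names, and the order in which the package creates the spiked-in sets internally is not specified here.

```r
# Example values taken from the argument descriptions.
perc.samples     <- c(75, 90)
spiked.set.sizes <- c(50, 100)

# One row per spiked-in gene-set: every size crossed with every percentage.
spiked <- expand.grid(perc.samples = perc.samples,
                      spiked.set.sizes = spiked.set.sizes)

# Percentages converted to per-sample alteration probabilities.
spiked$prob.altered <- spiked$perc.samples / 100

spiked
nrow(spiked)   # number of spiked-in sets
```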

Value

An object of the class SetMethodsSims. See SetMethodsSims for more details.

References

Boca SM, Kinzler KW, Velculescu VE, Vogelstein B, Parmigiani G. Patient-oriented gene-set analysis for cancer mutation data. Genome Biology. DOI: 10.1186/gb-2010-11-11-r112

Parmigiani G, Lin J, Boca S, Sjoeblom T, Kinzler KW, Velculescu VE, Vogelstein B. Statistical methods for the analysis of cancer genome sequencing data. http://www.bepress.com/jhubiostat/paper126/

Benjamini Y and Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, DOI: 10.2307/2346101

Storey JD and Tibshirani R. Statistical significance for genome-wide experiments. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1530509100

Parsons DW, Jones S, Zhang X, Lin JCH, Leary RJ, Angenendt P, Mankoo P, Carter H, Siu I, et al. An Integrated Genomic Analysis of Human Glioblastoma Multiforme. Science. DOI: 10.1126/science.1164382

Wood LD, Parsons DW, Jones S, Lin J, Sjoeblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J, et al. The Genomic Landscapes of Human Breast and Colorectal Cancer. Science. DOI: 10.1126/science.1145720

See Also

SetMethodsSims-class, CoverageBrain, EventsBySampleBrain, GeneSizes08, MutationsBrain, ID2name, cma.set.stat, extract.sims.method, combine.sims

Examples

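## Note that this takes a few minutes to run:
library(KEGG.db)

## Load the GBM data and the Entrez ID to gene name mapping used below
data(ParsonsGBM08)
data(EntrezID2Name)

## Two KEGG pathway IDs to use as the gene-sets
setIDs <- c("hsa00250", "hsa05213")

## Fix the seed so the simulation is reproducible
set.seed(831984)

## Simulate 2 datasets under the passenger null, with spiked-in sets of
## 50 genes altered in 75% and 95% of samples, and run two of the
## patient-oriented methods (permutation null and passenger null)
ResultsSim <-
    cma.set.sim(cma.alter = GeneAlterGBM,
                cma.cov = GeneCovGBM,
                cma.samp = GeneSampGBM,
                GeneSets = KEGGPATHID2EXTID[setIDs],
                ID2name = EntrezID2Name,
                nr.iter = 2,
                pass.null = TRUE,
                perc.samples = c(75, 95),
                spiked.set.sizes = 50,
                perm.null.method = TRUE,
                pass.null.method = TRUE)

ResultsSim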
