mt.maxT: Step-down maxT and minP multiple testing procedures

Description

These functions compute permutation adjusted $p$-values for step-down multiple testing procedures described in Westfall & Young (1993).

Usage

mt.maxT(X,classlabel,test="t",side="abs",fixed.seed.sampling="y",B=10000,na=.mt.naNUM,nonpara="n")
mt.minP(X,classlabel,test="t",side="abs",fixed.seed.sampling="y",B=10000,na=.mt.naNUM,nonpara="n")

Arguments

A data frame or matrix, with $m$ rows corresponding to variables (hypotheses) and $n$ columns to observations. In the case of gene expression data, rows correspond to genes and columns to mRNA samples. The data can be read using read.table.

classlabel

A vector of integers corresponding to observation (column) class labels. For $k$ classes, the labels must be integers between 0 and $k-1$. For the blockf test option, observations may be divided into $n/k$ blocks of $k$ observations each. The observations are ordered by block, and within each block, they are labeled using the integers 0 to $k-1$.

test

A character string specifying the statistic to be used to test the null hypothesis of no association between the variables and the class labels. If test="t", the tests are based on two-sample Welch t-statistics (unequal variances). If test="t.equalvar", the tests are based on two-sample t-statistics with equal variance for the two samples. The square of the t-statistic is equal to an F-statistic for $k=2$. If test="wilcoxon", the tests are based on standardized rank sum Wilcoxon statistics. If test="f", the tests are based on F-statistics. If test="pairt", the tests are based on paired t-statistics. The square of the paired t-statistic is equal to a block F-statistic for $k=2$. If test="blockf", the tests are based on F-statistics which adjust for block differences (cf. two-way analysis of variance).

side

A character string specifying the type of rejection region. If side="abs", two-tailed tests, the null hypothesis is rejected for large absolute values of the test statistic. If side="upper", one-tailed tests, the null hypothesis is rejected for large values of the test statistic. If side="lower", one-tailed tests, the null hypothesis is rejected for small values of the test statistic.

fixed.seed.sampling

If fixed.seed.sampling="y", a fixed seed sampling procedure is used, which may double the computing time, but will not use extra memory to store the permutations. If fixed.seed.sampling="n", permutations will be stored in memory. For the blockf test, the option n was not implemented as it requires too much memory.

The number of permutations. For a complete enumeration, B should be 0 (zero) or any number not less than the total number of permutations.

Code for missing values (the default is .mt.naNUM=--93074815.62). Entries with missing values will be ignored in the computation, i.e., test statistics will be based on a smaller sample size. This feature has not yet fully implemented.

nonpara

If nonpara="y", nonparametric test statistics are computed based on ranked data. If nonpara="n", the original data are used.

Value

index: Vector of row indices, between 1 and nrow(X), where rows are sorted first according to their adjusted $p$-values, next their unadjusted $p$-values, and finally their test statistics.
teststat: Vector of test statistics, ordered according to index. To get the test statistics in the original data order, use teststat[order(index)].
rawp: Vector of raw (unadjusted) $p$-values, ordered according to index.
adjp: Vector of adjusted $p$-values, ordered according to index.
plower: For mt.minP function only, vector of "adjusted $p$-values", where ties in the permutation distribution of the successive minima of raw $p$-values with the observed $p$-values are counted only once. Note that procedures based on plower do not control the FWER. Comparison of plower and adjp gives an idea of the discreteness of the permutation distribution. Values in plower are ordered according to index.

Details

These functions compute permutation adjusted $p$-values for the step-down maxT and minP multiple testing procedures, which provide strong control of the family-wise Type I error rate (FWER). The adjusted $p$-values for the minP procedure are defined in equation (2.10) p. 66 of Westfall & Young (1993), and the maxT procedure is discussed p. 50 and 114. The permutation algorithms for estimating the adjusted $p$-values are given in Ge et al. (In preparation). The procedures are for the simultaneous test of $m$ null hypotheses, namely, the null hypotheses of no association between the $m$ variables corresponding to the rows of the data frame X and the class labels classlabel. For gene expression data, the null hypotheses correspond to no differential gene expression across mRNA samples.

References

S. Dudoit, J. P. Shaffer, and J. C. Boldrick (Submitted). Multiple hypothesis testing in microarray experiments.

Y. Ge, S. Dudoit, and T. P. Speed. Resampling-based multiple testing for microarray data hypothesis, Technical Report \#633 of UCB Stat. http://www.stat.berkeley.edu/~gyc

P. H. Westfall and S. S. Young (1993). Resampling-based multiple testing: Examples and methods for $p$-value adjustment. John Wiley \& Sons.

Examples

Run this code

# Gene expression data from Golub et al. (1999)
# To reduce computation time and for illustrative purposes, we condider only
# the first 100 genes and use the default of B=10,000 permutations.
# In general, one would need a much larger number of permutations
# for microarray data.

data(golub)
smallgd<-golub[1:100,] 
classlabel<-golub.cl

# Permutation unadjusted p-values and adjusted p-values 
# for maxT and minP procedures with Welch t-statistics
resT<-mt.maxT(smallgd,classlabel)
resP<-mt.minP(smallgd,classlabel)
rawp<-resT$rawp[order(resT$index)]
teststat<-resT$teststat[order(resT$index)]

# Plot results and compare to Bonferroni procedure
bonf<-mt.rawp2adjp(rawp, proc=c("Bonferroni"))
allp<-cbind(rawp, bonf$adjp[order(bonf$index),2], resT$adjp[order(resT$index)],resP$adjp[order(resP$index)])

mt.plot(allp, teststat, plottype="rvsa", proc=c("rawp","Bonferroni","maxT","minP"),leg=c(0.7,50),lty=1,col=1:4,lwd=2)
mt.plot(allp, teststat, plottype="pvsr", proc=c("rawp","Bonferroni","maxT","minP"),leg=c(60,0.2),lty=1,col=1:4,lwd=2)
mt.plot(allp, teststat, plottype="pvst", proc=c("rawp","Bonferroni","maxT","minP"),leg=c(-6,0.6),pch=16,col=1:4)

# Permutation adjusted p-values for minP procedure with F-statistics (like equal variance t-statistics)
mt.minP(smallgd,classlabel,test="f",fixed.seed.sampling="n")

# Note that the test statistics used in the examples below are not appropriate 
# for the Golub et al. data. The sole purpose of these examples is to 
# demonstrate the use of the mt.maxT and mt.minP functions.

# Permutation adjusted p-values for maxT procedure with paired t-statistics
classlabel<-rep(c(0,1),19)
mt.maxT(smallgd,classlabel,test="pairt")

# Permutation adjusted p-values for maxT procedure with block F-statistics
classlabel<-rep(0:18,2)
mt.maxT(smallgd,classlabel,test="blockf",side="upper")

Run the code above in your browser using DataLab