MVR (version 1.10.0)

mvrt.test: Function for Computing Mean-Variance Regularized T-test Statistic and Its Significance

Description

End-user function for computing the MVR t-test statistic and its significance (p-value) under the assumption of sample group homoscedasticity or heteroscedasticity. Returns an object of class "mvrt.test". Offers the option of parallel computation for improved efficiency. See vignette Cluster_Setup.pdf.

Usage

mvrt.test(data, 
              obj=NULL,
              block,
              tolog = FALSE, 
              nc.min = 1, 
              nc.max = 30, 
              pval = FALSE, 
              replace = FALSE, 
              n.resamp = 100, 
              parallel = FALSE,
              conf = NULL,
              verbose = TRUE)

Arguments

data
numeric matrix of untransformed (raw) data, where samples are by rows and variables (to be clustered) are by columns, or an object that can be coerced to such a matrix (such as a n
obj
Object of class "mvr" returned by mvr.
block
character or numeric vector, or factor, giving the grouping/blocking variable; its length must equal the sample size (see Details).
tolog
logical scalar. Is the data to be log2-transformed first? Optional, defaults to FALSE. Note that negative or null (zero) values will be set to 1 before the log2 transformation is applied.
nc.min
Positive integer scalar of the minimum number of clusters, defaults to 1
nc.max
Positive integer scalar of the maximum number of clusters, defaults to 30
pval
logical scalar. Shall p-values be computed? If not, n.resamp and replace will be ignored. If FALSE (default), only the t-statistic will be computed. If TRUE, exact
replace
logical scalar. Shall a permutation test (default) or a bootstrap test be computed? If FALSE (default), a permutation test will be computed with the null permutation distribution. If TRUE, boo
n.resamp
Positive integer scalar of the number of resamplings to compute (default=100) by permutation or bootstrap (see details).
parallel
logical scalar. Is parallel computing to be performed? Optional, defaults to FALSE.
conf
list of parameters for cluster configuration. Inputs for function makeCluster (R package snow) for cluster setup. Optional, defaults to NULL. See detai
verbose
logical scalar. Is the output to be verbose? Optional, defaults to TRUE.
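
The handling of non-positive values described for argument tolog can be illustrated in base R. This is only a minimal sketch of the documented behavior, not the package's internal code: the helper name safe_log2 is hypothetical, and it assumes that "null values" means zeros.

```r
# Hypothetical helper (not part of MVR): mimics the documented tolog behavior.
safe_log2 <- function(x) {
  x[x <= 0] <- 1   # negative or zero entries are set to 1, so they map to log2(1) = 0
  log2(x)
}

# Small made-up matrix containing one zero and one negative value
raw <- matrix(c(8, 0, -3, 4), nrow = 2)
safe_log2(raw)
```

With this convention every non-positive entry ends up at 0 on the log2 scale, which is worth keeping in mind when the raw data contain many zeros.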

Value

  • statistic: vector, of size the number of variables, where entries are the t-statistic values of each variable.
  • p.value: vector, of size the number of variables, where entries are the p-values (if requested, otherwise NULL value) of each variable.

Details

Argument block is a vector or a factor grouping/blocking variable. Its length must equal the sample size, and it must have as many distinct character or numeric values as there are levels (sample groups). The number of sample groups must be greater than or equal to 2, and every group sample size must be greater than 1; otherwise the program will stop.

Argument nc.max currently defaults to 30. In our experience, this is enough for most datasets tested. The appropriate value depends on (i) the dimensionality/sample size ratio $\frac{p}{n}$, (ii) the signal/noise ratio, and (iii) whether a pre-transformation has been applied (see Dazard, J-E. and J. S. Rao (2012) for more details). See the cluster diagnostic function cluster.diagnostic to check whether larger values of nc.max may be required.
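
The constraints on argument block stated above can be checked before calling mvrt.test. The following is a minimal sketch in base R under stated assumptions: check_block is a hypothetical helper that is not part of MVR, and it only mirrors the documented conditions (one label per sample, at least two groups, more than one sample per group).

```r
# Hypothetical pre-check (not part of MVR) for the documented block constraints.
check_block <- function(block, n.samples) {
  block <- factor(block)       # drops unused levels
  sizes <- table(block)        # number of samples per group
  stopifnot(length(block) == n.samples,  # one label per sample
            length(sizes) >= 2,          # at least two sample groups
            all(sizes > 1))              # every group has more than one sample
  sizes
}

# Two groups of five samples, built as in the Examples section
GF <- factor(gl(n = 2, k = 5, length = 10), labels = c("G1", "G2"))
check_block(GF, n.samples = 10)
```

A block such as c("a", "a", "b") would fail this check, because group "b" contains a single sample.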

Argument n.resamp is reset to conf$cpus*ceiling(n.resamp/conf$cpus) when the Rocks cluster is used (i.e. conf is non-NULL), where conf$cpus denotes the total number of CPUs to be used (see below).
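
As a worked example of this reset rule, the short sketch below evaluates the formula for one case. The value of 8 CPUs is an arbitrary assumption chosen for illustration; the variable names cpus and n.resamp.eff are hypothetical.

```r
n.resamp <- 100   # requested number of resamplings (the documented default)
cpus     <- 8     # hypothetical value of conf$cpus

# Round n.resamp up to the next multiple of the number of CPUs
n.resamp.eff <- cpus * ceiling(n.resamp / cpus)
n.resamp.eff      # 104, i.e. 13 resamplings per CPU
```

The effective number of resamplings is therefore never smaller than the requested one, and it is unchanged whenever n.resamp is already a multiple of conf$cpus.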

To save unnecessary computations, a previously computed MVR clustering can be provided through option obj (i.e. obj is fully specified as an object of class "mvr" returned by mvr). In this case, arguments data, block, tolog, nc.min, nc.max are ignored, and the MVR clustering provided by obj will be used for the computation of the regularized t-test statistics. If obj=NULL, a MVR clustering computation for the regularized t-test statistics and/or p-values will be performed.

To run a parallel session (and parallel RNG) of the MVR procedures (parallel=TRUE), argument conf is to be specified (i.e. non-NULL). It must list the specifications of the following parameters for cluster configuration: ("cpus", "type", "homo", "script", "outfile") matching the arguments and options described in function makeCluster of the R package snow:

  • "cpus": spec: integer scalar specifying the total number of CPU cores, counting the master node.
  • "type": type: character vector specifying the cluster type ("SOCK", "PVM", "MPI").
  • "homo": homogeneous: logical scalar to be set to FALSE for inhomogeneous clusters (optional, defaults to TRUE).
  • "script": useRscript: logical scalar to be set to FALSE if non-R script is used (optional, defaults to TRUE).
  • "outfile": outfile: character vector of the output log file name for the slave nodes (optional, defaults to "").

Note that the actual creation of the cluster, its initialization, and closing are all done internally. In addition, when random number generation is needed, the creation of a separate Stream of Parallel RNG (SPRNG) per node is done internally by distributing the stream states to the nodes. For more details, see vignette Cluster_Setup.pdf, function makeCluster (R package snow), and/or http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html.

In case p-values are desired (pval=TRUE), the use of the cluster is highly recommended. It is ideal for computing embarrassingly parallel tasks such as permutation or bootstrap resamplings. Note that when both regularized t-test statistics and p-values are desired, the cluster configuration will only be used for the parallel computation of p-values, and not for the MVR clustering computation of the regularized t-test statistics. This maximizes computational efficiency and avoids multiple configurations: a cluster can only be configured and used one session at a time, and attempting otherwise would result in a run stop.
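
The five entries described above can be assembled into a plain R list. The sketch below is an illustration only: the values (4 CPU cores, a "SOCK" cluster) are assumptions chosen for the example and should be replaced by values that match the actual cluster.

```r
# Illustrative configuration; the values are assumptions, not recommendations.
conf <- list("cpus"    = 4,       # total number of CPU cores, counting the master node
             "type"    = "SOCK",  # cluster type: "SOCK", "PVM" or "MPI"
             "homo"    = TRUE,    # homogeneous cluster
             "script"  = TRUE,    # R script is used
             "outfile" = "")      # output log file name for the slave nodes

# The list is then passed together with parallel = TRUE, as in the Examples
# section (left commented out here because it requires a working cluster):
# mvrt.obj <- mvrt.test(obj = mvr.obj, pval = TRUE, parallel = TRUE, conf = conf)
```

The Examples section builds the same list from getClusterOption and environment variables rather than from hard-coded values.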

References

  • Dazard J-E., Hua Xu and J. S. Rao (2011). "R package MVR for Joint Adaptive Mean-Variance Regularization and Variance Stabilization." In JSM Proceedings, Section for Statistical Programmers and Analysts. Miami Beach, FL, USA: American Statistical Association IMS - JSM. (in press).
  • Dazard J-E. and J. S. Rao (2012). "Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data." Comput. Statist. Data Anal. (to appear).

See Also

  • makeCluster (R package snow) Simple Network of Workstations
  • eBayes (R package limma) Bayesian Regularized t-test statistic (Smyth, 2004)
  • samr (R package samr) SAM Regularized t-test statistic (Tusher et al., 2001; Storey, 2003)
  • matest (R package maanova) James-Stein shrinkage estimator-based Regularized t-test statistic (Cui et al., 2005)
  • ebam (R package siggenes) Empirical Bayes Regularized z-test statistic (Efron, 2001)
  • bayesT Hierarchical Bayesian Regularized t-test statistic (Baldi et al., 2001)

Examples

#================================================
# Loading the library and its dependencies
#================================================
library("MVR")
require("statmod", quietly = TRUE)
require("snow", quietly = TRUE)
require("RColorBrewer", quietly = TRUE)

#================================================
# MVR package news
#================================================
MVR.news()

#================================================
# MVR package citation
#================================================
citation("MVR")

#================================================
# Loading of the Synthetic and Real datasets 
# (see description of datasets)
#================================================
data("Synthetic", "Real", package="MVR")

#================================================
# Regularized t-test statistics (Synthetic dataset) 
# Multi-Group Assumption
# Assuming unequal variance between groups
# With option to use prior MVR clustering results
# Without computation of p-values
# Without Rocks cluster usage
#================================================
nc.min <- 1
nc.max <- 20
probs <- seq(0, 1, 0.01)
n <- 10
GF <- factor(gl(n = 2, k = n/2, len = n), 
             ordered = FALSE, 
             labels = c("G1", "G2"))
mvr.obj <- mvr(data = Synthetic, 
               block = GF, 
               tolog = FALSE, 
               nc.min = nc.min, 
               nc.max = nc.max, 
               probs = probs,
               B = 100,
               parallel = FALSE, 
               conf = NULL,
               verbose = TRUE)
mvrt.obj <- mvrt.test(obj = mvr.obj, 
                      pval = FALSE, 
                      parallel = FALSE, 
                      conf = NULL,
                      verbose = TRUE)
                      
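#===================================================
# Start the PVM daemon and add the slave hosts
# (Unix only; requires R package rpvm and the
# environment variables HOSTNAME and HOSTS)
#===================================================
if (.Platform$OS.type == "unix") {
    # Attach rpvm only if it is not already on the search path
    if (!("package:rpvm" %in% search())) {
            library("rpvm")
    }
}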
masterhost <- Sys.getenv("HOSTNAME")
slavehosts <- unlist(strsplit(Sys.getenv("HOSTS"), split="\n"))
.PVM.start.pvmd(hosts = masterhost) 
.PVM.addhosts(hosts = slavehosts)
#===================================================

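#===================================================
# Build the cluster configuration list from the
# environment variable NUMCPU and snow's cluster options
#===================================================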
cpus <- as.numeric(Sys.getenv("NUMCPU"))
nodes <- length(slavehosts) + 1
conf <- list("cpus" = nodes * cpus, 
             "type" = getClusterOption("type"), 
             "homo" = getClusterOption("homogeneous"), 
             "script" = getClusterOption("useRscript"),
             "outfile" = "")
#===================================================

#===================================================
# Mean-Variance Regularization (Real dataset)
# Multi-Group Assumption
# Assuming unequal variance between groups
# With Rocks cluster usage
#===================================================
nc.min <- 1
nc.max <- 30
probs <- seq(0, 1, 0.01)
n <- 6
GF <- factor(gl(n = 2, k = n/2, len = n), 
             ordered = FALSE, 
             labels = c("M", "S"))
mvr.obj <- mvr(data = Real, 
               block = GF, 
               tolog = FALSE, 
               nc.min = nc.min, 
               nc.max = nc.max, 
               probs = probs,
               B = 100, 
               parallel = TRUE, 
               conf = conf,
               verbose = TRUE)

#===================================================
# Regularized t-test statistics (Real dataset) 
# Multi-Group Assumption
# Assuming unequal variance between groups
# With option to use prior MVR clustering results
# With computation of p-values
# With Rocks cluster usage
#===================================================
mvrt.obj <- mvrt.test(obj = mvr.obj, 
                      pval = TRUE, 
                      replace = FALSE, 
                      n.resamp = 100, 
                      parallel = TRUE, 
                      conf = conf,
                      verbose = TRUE)

#===================================================
# Remove the hosts and halt the PVM daemon
#===================================================
.PVM.delhosts(hosts = slavehosts)
.PVM.delhosts(hosts = masterhost)
.PVM.halt()
#===================================================
