twosample_power: Power Estimation for Multivariate Two-Sample Tests

Description

Estimate the power of various two-sample tests, using Rcpp and parallel computing.

Usage

twosample_power(
  f,
  ...,
  TS,
  TSextra,
  alpha = 0.05,
  B = 1000,
  nbins = c(5, 5),
  minexpcount = 5,
  Ranges = matrix(c(-Inf, Inf, -Inf, Inf), 2, 2),
  samplingmethod = "Binomial",
  rnull,
  With.p.value = FALSE,
  DoTransform = TRUE,
  SuppressMessages = FALSE,
  LargeSampleOnly = FALSE,
  maxProcessor,
  doMethods = "all"
)

Value

A numeric matrix or vector of power values.

Arguments

f: function to generate a list with data sets x and y for continuous data or a matrix with columns vals_x, vals_y, x and y for discrete data.
...: additional arguments passed to f (at most two).
TS: routine to calculate test statistics for new tests.
TSextra: additional info passed to TS, if necessary.
alpha: =0.05, the type I error probability of the hypothesis test.
B: =1000, number of simulation runs.
nbins: =c(5, 5), number of bins for chi-square test if Dim=2.
minexpcount: =5, lowest required count for chi-square test.
Ranges: =matrix(c(-Inf, Inf, -Inf, Inf),2,2), a 2x2 matrix with lower and upper bounds.
samplingmethod: ="Binomial" for Binomial sampling or "independence" for independence sampling in the discrete data case.
rnull: function to generate new data sets for parametric bootstrap.
With.p.value: =FALSE, does user supplied routine return p values?
DoTransform: =TRUE, should data be transformed to the unit hypercube?
SuppressMessages: =FALSE, should messages be suppressed?
LargeSampleOnly: =FALSE, should only methods with large sample theories be run?
maxProcessor: number of cores to use. If missing, the number of physical cores minus 1 is used. If set to 1, no parallel processing is done.
doMethods: ="all", which methods should be included?
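The default described for maxProcessor (physical cores minus 1) can be reproduced with the base parallel package. This is only a sketch of one way to compute such a value; the variable names are hypothetical and the package may determine its default differently.

```r
# Sketch only: one way to obtain "number of physical cores minus 1".
# n_physical and n_workers are illustrative names, not MD2sample objects.
n_physical <- parallel::detectCores(logical = FALSE)
# detectCores() can return NA on some platforms; fall back to one core
if (is.na(n_physical)) n_physical <- 1L
# never go below one worker
n_workers <- max(1L, n_physical - 1L)
n_workers
```

A value computed this way could then be passed explicitly as maxProcessor=n_workers.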

Details

For details, consult vignette("MD2sample","MD2sample").
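As a rough illustration of what a simulation-based power estimate is, the sketch below generates B pairs of data sets, applies a test to each pair, and reports the proportion of rejections at level alpha. It uses base R only. The helper estimate_power_sketch, the generator gen and the Welch t test are assumptions chosen for illustration; they are not the tests or the implementation used by twosample_power.

```r
# Hypothetical helper, not part of MD2sample: Monte Carlo power estimate.
estimate_power_sketch <- function(gen, pvalue, alpha = 0.05, B = 1000) {
  rejections <- replicate(B, {
    dta <- gen()                    # list with elements x and y
    pvalue(dta$x, dta$y) < alpha    # TRUE if the test rejects at level alpha
  })
  mean(rejections)                  # proportion of rejections = estimated power
}

set.seed(1)
# Toy generator: two univariate normal samples with a mean shift of 0.5
gen <- function() list(x = rnorm(100), y = rnorm(120, mean = 0.5))
# Toy test: Welch two-sample t test from the stats package
pvalue <- function(x, y) t.test(x, y)$p.value
pow <- estimate_power_sketch(gen, pvalue, B = 200)
pow
```

With a larger B the estimate becomes more stable, which is why very small values of B give estimates that should not be interpreted.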

Examples


#Note that the resulting power estimates are meaningless because the number of
#simulation runs B is extremely low, as required by the CRAN timing rules.
#
#Power of tests when one data set comes from a standard multivariate normal distribution
#and the other data set from a multivariate normal distribution with correlation a.
f=function(a=0) {
 S=diag(2) 
 x=mvtnorm::rmvnorm(100, sigma = S)
 S[1,2]=a
 S[2,1]=a
 y=mvtnorm::rmvnorm(120, sigma = S)
 list(x=x, y=y)
}
twosample_power(f, c(0, 0.5), B=10, maxProcessor=1)
#Power of a user-supplied test. The example is an (included) chi-square test:
TSextra=list(which="statistics", nbins=rbind(c(3,3), c(4,4)))
twosample_power(f, c(0, 0.5), TS=chiTS.cont, TSextra=TSextra, B=10, maxProcessor=1)
#Same example, but this time the user-supplied routine calculates p values:
TSextra=list(which="pvalues", nbins=c(4,4))
twosample_power(f, c(0, 0.5), TS=chiTS.cont, TSextra=TSextra, B=10, 
             With.p.value=TRUE, maxProcessor=1)
#Example for discrete data
g=function(p1, p2) {
  x = table(sample(1:4, size=1000, replace = TRUE))
  y = table(sample(1:4, size=500, replace = TRUE, prob=c(p1,p2,1,1)))
  cbind(vals_x=rep(1:2,2),  vals_y=rep(1:2, each=2), x=x, y=y)
}  
twosample_power(g, 1.5, 1.6, B=10, maxProcessor=1)
