NullDistribution: Specification of the Reference Distribution

Description

Specification of the asymptotic, approximative (Monte Carlo) and exact reference distribution.

Usage

asymptotic(maxpts = 25000, abseps = 0.001, releps = 0)
approximate(B = 10000, parallel = c("no", "multicore", "snow"),
            ncpus = 1, cl = NULL)
exact(algorithm = c("auto", "shift", "split-up"), fact = NULL)

Arguments

maxpts

an integer, the maximum number of function values; see GenzBretz in package mvtnorm. Defaults to 25000.

abseps

a numeric, the absolute error tolerance; see GenzBretz in package mvtnorm. Defaults to 0.001.

releps

a numeric, the relative error tolerance; see GenzBretz in package mvtnorm. Defaults to 0.

B

a positive integer, the number of Monte Carlo replicates used for the computation of the approximative reference distribution. Defaults to 10000.

parallel

a character, the type of parallel operation: either "no" (default), "multicore" or "snow".

ncpus

an integer, the number of processes to be used in parallel operation. Defaults to 1.

cl

an object inheriting from class "cluster", specifying an optional parallel or snow cluster if parallel = "snow". Defaults to NULL.

algorithm

a character, the algorithm used for the computation of the exact reference distribution: either "auto" (default), "shift" or "split-up".

fact

an integer to multiply the response values with. Defaults to NULL.

Details

asymptotic, approximate and exact provide control of the specification of the asymptotic, approximative (Monte Carlo) and exact reference distributions, and are supplied to, e.g., independence_test via its distribution argument.

The asymptotic reference distribution is computed using randomised quasi-Monte Carlo methods (Genz and Bretz, 2009) and is applicable to arbitrary covariance structures and dimensions up to 1000. See package mvtnorm for details on maxpts, abseps and releps.
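
As a rough illustration of how maxpts, abseps and releps reach mvtnorm, the sketch below evaluates a two-sided tail probability for a maximum-type statistic directly with pmvnorm and GenzBretz. The correlation matrix R and the observed value c_max are hypothetical placeholders; in practice both are derived from the data, and this is not meant to reproduce the package's internal code.

```r
## Sketch only: R and c_max are hypothetical, not taken from any data set.
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  R <- matrix(0.5, nrow = 3, ncol = 3)  # equicorrelated 3 x 3 matrix
  diag(R) <- 1
  c_max <- 2.2                          # assumed observed maximum |statistic|

  ## P(|Z_j| <= c_max for all j) with Z ~ N(0, R)
  inside <- mvtnorm::pmvnorm(
    lower = rep(-c_max, 3), upper = rep(c_max, 3), corr = R,
    algorithm = mvtnorm::GenzBretz(maxpts = 25000, abseps = 0.001, releps = 0)
  )

  p_asym    <- 1 - as.numeric(inside)   # tail probability of the maximum
  est_error <- attr(inside, "error")    # error estimate reported by pmvnorm
  print(c(p.value = p_asym, abs.error = est_error))
}
```

Tightening abseps or raising maxpts trades computing time for a smaller reported error estimate.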

The approximative (Monte Carlo) reference distribution is always applicable. In this case, the distribution is obtained by a conditional Monte Carlo procedure, i.e., by computing the test statistic for B random samples from all admissible permutations of the response $\mathbf{Y}$ within each block (Hothorn et al., 2008). By default, the distribution is obtained through serial operation (parallel = "no"). Parallel operation is requested by setting parallel to either "multicore" (not available for MS Windows) or "snow". In the latter case, if cl = NULL (default), a cluster with ncpus processes is created on the local machine, unless a default cluster has been registered (see setDefaultCluster in package parallel), in which case the registered cluster is used instead. Alternatively, an existing parallel or snow cluster can be supplied via cl. See Examples and package parallel for details on parallel operation.
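
The conditional Monte Carlo idea described above can be sketched in base R. The data, the helper names stat and permute_within, and the choice of statistic (the sum of the responses in one group) are all hypothetical and serve only to show what "permuting the response within each block" means; the package's own computation is more general.

```r
## Hypothetical data: response y, two groups g, two blocks b.
y <- c(5.1, 4.8, 6.0, 6.3, 5.5, 7.2, 3.9, 4.1, 4.6, 5.0, 4.4, 5.8)
g <- factor(rep(c("a", "b"), times = 6))
b <- factor(rep(c("b1", "b2"), each = 6))

## Illustrative statistic: sum of the responses in group "a".
stat <- function(y, g) sum(y[g == "a"])

## Draw one admissible permutation: shuffle y separately within each block.
permute_within <- function(y, b) {
  for (lev in levels(b)) {
    i <- which(b == lev)
    y[i] <- y[i[sample.int(length(i))]]
  }
  y
}

set.seed(123)
B    <- 2000                                   # number of Monte Carlo replicates
obs  <- stat(y, g)                             # observed statistic
perm <- replicate(B, stat(permute_within(y, b), g))

## Conditional expectation of the statistic given the blocks.
mu <- sum(tapply(y, b, mean) * tapply(g == "a", b, sum))

## Two-sided Monte Carlo p-value; the small tolerance guards against
## floating-point noise when permuted and observed values tie.
p_mc <- mean(abs(perm - mu) >= abs(obs - mu) - 1e-8)
print(p_mc)
```

Because the replicates are independent of one another, this loop is the part that parallel operation distributes across processes.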

The exact reference distribution is currently applicable to univariate two-sample problems only, using either the shift algorithm (Streitberg and Röhmel, 1986, 1987) or the split-up algorithm (van de Wiel, 2001). The shift algorithm is defined for positive integer-valued scores $h(\mathbf{Y})$ only, but handles blocks pertaining to, e.g., pre- and post-stratification. The split-up algorithm can be used with non-integer scores, but does not handle blocks. By default, an automatic choice is made (algorithm = "auto"), but the shift and split-up algorithms can be selected by setting algorithm to either "shift" or "split-up" respectively.
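
To show why positive integer scores matter for the shift algorithm, here is a minimal counting recursion in its spirit, written in base R for a single block. The function name shift_counts and the example values are hypothetical, and this is a sketch under those assumptions rather than the package's implementation: each score "shifts" the table of subset counts by its own value, which only works when scores index whole-numbered columns.

```r
## Number of ways to choose m of the scores so that they sum to s, for all s.
shift_counts <- function(scores, m) {
  stopifnot(all(scores > 0), all(scores == round(scores)),
            m >= 1, m <= length(scores))
  S <- sum(scores)
  ## cnt[k + 1, s + 1]: number of subsets of size k with score sum s
  cnt <- matrix(0, nrow = m + 1, ncol = S + 1)
  cnt[1, 1] <- 1                      # the empty subset has sum 0
  for (a in scores) {
    for (k in m:1) {                  # descending: each score is used at most once
      to   <- (a + 1):(S + 1)
      from <- 1:(S + 1 - a)
      cnt[k + 1, to] <- cnt[k + 1, to] + cnt[k, from]  # shift by a and add
    }
  }
  cnt[m + 1, ]
}

## Rank-sum example: ranks 1..7, first sample of size 3, observed sum 16.
scores <- 1:7
m      <- 3
w_obs  <- 16
counts <- shift_counts(scores, m)
sums   <- 0:sum(scores)

## Exact upper-tail probability: 4 of the 35 subsets have a sum of at least 16.
p_upper <- sum(counts[sums >= w_obs]) / sum(counts)
print(p_upper)
```

The table has one column per attainable sum, so its size grows with the total of the scores rather than with the number of permutations.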

References

Genz, A. and Bretz, F. (2009). Computation of Multivariate Normal and t Probabilities. Heidelberg: Springer-Verlag.

Hothorn, T., Hornik, K., van de Wiel, M. A. and Zeileis, A. (2008). Implementing a class of permutation tests: the coin package. Journal of Statistical Software 28(8), 1–23. http://www.jstatsoft.org/v28/i08/

Streitberg, B. and Röhmel, J. (1986). Exact distributions for permutations and rank tests: an introduction to some recently published algorithms. Statistical Software Newsletter 12(1), 10–17.

Streitberg, B. and Röhmel, J. (1987). Exakte verteilungen für rang- und randomisierungstests im allgemeinen c-stichprobenfall. EDV in Medizin und Biologie 18(1), 12–19.

van de Wiel, M. A. (2001). The split-up algorithm: a fast symbolic method for computing p-values of distribution-free statistics. Computational Statistics 16(4), 519–538.

Examples

## Approximative (Monte Carlo) Cochran-Mantel-Haenszel test
library(coin) # provides cmh_test() and the alzheimer data

## Serial operation
set.seed(123)
cmh_test(disease ~ smoking | gender, data = alzheimer,
         distribution = approximate(B = 100000))

## Multicore with 8 processes (not for MS Windows)
set.seed(123, kind = "L'Ecuyer-CMRG")
cmh_test(disease ~ smoking | gender, data = alzheimer,
         distribution = approximate(B = 100000,
                                    parallel = "multicore", ncpus = 8))

## Automatic PSOCK cluster with 4 processes
set.seed(123, kind = "L'Ecuyer-CMRG")
cmh_test(disease ~ smoking | gender, data = alzheimer,
         distribution = approximate(B = 100000,
                                    parallel = "snow", ncpus = 4))

## Registered FORK cluster with 12 processes (not for MS Windows)
fork12 <- parallel::makeCluster(12, "FORK") # set-up cluster
parallel::setDefaultCluster(fork12) # register default cluster
set.seed(123, kind = "L'Ecuyer-CMRG")
cmh_test(disease ~ smoking | gender, data = alzheimer,
         distribution = approximate(B = 100000,
                                    parallel = "snow"))
parallel::stopCluster(fork12) # clean-up

## User-specified PSOCK cluster with 8 processes
psock8 <- parallel::makeCluster(8, "PSOCK") # set-up cluster
set.seed(123, kind = "L'Ecuyer-CMRG")
cmh_test(disease ~ smoking | gender, data = alzheimer,
         distribution = approximate(B = 100000,
                                    parallel = "snow", cl = psock8))
parallel::stopCluster(psock8) # clean-up
