mdmr: Conduct MDMR with analytic p-values

Description

mdmr (multivariate distance matrix regression) regresses a distance matrix onto a set of predictors. It returns the test statistic, the pseudo R-square statistic, and p-values for all predictors jointly (the omnibus effect) and for each predictor individually, conditioned on the rest. P-values are analytic by default and permutation-based by default when n < 200 (see perm.p).
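To make the omnibus quantities concrete, here is a direct (and deliberately slow) base-R sketch. It assumes, following our reading of McArtor et al. (2017), that the statistic is the trace ratio \(\mathrm{tr}(HGH) / \mathrm{tr}[(I-H)G(I-H)]\) (a pseudo-F without degrees-of-freedom scaling) and that the pseudo R-square is \(\mathrm{tr}(HGH) / \mathrm{tr}(G)\). The function name mdmr_omnibus_sketch is hypothetical; mdmr() itself uses a faster computational strategy and also reports the conditional effect of each predictor, which this sketch omits.

```r
# Hypothetical helper: naive omnibus MDMR statistic and pseudo R-square.
# Assumed definitions (after McArtor et al., 2017):
#   G     = -1/2 * C %*% (D^2) %*% C, with C = I - 11'/n
#   H     = hat matrix of X (with an intercept column)
#   stat  = tr(HGH) / tr((I - H) G (I - H))
#   pr.sq = tr(HGH) / tr(G)
mdmr_omnibus_sketch <- function(X, D) {
  X <- cbind(1, as.matrix(X))               # add intercept column
  D <- as.matrix(D)                          # accepts a dist object or a matrix
  n <- nrow(D)
  C <- diag(n) - matrix(1 / n, n, n)         # centering matrix
  G <- -0.5 * C %*% (D^2) %*% C              # Gower's centered matrix
  H <- X %*% solve(crossprod(X), t(X))       # hat matrix X (X'X)^-1 X'
  I_H <- diag(n) - H
  num <- sum(diag(H %*% G %*% H))
  den <- sum(diag(I_H %*% G %*% I_H))
  list(stat = num / den, pr.sq = num / sum(diag(G)))
}

set.seed(1)
X <- matrix(rnorm(50 * 2), 50, 2)                        # simulated predictors
Y <- cbind(X[, 1] + rnorm(50), rnorm(50), rnorm(50))     # outcome related to X[, 1]
sk <- mdmr_omnibus_sketch(X, dist(Y, method = "euclidean"))
sk$stat
sk$pr.sq
```

Because H is symmetric and idempotent, the two traces add up to tr(G), so under these definitions the statistic equals pr.sq / (1 - pr.sq).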

Usage

mdmr(
  X,
  D = NULL,
  G = NULL,
  lambda = NULL,
  return.lambda = F,
  start.acc = 1e-20,
  ncores = 1,
  perm.p = (nrow(as.matrix(X)) < 200),
  nperm = 500,
  seed = NULL
)

Value

An object with six elements and a summary method. Calling summary(res), where res is the object returned by mdmr(), produces a data frame with the following columns:

Statistic: Value of the corresponding MDMR test statistic
Numer DF: Numerator degrees of freedom for the corresponding effect
Pseudo R2: Size of the corresponding effect on the distance matrix
p-value: The p-value for each effect.

In addition to the four columns reported by summary(res), the res object also contains:

p.prec: A data frame reporting the precision of each p-value. For analytic p-values, these are the maximum error bounds reported by the davies function in CompQuadForm; for permutation p-values, they are the standard errors of the permutation p-values.
lambda: A vector of the eigenvalues of G (if return.lambda = TRUE).
nperm: Number of permutations used; NA if analytic p-values were computed.

Note that the printed output of summary(res) truncates p-values to the smallest trustworthy values, but the object returned by summary(res) contains the p-values as computed. The reason for this truncation differs for analytic and permutation p-values. For an analytic p-value, if the error bound of the Davies algorithm is larger than the p-value, the only conclusion that can be drawn with certainty is that the p-value is smaller than (or equal to) the error bound. For a permutation test, the estimated p-value is zero if no permuted test statistic exceeds the observed statistic, but this zero is only a product of the finite number of permutations conducted; the only conclusion that can be drawn is that the p-value is smaller than 1/nperm.
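The two truncation rules above can be sketched in a few lines of base R. The helpers report_p and perm_se are hypothetical illustrations, not functions exported by MDMR, and perm_se assumes the usual binomial standard error \(\sqrt{p(1-p)/\text{nperm}}\) for a permutation p-value; the exact formula mdmr() uses for p.prec may differ.

```r
# Hypothetical helpers illustrating the reporting rules; not part of MDMR.

# Standard error of a permutation p-value, assuming the binomial
# approximation sqrt(p * (1 - p) / nperm)
perm_se <- function(p, nperm) sqrt(p * (1 - p) / nperm)

# p:     estimated p-value
# prec:  Davies error bound (used only for analytic p-values)
# nperm: number of permutations, or NA for analytic p-values
report_p <- function(p, prec = NA, nperm = NA) {
  if (!is.na(nperm)) {
    # Permutation: a zero estimate only tells us p < 1/nperm
    if (p == 0) return(paste0("< ", format(1 / nperm)))
  } else if (!is.na(prec) && prec > p) {
    # Analytic: the estimate is smaller than its own error bound
    return(paste0("<= ", format(prec)))
  }
  format(p)
}

report_p(0, nperm = 500)       # "< 0.002"
report_p(1e-12, prec = 1e-6)   # "<= 1e-06"
report_p(0.03, prec = 1e-6)    # "0.03"
```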

Arguments

X: An \(n \times p\) matrix or data frame of predictors. Unordered factors are tested with contrast codes by default, and ordered factors are tested with polynomial contrasts. For finer control over how categorical predictors are handled, or if higher-order effects are desired, the output of a call to model.matrix() can be supplied to this argument instead.
D: Distance matrix computed on the outcome data. Can be either a matrix or an R dist object. Either D or G must be passed to mdmr().
G: Gower's centered similarity matrix computed from D. Either D or G must be passed to mdmr().
lambda: Optional argument: eigenvalues of G. The analytic p-values require the eigenvalues of G, and the eigendecomposition of a large G matrix can be time consuming. If MDMR is to be conducted multiple times on one distance matrix, it is advisable to compute the eigendecomposition once and pass the eigenvalues to mdmr() on each call.
return.lambda: Logical; indicates whether or not the eigenvalues of G should be returned, if calculated. Default is FALSE.
start.acc: Starting accuracy of the Davies (1980) algorithm implemented in the davies function in the CompQuadForm package (Duchesne & De Micheaux, 2010) that mdmr() uses to compute MDMR p-values.
ncores: Integer; if ncores > 1, the parallel package is used to speed up computation. Note: Windows users must set ncores = 1 because the parallel package relies on forking. See mc.cores in the mclapply function of the parallel package for more details.
perm.p: Logical; should permutation-based p-values be computed instead of analytic p-values? Default is TRUE if n < 200 and FALSE otherwise, because the analytic p-values rely on asymptotic results.
nperm: Number of permutations to use if permutation-based p-values are to be computed.
seed: Random seed to use to generate the permutation null distribution. Defaults to a random seed.
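As an illustration of the G argument, Gower's centered matrix can be sketched with the standard definition \(G = -\tfrac{1}{2} C A C\), where \(A_{ij} = d_{ij}^2\) and \(C = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top\). The helper gower_sketch is hypothetical; in practice the package's own gower() function, used in the Examples, builds G from D.

```r
# Hypothetical helper: Gower's centered matrix from a distance matrix.
gower_sketch <- function(D) {
  A <- -0.5 * as.matrix(D)^2           # works for dist objects and matrices
  n <- nrow(A)
  C <- diag(n) - matrix(1 / n, n, n)   # centering matrix I - 11'/n
  C %*% A %*% C                        # double-centered similarity matrix
}

set.seed(2)
Y <- matrix(rnorm(20 * 3), 20, 3)
G <- gower_sketch(dist(Y, method = "euclidean"))

# For Euclidean distances, G equals the cross-product of the column-centered data
Yc <- scale(Y, scale = FALSE)
max(abs(G - tcrossprod(Yc)))           # numerically zero
```

The Euclidean check is a useful sanity test: double centering cancels the squared-norm terms in \(d_{ij}^2\), leaving only the inner products of the centered observations.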

Author

Daniel B. McArtor (dmcartor@gmail.com) [aut, cre]

Details

This function is the fastest approach to conducting MDMR. It uses the fastest known computational strategy to compute the MDMR test statistic (see Appendix A of McArtor et al., 2017), and it uses fast, analytic p-values.

The slowest step in conducting MDMR is the eigendecomposition of the G matrix, whose computation time grows as \(O(n^3)\). If the argument lambda is left NULL, the eigenvalues are recomputed on every call to mdmr(). If MDMR is to be conducted multiple times on the same distance matrix, it is recommended to compute the eigenvalues of G once in advance and pass them through lambda.
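A minimal sketch of that workflow on simulated data: the eigenvalues are computed once with base R's eigen() and then reused across several predictor sets. The data and the predictor sets X1 and X2 are hypothetical, the G construction assumes standard Gower centering, and the mdmr() calls run only if MDMR is installed. perm.p = FALSE is set explicitly because n = 100 would otherwise default to permutation p-values, which do not need lambda.

```r
set.seed(3)
n <- 100
Y <- matrix(rnorm(n * 4), n, 4)             # simulated multivariate outcome
D <- as.matrix(dist(Y))                     # Euclidean distance matrix
C <- diag(n) - matrix(1 / n, n, n)          # centering matrix
G <- -0.5 * C %*% D^2 %*% C                 # Gower's centered matrix

# The O(n^3) step, done only once
lambda <- eigen(G, symmetric = TRUE, only.values = TRUE)$values

if (requireNamespace("MDMR", quietly = TRUE)) {
  # Hypothetical predictor sets tested against the same distance matrix
  X1 <- data.frame(x1 = rnorm(n))
  X2 <- data.frame(x1 = X1$x1, x2 = rnorm(n))
  res_a <- MDMR::mdmr(X = X1, G = G, lambda = lambda, perm.p = FALSE)
  res_b <- MDMR::mdmr(X = X2, G = G, lambda = lambda, perm.p = FALSE)
}
```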

The distance matrix D can be passed to mdmr as either a distance object or a symmetric matrix.

References

Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.

Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.

McArtor, D. B., Lubke, G. H., & Bergeman, C. S. (2017). Extending multivariate distance matrix regression with an effect size measure and the distribution of the test statistic. Psychometrika, 82, 1052-1077.

Examples


# --- The following two approaches yield equivalent results --- #
library(MDMR)
# Approach 1
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
res1 <- mdmr(X = X.mdmr, D = D)
summary(res1)

# Approach 2
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
G <- gower(D)
res2 <- mdmr(X = X.mdmr, G = G)
summary(res2)
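Neither approach above exercises the permutation arguments. The sketch below requests permutation p-values explicitly, using only the perm.p, nperm, and seed arguments and the nperm and p.prec elements documented on this page; the seed value 12345 is arbitrary. It is wrapped in requireNamespace() so that it is skipped when MDMR is not installed.

```r
# Permutation p-values with an explicit number of permutations and a fixed seed
if (requireNamespace("MDMR", quietly = TRUE)) {
  library(MDMR)
  data(mdmrdata, package = "MDMR")
  D <- dist(Y.mdmr, method = "euclidean")
  res3 <- mdmr(X = X.mdmr, D = D, perm.p = TRUE, nperm = 1000, seed = 12345)
  print(res3$nperm)    # number of permutations used
  print(res3$p.prec)   # standard errors of the permutation p-values
  summary(res3)
}
```

Fixing seed makes the permutation null distribution, and therefore the p-values, reproducible across runs.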
