mdmr (multivariate distance matrix regression) is used to regress a
distance matrix onto a set of predictors. It returns the test statistic,
pseudo R-square statistic, and p-values (analytic or permutation-based) for
all predictors jointly and for each predictor individually, conditioned on
the rest.
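To make "regressing a distance matrix" concrete, here is a minimal sketch of the trace statistics commonly used in the MDMR literature. The omnibus effect uses the hat matrix \(H\) of the predictors (with an intercept). Each conditional effect uses \(H - H_{(-k)}\), the extra projection contributed by column k. This version forms \(n \times n\) matrices explicitly, so it is not the fast strategy mdmr() uses. The exact scaling of mdmr()'s statistic (for example, any degrees-of-freedom factor) is an assumption here, and mdmr_traces() is a hypothetical helper, not a package function.

```r
# Textbook MDMR trace statistics (sketch, not the package's fast algorithm).
# G: n x n Gower-centered matrix; X: n x p numeric predictor matrix.
mdmr_traces <- function(X, G) {
  X <- as.matrix(X)
  if (is.null(colnames(X))) colnames(X) <- paste0("X", seq_len(ncol(X)))
  tr <- function(M) sum(diag(M))
  proj <- function(M) {                       # hat matrix of cbind(1, M)
    M1 <- cbind(1, M)
    M1 %*% solve(crossprod(M1), t(M1))
  }
  H <- proj(X)
  trG <- tr(G)
  trHG <- tr(H %*% G)
  denom <- trG - trHG                         # tr((I - H) G)
  out <- data.frame(effect = "(Omnibus)", stat = trHG / denom,
                    df = ncol(X), pr.sq = trHG / trG)
  for (k in seq_len(ncol(X))) {
    # Extra projection due to column k, given the other predictors
    trHkG <- tr((H - proj(X[, -k, drop = FALSE])) %*% G)
    out <- rbind(out, data.frame(effect = colnames(X)[k], stat = trHkG / denom,
                                 df = 1, pr.sq = trHkG / trG))
  }
  out
}

# Simulated data: only x1 is related to the outcome
set.seed(1)
n <- 50
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
Y <- cbind(X[, "x1"] + rnorm(n), rnorm(n), rnorm(n))
G <- tcrossprod(scale(Y, scale = FALSE))      # Gower's G for Euclidean distances on Y
tr.res <- mdmr_traces(X, G)
print(tr.res)
```

Because \(\mathrm{tr}(G) = \mathrm{tr}(HG) + \mathrm{tr}((I - H)G)\), the omnibus statistic in this form equals pseudo-R² / (1 - pseudo-R²).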
mdmr(
  X,
  D = NULL,
  G = NULL,
  lambda = NULL,
  return.lambda = FALSE,
  start.acc = 1e-20,
  ncores = 1,
  perm.p = (nrow(as.matrix(X)) < 200),
  nperm = 500,
  seed = NULL
)
An object with six elements and a summary function. Calling
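summary(res), where res is the object returned by mdmr(), produces a data
frame with one row per effect (the joint effect of all predictors and the
conditional effect of each predictor) and four columns giving:
the value of the corresponding MDMR test statistic;
the numerator degrees of freedom for the corresponding effect;
the size of the corresponding effect on the distance matrix (pseudo R-square);
the p-value for each effect.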
In addition to the information in the four columns of summary(res), the
res object also contains:
A data.frame reporting the precision of each p-value. For analytic p-values,
this is the maximum error bound of the p-value reported by the davies
function in CompQuadForm. For permutation p-values, it is the standard error
of each permutation p-value.
A vector of the eigenvalues of G (only if return.lambda = TRUE).
The number of permutations used, which is NA if analytic p-values were
computed.
Note that the printed output of summary(res) truncates p-values at the
smallest trustworthy value, but the object returned by summary(res)
contains the p-values as computed. The reason for this truncation differs
for analytic and permutation p-values. For an analytic p-value, if the error
bound of the Davies algorithm is larger than the p-value, the only
conclusion that can be drawn with certainty is that the p-value is smaller
than (or equal to) the error bound. For a permutation test, the estimated
p-value is zero if no permuted test statistic exceeds the observed
statistic, but that zero is only a product of the finite number of
permutations conducted. The only conclusion that can be drawn is that the
p-value is smaller than 1/nperm.
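The reporting rule above can be sketched as a small display helper. It assumes the permutation p-value is the proportion of permuted statistics exceeding the observed one, which matches the zero estimate described above. It also assumes the permutation standard error takes the usual binomial form. The helpers perm_p(), perm_se(), and show_p() are hypothetical and not part of the package.

```r
# Permutation p-value: share of permuted statistics above the observed one
perm_p  <- function(obs, null.stats) mean(null.stats > obs)
# Binomial standard error of a permutation p-value (assumed form)
perm_se <- function(p, nperm) sqrt(p * (1 - p) / nperm)

# Display rule: a zero permutation p-value is shown as "< 1/nperm"; an
# analytic p-value below its error bound is shown as "<= bound".
show_p <- function(p, nperm = NA, err = NA) {
  if (!is.na(nperm) && p == 0) return(paste0("< ", format(1 / nperm)))
  if (!is.na(err) && err > p) return(paste0("<= ", format(err)))
  format(p)
}

show_p(0, nperm = 500)       # "< 0.002"
show_p(1e-12, err = 1e-6)    # "<= 1e-06"
show_p(0.034)                # "0.034"
```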
X: An \(n \times p\) matrix or data frame of predictors. Unordered factors
will be tested with contrast codes by default, and ordered factors will be
tested with polynomial contrasts. For finer control over how categorical
predictors are handled, or if higher-order effects are desired, the output
from a call to model.matrix() can be supplied to this argument as
well.
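A sketch of building the design matrix explicitly with base R's model.matrix(). It uses sum-to-zero contrasts (contr.sum) for the unordered factor purely as an illustrative choice. Under R's default options, ordered factors receive polynomial contrasts (contr.poly). The data are made up. Dropping the intercept column before passing the matrix to mdmr() is an assumption, not something stated above.

```r
# Hypothetical predictors: a numeric covariate, an unordered factor,
# and an ordered factor
set.seed(3)
df <- data.frame(
  age   = rnorm(12, 40, 10),
  group = factor(rep(c("a", "b", "c"), each = 4)),
  dose  = factor(rep(c("low", "mid", "high"), times = 4),
                 levels = c("low", "mid", "high"), ordered = TRUE)
)
# Sum-to-zero contrasts for group; dose gets polynomial (.L, .Q) columns;
# age:group adds a higher-order (interaction) effect
mm <- model.matrix(~ age + group + dose + age:group, data = df,
                   contrasts.arg = list(group = "contr.sum"))
# Assumption: mdmr() supplies its own intercept, so drop this column
X.design <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
colnames(X.design)
# X.design could then be passed as X, e.g. mdmr(X = X.design, D = D) (not run)
```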
D: Distance matrix computed on the outcome data. Can be either a
matrix or an R dist object. Either D or G
must be passed to mdmr().
G: Gower's centered similarity matrix computed from D.
Either D or G must be passed to mdmr().
lambda: Optional; the eigenvalues of G.
Eigendecomposition of a large G matrix can be time consuming, and the
analytic p-values require the eigenvalues of G. If MDMR is to be conducted
multiple times on one distance matrix, it is advisable to compute the
eigendecomposition once and pass the eigenvalues to mdmr() directly each
time.
return.lambda: Logical; indicates whether the eigenvalues of
G should be returned, if calculated. Default is FALSE.
start.acc: Starting accuracy of the Davies (1980) algorithm,
implemented in the davies function of the CompQuadForm
package (Duchesne & De Micheaux, 2010), which mdmr() uses to compute
analytic MDMR p-values.
ncores: Integer; if ncores > 1, the parallel
package is used to speed up computation. Note: Windows users must set
ncores = 1 because the parallel package relies on forking. See
mc.cores in the mclapply function of the
parallel package for more details.
perm.p: Logical; should permutation-based p-values be computed instead
of analytic p-values? Default is TRUE if n < 200 and FALSE otherwise,
because the analytic p-values rely on asymptotic results.
nperm: Number of permutations to use if permutation-based p-values are computed.
seed: Random seed used to generate the permutation null distribution. Defaults to a random seed.
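The permutation settings (perm.p, nperm, seed, and ncores) can be illustrated with a self-contained base-R sketch on toy data. It uses the trace-ratio statistic \(\mathrm{tr}(HG)/\mathrm{tr}((I - H)G)\). It draws all permutations up front from a fixed seed and evaluates them with parallel::mclapply(), falling back to one core on Windows. How mdmr() itself seeds, distributes, or scores permutations is not stated above. Treat the helper names perm_stat() and draw_perms(), and this design, as assumptions.

```r
library(parallel)

# Toy data (hypothetical): one predictor related to a 2-column outcome
set.seed(5)
n <- 40
x <- rnorm(n)
Yc <- scale(cbind(x + rnorm(n), rnorm(n)), scale = FALSE)
G <- tcrossprod(Yc)                        # Gower's G for Euclidean distances
X1 <- cbind(1, x)
H <- X1 %*% solve(crossprod(X1), t(X1))    # hat matrix with intercept

# Statistic after permuting rows and columns of G, which is equivalent to
# permuting the rows of X
perm_stat <- function(idx) {
  Gp <- G[idx, idx]
  trHG <- sum(H * Gp)                      # tr(H Gp) for symmetric matrices
  trHG / (sum(diag(Gp)) - trHG)
}

# Default perm.p rule from the usage line: permutation p-values when n < 200
perm.p <- nrow(as.matrix(x)) < 200

draw_perms <- function(n, nperm, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)       # fixed seed => reproducible null
  replicate(nperm, sample.int(n), simplify = FALSE)
}

# Forking is unavailable on Windows, so use a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
perms <- draw_perms(n, nperm = 500, seed = 123)
null.stats <- unlist(mclapply(perms, perm_stat, mc.cores = cores))
obs <- perm_stat(seq_len(n))               # identity permutation = observed
p.perm <- mean(null.stats > obs)
```

Drawing the permutations before dispatching them means the null distribution, and therefore the p-value, does not depend on how many cores are used.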
Daniel B. McArtor (dmcartor@gmail.com) [aut, cre]
This function is the fastest approach to conducting MDMR. It uses the fastest known computational strategy to compute the MDMR test statistic (see Appendix A of McArtor et al., 2017) and, when perm.p = FALSE, fast analytic p-values.
The slowest part of conducting MDMR is now the necessary eigendecomposition
of the G matrix, whose computation time grows on the order of
\(n^3\). If MDMR is to be conducted multiple times on the same
distance matrix, it is recommended to compute the eigenvalues of G in
advance and pass them to the function, rather than recomputing them every
time mdmr() is called, as happens when the argument lambda
is left NULL.
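A sketch of computing the eigenvalues once and reusing them, on simulated data. For Euclidean distances, Gower's matrix equals the cross-product of the column-centered outcome matrix, which is used here so the block does not depend on the package. The commented mdmr() calls use hypothetical design matrices X.a and X.b and are not run.

```r
set.seed(4)
Y <- matrix(rnorm(200 * 5), nrow = 200)       # 200 observations, 5 outcomes
G <- tcrossprod(scale(Y, scale = FALSE))      # Gower's G for Euclidean distances
# One O(n^3) eigendecomposition; only the eigenvalues are needed
lambda <- eigen(G, symmetric = TRUE, only.values = TRUE)$values
# Reuse across models (hypothetical X.a, X.b; requires the MDMR package):
# res.a <- mdmr(X = X.a, G = G, lambda = lambda)
# res.b <- mdmr(X = X.b, G = G, lambda = lambda)
```

Here G has rank 5, so only five eigenvalues are numerically nonzero, and their sum equals the trace of G.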
The distance matrix D can be passed to mdmr() as either a
dist object or a symmetric matrix.
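Gower's centered matrix can be sketched from its standard definition, \(G = C A C\) with \(A_{ij} = -d_{ij}^2/2\) and \(C = I - \mathbf{1}\mathbf{1}^\top/n\). The hypothetical gower_sketch() below is not the package's gower() used in the examples. It should agree with gower() if that function follows the standard definition, which is an assumption here. Like mdmr(), it accepts either a dist object or a symmetric matrix.

```r
# G = C A C, with A = -D^2 / 2 (elementwise square) and C = I - 11'/n
gower_sketch <- function(D) {
  D <- as.matrix(D)                           # dist object or symmetric matrix
  n <- nrow(D)
  C <- diag(n) - matrix(1 / n, n, n)          # centering matrix
  C %*% (-0.5 * D^2) %*% C
}

set.seed(2)
Y <- matrix(rnorm(30), nrow = 10)             # 10 observations, 3 outcomes
D <- dist(Y, method = "euclidean")
G1 <- gower_sketch(D)                         # from a dist object
G2 <- gower_sketch(as.matrix(D))              # from a symmetric matrix
# For Euclidean distances, G is the cross-product of the column-centered data
G3 <- tcrossprod(scale(Y, scale = FALSE))
```

Two useful checks: each row of G sums to zero, and \(\mathrm{tr}(G) = \frac{1}{2n}\sum_{i,j} d_{ij}^2\).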
Davies, R. B. (1980). The distribution of a linear combination of chi-square random variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.
Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.
McArtor, D. B., Lubke, G. H., & Bergeman, C. S. (2017). Extending multivariate distance matrix regression with an effect size measure and the distribution of the test statistic. Psychometrika, 82, 1052-1077.
# --- The following two approaches yield equivalent results --- #
library(MDMR)
# Approach 1
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
res1 <- mdmr(X = X.mdmr, D = D)
summary(res1)
# Approach 2
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
G <- gower(D)
res2 <- mdmr(X = X.mdmr, G = G)
summary(res2)