summary.manylm

Summary method for class "manylm": computes a table summarising the statistical
significance of the different multivariate terms in a linear model fitted to
high-dimensional data, such as multivariate abundance data in ecology.

Usage

## S3 method for class 'manylm':
summary(object, nBoot=1000, resamp="residual",
test="F", cor.type=object$cor.type, shrink.param=NULL,
p.uni="none", studentize=TRUE, R2="h", show.cor = FALSE,
show.est=FALSE, show.residuals=FALSE, symbolic.cor=FALSE,
tol=1.0e-6, ...)
## S3 method for class 'summary.manylm':
print(x, digits = max(getOption("digits") - 3, 3),
      signif.stars = getOption("show.signif.stars"),
      dig.tst = max(1, min(5, digits - 1)),
      eps.Pvalue = .Machine$double.eps, ...)
Arguments

object: an object of class "manylm", usually the result of a call to manylm.

nBoot: the number of bootstrap iterations; the default is nBoot=1000.

resamp: the method of resampling used to calculate the P-values; the default is
resamp="residual". See Details.

test: the test statistic to use, "LR" (likelihood ratio statistic) or "F"
(Lawley-Hotelling trace statistic, the default). Note that if cor.type="I" then
"LR" is equivalent to the LR-IND statistic, with the analogous interpretation
for "F".

cor.type: the structure imposed on the estimated correlation matrix under the
fitted model; one of "I" (independence), "shrink" or "R" (unstructured).
Defaults to the cor.type used in the manylm fit. See Details.

shrink.param: the shrinkage parameter to be used when cor.type="shrink". If not
supplied, but needed, it will be estimated from the data by cross-validation
using the normal likelihood as in Warton (2008).

p.uni: whether univariate test statistics and P-values should be reported, and
if so whether they are adjusted for multiple testing; one of "none" (the
default), "unadjusted" or "adjusted". See Details.

studentize: logical; whether studentized residuals should be used in residual
and score resampling (default TRUE).

R2: the type of multivariate R^2 to report; the default "h" is Hooper's R^2.
See Details.

show.cor: logical; if TRUE, the correlation matrix of the estimated parameters
is returned and printed.

show.est, show.residuals: logical; whether to also show the estimated model
coefficients and a summary of the residuals, respectively.

symbolic.cor: logical; if TRUE, print the correlations in a symbolic form
rather than as numbers.

tol: the numerical tolerance used in the estimation; the default is 1.0e-6.

x: an object of class "summary.manylm", usually the result of a call to
summary.manylm.

digits: the number of significant digits to use when printing.

signif.stars: logical; if TRUE, 'significance stars' are printed alongside the
P-values.

dig.tst: the number of significant digits used when printing the test
statistics.

eps.Pvalue: P-values below this threshold are printed as being smaller than it;
the default is machine precision.

...: for the summary.manylm method, additional arguments including:
rep.seed - logical. Whether to fix the random seed when resampling data. Useful
for simulation or diagnostic purposes.
bootID - a matrix of integer IDs, each row specifying the bootstrap sample to
be used in one resampling iteration.

Value

summary.manylm returns an object of class "summary.manylm". If test is not
NULL, the returned list also includes components for the test, among them the
unscaled (dispersion = 1) estimated covariance matrix of the estimated
coefficients. If show.cor is TRUE, additional components are returned,
including the correlation matrix of the estimated parameters.

Details

The summary.manylm
function returns a table summarising the statistical
significance of each multivariate term specified in the fitted manylm model.
For each model term, it returns a test statistic as determined by the argument
test
, and a P-value calculated by resampling rows of the data using a
method determined by the argument resamp
. The four possible resampling methods are residual permutation (Anderson and Robinson 2001), score resampling (Wu 1986), and case and residual resampling (Davison and Hinkley 1997, chapter 6); all of them resample under the alternative hypothesis. These methods ensure approximately valid inference even when the correlation between variables has been misspecified and, for case and score resampling, even when the equal variance assumption of linear models is invalid. By default, studentized residuals (r_i/sqrt(1-h_ii)) are used in residual and score resampling, although raw residuals could be used via the argument studentize=FALSE
. If resamp="none", P-values cannot be calculated; however, the test statistics are still returned.
If you have a specific hypothesis of primary interest that you want to test,
then you should use the anova.manylm
function, which can resample rows
of the data under the null hypothesis and so usually achieves a better
approximation to the true significance level.
To check model assumptions, use plot.manylm.
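As a compact illustration of these options, the sketch below fits the same spider model used in the Examples section and then calls the functions mentioned above; a small nBoot is used purely to keep the illustration quick, and which calls are appropriate will depend on your data.

library(mvabund)
data(spider)
spiddat <- mvabund(log(spider$abund + 1))
spidx <- spider$x
fit <- manylm(spiddat ~ spidx)
## Test statistics only, no resampling and hence no P-values:
summary(fit, resamp="none", test="F")
## Raw rather than studentized residuals in residual resampling
## (small nBoot only to keep this illustration fast):
summary(fit, resamp="residual", studentize=FALSE, nBoot=99)
## For a specific hypothesis of primary interest, resample under the null:
anova(fit, resamp="perm.resid", test="F", nBoot=99)
## Check model assumptions before interpreting the summary:
plot(fit)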
The summary.manylm
function is designed specifically for high-dimensional data (that is, when the number of variables p is not small compared to the number of observations N). In such instances a correlation matrix is computationally intensive to estimate and is numerically unstable, so by default the test statistic is calculated assuming independence of variables (cor.type="I"). Note however that the resampling scheme used ensures that the P-values are approximately correct even when the independence assumption is not satisfied. If it is computationally feasible for your dataset, it is recommended that you use cor.type="shrink"
to account for correlation between variables, or cor.type="R"
when p is small. The cor.type="R"
option uses the unstructured correlation matrix (only possible when N>p), such that the standard classical multivariate test statistics are obtained. Note however that such statistics are typically numerically unstable and have low power when p is not small compared to N. The cor.type="shrink"
option applies ridge regularisation (Warton 2008), shrinking the sample correlation matrix towards the identity, which improves its stability when p is not small compared to N. This provides a compromise between "R"
and "I"
, allowing us to account for correlation between variables, while using a numerically stable test statistic that has good properties. The shrinkage parameter by default is estimated by cross-validation using the multivariate normal likelihood function, although it can be specified via shrink.param
as any value between 0 and 1 (0="I" and 1="R", values closer towards 0 indicate more shrinkage towards "I"). The validation groups are chosen by random assignment and so you may observe some slight variation in the estimated shrinkage parameter in repeat analyses. See ridgeParamEst
for more details.
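For instance, continuing with the fit object from the sketch above, the shrinkage parameter can either be left to be estimated by cross-validation or fixed explicitly; the value 0.5 below is arbitrary and purely illustrative.

## Shrinkage parameter estimated by cross-validation (may vary slightly between runs):
summary(fit, cor.type="shrink", nBoot=99)
## Supplying the shrinkage parameter directly (0.5 is an arbitrary illustrative value):
summary(fit, cor.type="shrink", shrink.param=0.5, nBoot=99)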
Rather than stopping after testing for multivariate effects, it is often of interest to find out which response variables express significant effects. Univariate statistics are required to answer this question, and these are reported if requested. Setting p.uni="unadjusted"
returns resampling-based univariate P-values for all effects as well as the multivariate P-values, whereas p.uni="adjusted"
returns adjusted P-values (that have been adjusted for multiple testing), calculated using a step-down resampling algorithm as in Westfall & Young (1993, Algorithm 2.8). This method provides strong control of family-wise error rates, and makes use of resampling (using the method controlled by resamp
) to ensure inferences take into account correlation between variables.
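A sketch of how the univariate results might be compared, continuing with the same fit; note that the component name uni.p used below is an assumption about the returned object and should be checked against names() for your installed version of mvabund.

## Multivariate and univariate tests, without and with multiplicity adjustment:
sum.unadj <- summary(fit, p.uni="unadjusted", nBoot=99)
sum.adj   <- summary(fit, p.uni="adjusted", nBoot=99)
names(sum.adj)     # check where the univariate results are stored
## Assumed component name for the univariate P-values (verify with names() above):
sum.adj$uni.p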
A multivariate R^2 value is returned in the output, but there are many ways to define a multivariate R^2. The type of R^2 used is controlled by the R2
argument. If cor.type="I"
then all variables are assumed independent, a special case in which Hooper's R^2 returns the average of all univariate R^2 values, whereas the vector R^2 returns their product.
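To make this special case concrete, the sketch below computes each response's univariate R^2 from a separate lm fit (using spider$abund and spidx from the sketch above) and then takes their average and product; this merely illustrates the relationship stated above and is not how the package computes R^2 internally.

## Univariate R^2 for each (log-transformed) response, fitted one column at a time:
Y <- as.matrix(log(spider$abund + 1))
r2.uni <- apply(Y, 2, function(y) summary(lm(y ~ spidx))$r.squared)
## Under cor.type="I", Hooper's R^2 is the average of the univariate R^2 values ...
mean(r2.uni)
## ... whereas the vector R^2 is their product:
prod(r2.uni)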
print.summary.manylm tries to be smart about formatting the coefficients, genVar, etc., and additionally gives 'significance stars' if signif.stars is TRUE.

See Also

manylm, anova.manylm, plot.manylm

Examples
data(spider)
spiddat <- log(spider$abund+1)
spiddat <- mvabund(spiddat)
spidx <- spider$x
## Estimate the coefficients of a multivariate linear model:
fit <- manylm(spiddat~spidx)
## To summarise this multivariate fit, using score resampling and the
## F test statistic to estimate significance:
summary(fit, resamp="score", test="F")
## Instead use residual permutation with 2000 iterations, using the sum of F
## statistics to estimate multivariate significance, but also reporting
## univariate statistics with adjusted P-values:
summary(fit, resamp="perm.resid", nBoot=2000, test="F", p.uni="adjusted")
## Obtain a summary of test statistics using residual resampling, accounting
## for correlation between variables but shrinking the correlation matrix to
## improve its stability and showing univariate p-values:
summary(fit, cor.type="shrink", p.uni="adjusted")