boxM: Box's M-test for Homogeneity of Covariance Matrices

Description

boxM() performs Box's (1949) M-test for homogeneity of covariance matrices obtained from multivariate normal data according to one or more classification factors. The test compares the log determinants of the separate group covariance matrices, weighted by their degrees of freedom, to the log determinant of the pooled covariance matrix, analogous to a likelihood ratio test. The p-value is based on a chi-square approximation to the distribution of the test statistic.
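
As a rough illustration of the statistic described above, the sketch below computes Box's M and its chi-square approximation in base R for the iris data. It uses the textbook form of Box's correction factor and is not the package's own code; the helper name box_m_sketch() is hypothetical.

```r
# Hypothetical helper: Box's M statistic with the usual chi-square approximation.
box_m_sketch <- function(Y, group) {
  Y <- as.matrix(Y)
  group <- as.factor(group)
  p <- ncol(Y)
  g <- nlevels(group)
  # per-group covariance matrices and their degrees of freedom (n_i - 1)
  covs <- lapply(split(as.data.frame(Y), group), cov)
  dfs <- as.vector(table(group)) - 1
  # pooled covariance: df-weighted average of the group covariances
  pooled <- Reduce(`+`, Map(`*`, covs, dfs)) / sum(dfs)
  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  # M = (N - g) log|S_p| - sum_i (n_i - 1) log|S_i|
  M <- sum(dfs) * logdet(pooled) - sum(dfs * sapply(covs, logdet))
  # Box's scaling factor for the chi-square approximation
  u <- (sum(1 / dfs) - 1 / sum(dfs)) * (2 * p^2 + 3 * p - 1) /
    (6 * (p + 1) * (g - 1))
  stat <- M * (1 - u)
  df <- p * (p + 1) * (g - 1) / 2
  list(statistic = stat, parameter = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

bm <- box_m_sketch(iris[, 1:4], iris$Species)
bm$statistic
bm$parameter   # p(p + 1)(g - 1)/2 = 4 * 5 * 2 / 2 = 20
bm$p.value
```

With four variables and three groups the degrees of freedom are 20. If the assumptions above match the package's computation, boxM(iris[, 1:4], iris[, "Species"]) should report the same statistic.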

Usage

boxM(Y, ...)
# S3 method for formula
boxM(Y, data, ...)
# S3 method for lm
boxM(Y, ...)
# S3 method for default
boxM(Y, group, ...)
# S3 method for boxM
print(x, ...)
# S3 method for boxM
summary(
  object,
  digits = getOption("digits") - 2,
  cov = FALSE,
  quiet = FALSE,
  ...
)

Value

A list with class c("boxM", "htest") containing the following components:

statistic: the chi-square (approximate) statistic for Box's M test, where large values imply the covariance matrices differ.
parameter: the degrees of freedom for the test statistic.
p.value: the p-value of the test
ngroups: the number of levels of the group variable
cov: a list of the group covariance matrices, of length ngroups
pooled: the pooled covariance matrix
means: a matrix with ngroups+1 rows, giving the means of the variables for each group, followed by those for the pooled data
logDet: a vector of length ngroups+1 containing the natural logarithm of the determinant of each matrix in cov, followed by that of the pooled covariance matrix
df: a vector of the degrees of freedom for each group, followed by that for the pooled covariance matrix
data.name: a character string giving the names of the data, as extracted from the call
method: the character string "Box's M-test for Homogeneity of Covariance Matrices"

Arguments

Y: The response variable matrix for the default method, or a "mlm" or "formula" object for a multivariate linear model. If Y is a linear-model object or a formula, the variables on the right-hand side of the model must all be factors and must be completely crossed, e.g., A:B.
...: Other arguments passed to or from other methods
data: A data frame containing the variables in the model. Used only for the formula method.
group: A vector specifying the groups. Used only for the default method.
x: A "boxM" object, for the print() method
object: A "boxM" object, the result of a call to boxM()
digits: Number of digits in printed output
cov: Logical; if TRUE, the covariance matrices for each group and the pooled covariance matrix are printed
quiet: Logical; if TRUE, suppress printed output
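
For the formula and lm methods, "completely crossed" means that every combination of factor levels forms its own group and that every such cell contains data. The base-R sketch below is an illustration only: it builds those cells for the two factors in the warpbreaks data with interaction() and stops if any cell is empty. The helper name cell_factor() is hypothetical, and the commented call to the default method assumes a matching multivariate response Y.

```r
# Hypothetical helper: one group per combination of the crossing factors,
# with an error if any combination has no observations.
cell_factor <- function(...) {
  cells <- interaction(..., drop = FALSE, sep = ":")
  counts <- table(cells)
  if (any(counts == 0)) {
    stop("factors are not completely crossed; empty cell(s): ",
         paste(names(counts)[counts == 0], collapse = ", "))
  }
  cells
}

# wool (2 levels) x tension (3 levels) gives 6 cells, e.g. "A:L"
cells <- cell_factor(warpbreaks$wool, warpbreaks$tension)
table(cells)
# boxM(Y, cells)   # Y: a multivariate response with one row per observation
```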

Author

The default method was taken from the biotools package, Anderson Rodrigo da Silva anderson.agro@hotmail.com

Generalized by Michael Friendly and John Fox

Details

For objects of class "boxM", the methods print.boxM(), summary.boxM() and plot.boxM() are available.

There is no general provision as yet for handling missing data. Observations with missing data are simply removed, with a warning.
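
The complete-case behaviour described above can be sketched in base R as follows. This is an assumption about the general approach, not the package's code; the helper name drop_incomplete() is hypothetical.

```r
# Hypothetical helper: keep complete cases only, warning if any were dropped.
drop_incomplete <- function(Y, group) {
  Y <- as.matrix(Y)
  ok <- complete.cases(Y, group)
  if (!all(ok)) {
    warning(sum(!ok), " observation(s) with missing data removed")
  }
  list(Y = Y[ok, , drop = FALSE], group = droplevels(as.factor(group)[ok]))
}

Y <- as.matrix(iris[, 1:4])
Y[5, 2] <- NA                                  # introduce one missing value
clean <- drop_incomplete(Y, iris$Species)      # emits a warning
nrow(clean$Y)                                  # 149
```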

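In addition, the computation assumes that the covariance matrix for each group is non-singular, so that \(\log \det(S_i)\) can be calculated for each group. At a minimum, this requires \(n_i > p\) for each group, where \(n_i\) is the number of observations in group \(i\) and \(p\) is the number of response variables.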
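
The condition \(n_i > p\) can be checked before calling boxM(). In the base-R sketch below, a group with 3 observations on 4 variables yields a sample covariance matrix of rank at most 2, so its determinant is (numerically) zero and its log determinant is unusable. The helper check_group_sizes() is hypothetical, not part of heplots.

```r
# Hypothetical helper: TRUE for groups with more observations than variables.
check_group_sizes <- function(Y, group) {
  p <- ncol(as.matrix(Y))
  n <- table(group)
  setNames(as.vector(n) > p, names(n))
}

check_group_sizes(iris[, 1:4], iris$Species)   # 50 > 4 in every group

# A group with n_i = 3 <= p = 4: the sample covariance is rank-deficient
set.seed(1)
Y_small <- matrix(rnorm(3 * 4), nrow = 3, ncol = 4)
S_small <- cov(Y_small)
qr(S_small)$rank      # at most n_i - 1 = 2, less than p = 4
```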

Box's M test for a multivariate linear model is highly sensitive to departures from multivariate normality, as is the analogous univariate test. It is also adversely affected by unbalanced designs. Some authors recommend ignoring the result unless it is very highly significant, e.g., p < .0001 or smaller.

In general, heterogeneity of covariance matrices can be more easily seen and understood by plotting the covariance ellipses using covEllipses().
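
covEllipses() handles this properly. To show the underlying geometry, the base-R sketch below draws a one-standard-deviation ellipse for each iris species in the plane of two variables, using the eigen decomposition of each group covariance matrix. The helper ellipse_coords() is hypothetical and is not how covEllipses() is implemented.

```r
# Hypothetical helper: points on the ellipse {x : (x - mu)' S^{-1} (x - mu) = radius^2}
ellipse_coords <- function(mu, S, radius = 1, n = 100) {
  e <- eigen(S, symmetric = TRUE)
  theta <- seq(0, 2 * pi, length.out = n)
  unit <- rbind(cos(theta), sin(theta))              # unit circle, 2 x n
  # stretch by sqrt(eigenvalues), rotate by eigenvectors, shift to the mean
  pts <- e$vectors %*% (sqrt(e$values) * radius * unit)
  t(pts + mu)                                        # n x 2 matrix of (x, y)
}

vars <- c("Petal.Length", "Petal.Width")
groups <- split(iris[, vars], iris$Species)
ells <- lapply(groups, function(d) ellipse_coords(colMeans(d), cov(d)))

plot(iris$Petal.Length, iris$Petal.Width, col = as.integer(iris$Species),
     pch = 16, cex = 0.6, xlab = vars[1], ylab = vars[2])
for (i in seq_along(ells)) lines(ells[[i]], col = i, lwd = 2)
```

Ellipses of clearly different size or orientation are the visual counterpart of a large M statistic.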

The summary method prints a variety of additional statistics based on the eigenvalues of the covariance matrices. These are returned invisibly, as a list containing the following components:

logDet: the vector of log determinants
eigs: eigenvalues of the covariance matrices
eigstats: statistics computed on the eigenvalues for each covariance matrix:
  product: the product of eigenvalues, \(\prod{\lambda_i}\)
  sum: the sum of eigenvalues, \(\sum{\lambda_i}\)
  precision: the average precision of eigenvalues, \(1/\sum(1/\lambda_i)\)
  max: the maximum eigenvalue, \(\lambda_1\)
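
The eigenvalue statistics listed above can be sketched in base R for each iris species and for the pooled matrix. The pooled matrix here is the df-weighted average of the group covariances, assumed to match the package's pooled matrix; the helper eig_stats() is hypothetical. The product of the eigenvalues equals the determinant, so its logarithm reproduces the log determinant.

```r
Y <- as.matrix(iris[, 1:4])
g <- iris$Species
covs <- lapply(split(as.data.frame(Y), g), cov)
dfs <- as.vector(table(g)) - 1
covs$pooled <- Reduce(`+`, Map(`*`, covs, dfs)) / sum(dfs)

# Hypothetical helper: the four eigenvalue summaries for one covariance matrix
eig_stats <- function(S) {
  lambda <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  c(product   = prod(lambda),          # equals det(S)
    sum       = sum(lambda),           # equals the trace of S
    precision = 1 / sum(1 / lambda),   # as defined in the list above
    max       = max(lambda))           # largest eigenvalue, lambda_1
}

eigstats <- t(sapply(covs, eig_stats))   # one row per covariance matrix
logDet <- sapply(covs, function(S)
  as.numeric(determinant(S, logarithm = TRUE)$modulus))
round(eigstats, 4)
```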

References

Box, G. E. P. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317-346.

Morrison, D. F. (1976). Multivariate Statistical Methods.

Examples


library(heplots)
data(iris)

# default method, using `Y`, `group` 
res <- boxM(iris[, 1:4], iris[, "Species"])
res

# summary method gives details
summary(res)

# visualize (this is what is done in the plot method)
dets <- res$logDet
ng <- length(res$logDet) - 1
dotchart(dets, xlab = "log determinant")
# groups in blue, pooled covariance matrix in red (larger symbol)
points(dets, 1:(ng + 1), cex = c(rep(1.5, ng), 2.5), pch = c(rep(16, ng), 15),
       col = c(rep("blue", ng), "red"))

# plot method gives confidence intervals for logDet
plot(res, gplabel="Species")

# formula method
boxM( cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
      data=iris)

### Skulls data
data(Skulls)

# lm method
skulls.mod <- lm(cbind(mb, bh, bl, nh) ~ epoch, data=Skulls)
skulls.boxM <- boxM(skulls.mod) |>
  print()
summary(skulls.boxM)
