candisc performs a generalized canonical discriminant analysis for one term in a multivariate linear model (i.e., an mlm object), computing canonical scores and vectors. It represents a transformation of the original variables into a canonical space of maximal differences for the term, controlling for other model terms.
candisc(mod, ...)

# S3 method for mlm
candisc(mod, term, type = "2", manova, ndim = rank, ...)

# S3 method for candisc
print(x, digits = max(getOption("digits") - 2, 3), LRtests = TRUE, ...)

# S3 method for candisc
summary(
  object,
  means = TRUE,
  scores = FALSE,
  coef = c("std"),
  ndim,
  digits = max(getOption("digits") - 2, 4),
  ...
)

# S3 method for candisc
coef(object, type = c("std", "raw", "structure"), ...)

# S3 method for candisc
plot(
  x,
  which = 1:2,
  conf = 0.95,
  col,
  pch,
  scale,
  asp = 1,
  var.col = "blue",
  var.lwd = par("lwd"),
  var.labels,
  var.cex = 1,
  var.pos,
  rev.axes = c(FALSE, FALSE),
  ellipse = FALSE,
  ellipse.prob = 0.68,
  fill.alpha = 0.1,
  prefix = "Can",
  suffix = TRUE,
  titles.1d = c("Canonical scores", "Structure"),
  points.1d = FALSE,
  ...
)
An object of class candisc with the following components:
dfh: hypothesis degrees of freedom for term.
dfe: error degrees of freedom for the mlm.
rank: number of non-zero eigenvalues of \(HE^{-1}\).
eigenvalues: eigenvalues of \(HE^{-1}\).
canrsq: squared canonical correlations.
pct: a vector containing each canrsq value as a percentage of their total.
ndim: number of canonical dimensions stored in the means, structure and coeffs.* components.
means: a data frame containing the class means for the levels of the factor(s) in the term.
factors: a data frame containing the levels of the factor(s) in the term.
term: name of the term.
terms: a character vector containing the names of the terms in the mlm object.
coeffs.raw: a matrix containing the raw canonical coefficients.
coeffs.std: a matrix containing the standardized canonical coefficients.
structure: a matrix containing the canonical structure coefficients on ndim dimensions, i.e., the correlations between the original variates and the canonical scores. These are sometimes referred to as Total Structure Coefficients.
scores: a data frame containing the predictors in the mlm model and the canonical scores on ndim dimensions. These are calculated as Y %*% coeffs.raw, where Y contains the standardized response variables.
mod: an mlm object, such as computed by lm() with a multivariate response.
...: arguments to be passed down. In particular, type="n" can be used with the plot method to suppress the display of canonical scores.
term: the name of one term from mod for which the canonical analysis is performed.
type: type of test for the model term, one of "II", "III", "2", or "3".
manova: the Anova.mlm object corresponding to mod. Normally, this is computed internally by Anova(mod).
ndim: number of dimensions to store in (or retrieve from, for the summary method) the means, structure, scores and coeffs.* components. The default is the rank of the H matrix for the hypothesis term.
digits: significant digits to print.
LRtests: logical; should likelihood ratio tests for the canonical dimensions be printed?
x, object: a candisc object.
means: logical value used to determine if canonical means are printed.
scores: logical value used to determine if canonical scores are printed.
coef: type of coefficients printed by the summary method. Any one or more of "std", "raw", or "structure".
which: a vector of one or two integers, selecting the canonical dimension(s) to plot. If the canonical structure for a term has ndim==1, or length(which)==1, a 1D representation of canonical scores and structure coefficients is produced by the plot method. Otherwise, a 2D plot is produced.
conf: confidence coefficient for the confidence circles around canonical means plotted in the plot method.
col: a vector of the unique colors to be used for the levels of the term in the plot method, one for each level of the term. In this version, you should assign colors and point symbols explicitly, rather than relying on the somewhat arbitrary defaults based on palette().
pch: a vector of the unique point symbols to be used for the levels of the term in the plot method.
scale: scale factor for the variable vectors in canonical space. If not specified, a scale factor is calculated to make the variable vectors approximately fill the plot space.
asp: aspect ratio for the plot method. The default, asp=1, ensures that the units on the horizontal and vertical axes are the same, so that lengths and angles of the variable vectors are interpretable.
var.col: color used to plot variable vectors.
var.lwd: line width used to plot variable vectors.
var.labels: optional vector of variable labels to replace variable names in the plots.
var.cex: character expansion size for variable labels in the plots.
var.pos: position(s) of variable vector labels with respect to the end point. If not specified, the labels are out-justified left and right with respect to the end points.
rev.axes: logical, a vector of length(which). TRUE causes the orientation of the canonical scores and structure coefficients to be reversed along a given axis.
ellipse: logical; draw data ellipses for canonical scores?
ellipse.prob: coverage probability for the data ellipses.
fill.alpha: transparency value for the color used to fill the ellipses; a fully transparent fill leaves the ellipses visually unfilled.
prefix: prefix used to label the canonical dimensions plotted.
suffix: suffix for labels of canonical dimensions. If suffix=TRUE, the percent of hypothesis (H) variance accounted for by each canonical dimension is added to the axis label.
titles.1d: a character vector of length 2, containing titles for the panels used to plot the canonical scores and structure vectors, for the case in which there is only one canonical dimension.
points.1d: logical value used by plot.candisc when there is only one canonical dimension.
candisc(mlm): "mlm" method.
Michael Friendly and John Fox
In typical usage, the term should be a factor or interaction corresponding to a multivariate test with 2 or more degrees of freedom for the null hypothesis.
Canonical discriminant analysis is typically carried out in conjunction with
a one-way MANOVA design. It represents a linear transformation of the
response variables into a canonical space in which (a) each successive
canonical variate produces maximal separation among the groups (e.g.,
maximum univariate F statistics), and (b) all canonical variates are
mutually uncorrelated. For a one-way MANOVA with g groups and p responses,
there are dfh = min(g-1, p) such canonical dimensions, and tests
initially stated by Bartlett (1938) allow one to determine the number of
significant canonical dimensions.
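As a minimal sketch of the one-way case described above, the following base R code builds the hypothesis (H) and error (E) sums of squares and crossproducts matrices for the built-in iris data, then extracts the eigenvalues of \(HE^{-1}\) and the squared canonical correlations. The variable names (Y, grp, fit, ev and so on) are illustrative choices, and this is not necessarily how candisc computes these quantities internally.

```r
# One-way canonical discriminant analysis "by hand" (illustrative sketch)
Y   <- as.matrix(iris[, 1:4])   # p = 4 response variables
grp <- iris$Species             # g = 3 groups

fit <- lm(Y ~ grp)              # one-way multivariate linear model

# E: within-group (error) sums of squares and crossproducts
E <- crossprod(residuals(fit))
# H: between-group (hypothesis) SSP, obtained as total SSP minus E
H <- crossprod(scale(Y, center = TRUE, scale = FALSE)) - E

# Number of canonical dimensions: min(g - 1, p)
dfh <- min(nlevels(grp) - 1, ncol(Y))

# Eigenvalues of H E^{-1}, computed from the symmetric matrix
# U^{-T} H U^{-1}, where E = U'U is the Cholesky factorization of E
U    <- chol(E)
Uinv <- solve(U)
ev   <- eigen(t(Uinv) %*% H %*% Uinv, symmetric = TRUE)$values[seq_len(dfh)]

# Squared canonical correlations
canrsq <- ev / (1 + ev)
print(round(cbind(eigenvalue = ev, canrsq = canrsq), 4))
```

Working with the symmetric form avoids the complex-valued rounding noise that eigen() can return for the non-symmetric product solve(E) %*% H.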
Computational details for the one-way case are described in Cooley & Lohnes (1971), and in the SAS/STAT User's Guide, "The CANDISC procedure: Computational Details," http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_candisc_sect012.htm.
A generalized canonical discriminant analysis extends this idea to a general
multivariate linear model. Analysis of each term in the mlm produces
a rank \(df_h\) hypothesis sum of squares and crossproducts matrix, H, that
is tested against the rank \(df_e\) error matrix, E, by the standard
multivariate tests (Wilks' Lambda, Hotelling-Lawley trace, Pillai trace,
Roy's maximum root test). For any given term in the mlm, the
generalized canonical discriminant analysis amounts to a standard
discriminant analysis based on the H matrix for that term in relation to the
full-model E matrix.
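For a model with more than one term, a minimal base R sketch of the same idea follows. It uses the built-in mtcars data purely for illustration. As an assumption, it takes the H matrix for a main-effect term in an additive model to be the reduction in residual SSP when that term is added to a model containing the other term (a type II test); E comes from the full model. All object names are hypothetical, and candisc itself works from the Anova.mlm object for the model rather than refitting a reduced model.

```r
# Generalized case (illustrative sketch): H for one term vs. full-model E
dat <- transform(mtcars, cyl = factor(cyl), am = factor(am))

full    <- lm(cbind(mpg, hp, wt) ~ am + cyl, data = dat)  # all terms
reduced <- lm(cbind(mpg, hp, wt) ~ am,       data = dat)  # without cyl

# Full-model error SSP matrix
E <- crossprod(residuals(full))
# Type II hypothesis SSP for cyl, controlling for am
H <- crossprod(residuals(reduced)) - E

dfh <- nlevels(dat$cyl) - 1   # hypothesis degrees of freedom for cyl
dfe <- df.residual(full)      # error degrees of freedom

# Eigenvalues of H E^{-1} via the symmetric form U^{-T} H U^{-1}
U    <- chol(E)
Uinv <- solve(U)
ev   <- eigen(t(Uinv) %*% H %*% Uinv, symmetric = TRUE)$values

# Number of non-negligible eigenvalues (canonical dimensions for the term)
n_dim <- sum(ev > max(ev) * 1e-8)
print(c(dfh = dfh, dfe = dfe, n_dim = n_dim))
```

With three responses and a 2-df term, at most two eigenvalues can be non-zero, which is why the term needs 2 or more degrees of freedom to give a 2D canonical plot.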
The plot method for candisc objects is typically a 2D plot, similar to a biplot. It shows the canonical scores for the groups defined by the term as points and the canonical structure coefficients as vectors from the origin.
If the canonical structure for a term has ndim==1, or length(which)==1, the 1D representation consists of a boxplot of canonical scores and a vector diagram showing the magnitudes of the structure coefficients.
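The quantities displayed in these plots can be approximated by hand. The sketch below, again for the iris data and with illustrative variable names, scales the canonical vectors so that each canonical variate has unit pooled within-group variance (a common convention, assumed here rather than taken from the candisc source), computes canonical scores from the centered responses, and obtains structure coefficients as correlations between the responses and the scores. The sign of each canonical dimension is arbitrary, so results may differ from candisc output by a reflection.

```r
# Canonical scores and structure coefficients "by hand" (illustrative sketch)
Y   <- as.matrix(iris[, 1:4])
grp <- iris$Species
fit <- lm(Y ~ grp)

Yc  <- scale(Y, center = TRUE, scale = FALSE)  # centered responses
E   <- crossprod(residuals(fit))               # error SSP
H   <- crossprod(Yc) - E                       # hypothesis SSP
dfe <- df.residual(fit)

U    <- chol(E)                                # E = U'U
Uinv <- solve(U)
eig  <- eigen(t(Uinv) %*% H %*% Uinv, symmetric = TRUE)

ndim <- 2
# Raw coefficients V, scaled so that t(V) %*% E %*% V / dfe is the identity
coef_raw <- Uinv %*% eig$vectors[, seq_len(ndim)] * sqrt(dfe)
dimnames(coef_raw) <- list(colnames(Y), paste0("Can", seq_len(ndim)))

# Canonical scores for each observation
can_scores <- Yc %*% coef_raw

# Structure coefficients: correlations of responses with canonical scores
struct_coef <- cor(Y, can_scores)
print(round(struct_coef, 3))
```

In a 2D plot the rows of struct_coef would be drawn as vectors from the origin and the rows of can_scores as points, one symbol per group.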
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proc. Cambridge Philosophical Society 34, 33-40.
Cooley, W.W. & Lohnes, P.R. (1971). Multivariate Data Analysis, New York: Wiley.
Gittins, R. (1985). Canonical Analysis: A Review with Applications in Ecology, Berlin: Springer.
candiscList, heplot, heplot3d
library(candisc)  # provides candisc() and the Grass data

grass.mod <- lm(cbind(N1, N9, N27, N81, N243) ~ Block + Species, data = Grass)
car::Anova(grass.mod, test = "Wilks")
grass.can1 <- candisc(grass.mod, term = "Species")
plot(grass.can1)
# library(heplots)
heplot(grass.can1, scale = 6, fill = TRUE)
# iris data
iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~ Species, data=iris)
iris.can <- candisc(iris.mod, data=iris)
#-- assign colors and symbols corresponding to species
col <- c("red", "brown", "green3")
pch <- 1:3
plot(iris.can, col=col, pch=pch)
heplot(iris.can)
# 1-dim plot
iris.can1 <- candisc(iris.mod, data=iris, ndim=1)
plot(iris.can1)