princomp
Principal Components Analysis
princomp performs a principal components analysis on the given numeric data matrix and returns the results as an object of class princomp.
- Keywords
- multivariate
Usage
princomp(x, ...)

# S3 method for formula
princomp(formula, data = NULL, subset, na.action, ...)

# S3 method for default
princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
         subset = rep_len(TRUE, nrow(as.matrix(x))), fix_sign = TRUE, ...)

# S3 method for princomp
predict(object, newdata, ...)
Arguments
- formula
a formula with no response variable, referring only to numeric variables.
- data
an optional data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
- subset
an optional vector used to select rows (observations) of the data matrix x.
- na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit.
- x
a numeric matrix or data frame which provides the data for the principal components analysis.
- cor
a logical value indicating whether the calculation should use the correlation matrix or the covariance matrix. (The correlation matrix can only be used if there are no constant variables.)
- scores
a logical value indicating whether the score on each principal component should be calculated.
- covmat
a covariance matrix, or a covariance list as returned by cov.wt (and cov.mve or cov.mcd from package MASS). If supplied, this is used rather than the covariance matrix of x.
- fix_sign
Should the signs of the loadings and scores be chosen so that the first element of each loading is non-negative?
- ...
arguments passed to or from other methods. If x is a formula one might specify cor or scores.
- object
Object of class inheriting from "princomp".
- newdata
An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, newdata must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
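As a rough illustration of the newdata rule above, the sketch below fits on three named columns and then predicts for a new observation. The data frame new_obs and its values are made up for illustration; its extra Rape column is there only to show that columns are matched by name.

```r
# Fit on three named variables of the built-in USArrests data.
fit <- princomp(~ Murder + Assault + UrbanPop, data = USArrests, cor = TRUE)

# Hypothetical new observation (illustrative values). The extra column
# 'Rape' was not part of the fit; columns are matched by name.
new_obs <- data.frame(Murder = 8, Assault = 170, UrbanPop = 65, Rape = 20)
new_scores <- predict(fit, newdata = new_obs)

# Predicting on the original data should reproduce the stored scores.
same_as_fit <- all.equal(predict(fit, newdata = USArrests), fit$scores,
                         check.attributes = FALSE)
```

Here new_scores is a one-row matrix with one column per component, and same_as_fit is expected to be TRUE.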
Details
princomp is a generic function with "formula" and "default" methods.
The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp.
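The eigen-based calculation can be checked by hand. The following is a minimal sketch for the cor = TRUE case, assuming the built-in USArrests data; the variable names are illustrative, and this is not a copy of the internal implementation.

```r
# Eigen-decomposition of the correlation matrix, done by hand.
ed <- eigen(cor(USArrests), symmetric = TRUE)
sdev_manual <- sqrt(ed$values)

# The same quantity as reported by princomp with cor = TRUE.
pc <- princomp(USArrests, cor = TRUE)
sdev_match <- all.equal(unname(pc$sdev), sdev_manual)
```

The component standard deviations are the square roots of the eigenvalues, so sdev_match should be TRUE up to numerical tolerance.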
Note that the default calculation uses divisor N for the covariance matrix.
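One consequence of the divisor is that the standard deviations differ from those of prcomp, which uses divisor N - 1, by a factor of sqrt((N - 1) / N). A small sketch on the built-in USArrests data, without scaling:

```r
n <- nrow(USArrests)

# princomp: covariance with divisor N.
sd_princomp <- unname(princomp(USArrests)$sdev)

# prcomp: singular value decomposition with the usual divisor N - 1.
sd_prcomp <- prcomp(USArrests)$sdev

# Rescaling the prcomp result should recover the princomp result.
divisor_match <- all.equal(sd_princomp, sd_prcomp * sqrt((n - 1) / n))
```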
The print method for these objects prints the results in a nice format and the plot method produces a scree plot (screeplot). There is also a biplot method.
If x is a formula then the standard NA-handling is applied to the scores (if requested): see napredict.
princomp only handles so-called R-mode PCA, that is feature extraction of variables. If a data matrix is supplied (possibly via a formula) it is required that there are at least as many units as variables. For Q-mode PCA use prcomp.
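The restriction on the number of units can be seen with a matrix that has more columns than rows. The matrix wide below is simulated purely for illustration, and the error is caught with tryCatch so that the sketch runs to completion.

```r
set.seed(1)
# Hypothetical data: 3 units (rows) and 5 variables (columns).
wide <- matrix(rnorm(3 * 5), nrow = 3, ncol = 5)

# princomp refuses this input; capture the error message instead of stopping.
msg <- tryCatch(princomp(wide), error = function(e) conditionMessage(e))

# prcomp accepts the same matrix.
pc_wide <- prcomp(wide)
```

Here msg holds the error message from princomp, while pc_wide is an ordinary object of class "prcomp".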
Value
princomp returns a list with class "princomp" containing the following components:
- sdev
the standard deviations of the principal components.
- loadings
the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). This is of class "loadings": see loadings for its print method.
- center
the means that were subtracted.
- scale
the scalings applied to each variable.
- n.obs
the number of observations.
- scores
if scores = TRUE, the scores of the supplied data on the principal components. These are non-null only if x was supplied, and if covmat was also supplied if it was a covariance list. For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action.
- call
the matched call.
- na.action
If relevant.
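To illustrate when scores are unavailable, the sketch below supplies only a plain covariance matrix through covmat and no x. It assumes the built-in USArrests data; since no data were supplied, there is nothing to score.

```r
# Fit from a covariance matrix alone; no data matrix is supplied.
pc_cov <- princomp(covmat = cov(USArrests))

# No scores can be computed, and the number of observations is unknown.
no_scores <- is.null(pc_cov$scores)
unknown_n <- is.na(pc_cov$n.obs)

# Standard deviations and loadings are still available.
pc_cov$sdev
```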
Note
The signs of the columns of the loadings and scores are arbitrary, and so may differ between different programs for PCA, and even between different builds of R: fix_sign = TRUE alleviates that.
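A short sketch of both points, assuming the built-in USArrests data: with fix_sign = TRUE the first element of each loading is non-negative, and flipping the sign of a component in both loadings and scores leaves the reconstructed (centered and scaled) data unchanged, which is why the sign carries no information.

```r
pc <- princomp(USArrests, cor = TRUE, fix_sign = TRUE)
L <- unclass(pc$loadings)   # plain matrix of loadings
S <- pc$scores

# First element of every loading vector is non-negative.
first_nonneg <- all(L[1, ] >= 0)

# Flip the sign of the first component in both loadings and scores.
L_flip <- L; S_flip <- S
L_flip[, 1] <- -L_flip[, 1]
S_flip[, 1] <- -S_flip[, 1]

# The reconstruction is identical either way.
same_recon <- all.equal(S %*% t(L), S_flip %*% t(L_flip))
```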
References
Mardia, K. V., J. T. Kent and J. M. Bibby (1979). Multivariate Analysis, London: Academic Press.
Venables, W. N. and B. D. Ripley (2002). Modern Applied Statistics with S, Springer-Verlag.
See Also
summary.princomp, screeplot, biplot.princomp, prcomp, cor, cov, eigen.
Examples
require(graphics)

## The variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
(pc.cr <- princomp(USArrests))  # inappropriate
princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)
## Similar, but different:
## The standard deviations differ by a factor of sqrt(49/50)

summary(pc.cr <- princomp(USArrests, cor = TRUE))
loadings(pc.cr)  # note that blank entries are small but not zero
## The signs of the columns of the loadings are arbitrary
plot(pc.cr) # shows a screeplot.
biplot(pc.cr)

## Formula interface
princomp(~ ., data = USArrests, cor = TRUE)

## NA-handling
USArrests[1, 2] <- NA
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                  data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores[1:5, ]

## (Simple) Robust PCA:
## Classical:
(pc.cl <- princomp(stackloss))
## Robust:
(pc.rob <- princomp(stackloss, covmat = MASS::cov.rob(stackloss)))