Principal Components Analysis
Performs a principal components analysis on the given data matrix
and returns the results as an object of class prcomp.
## S3 method for class 'formula'
prcomp(formula, data = NULL, subset, na.action, ...)
## S3 method for class 'default'
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL, ...)
## S3 method for class 'prcomp'
predict(object, newdata, ...)
- formula: a formula with no response variable, referring only to numeric variables.
- data: an optional data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
- subset: an optional vector used to select rows (observations) of the data matrix x.
- na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit.
- ...: arguments passed to or from other methods. If x is a formula one might specify scale. or tol.
- x: a numeric or complex matrix (or data frame) which provides the data for the principal components analysis.
- retx: a logical value indicating whether the rotated variables should be returned.
- center: a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.
- scale.: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.
- tol: a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to tol times the standard deviation of the first component.) With the default null setting, no components are omitted. Other settings for tol could be tol = 0 or tol = sqrt(.Machine$double.eps), which would omit essentially constant components.
- object: an object of class inheriting from "prcomp".
- newdata: an optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, newdata must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
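As a rough illustration of how several of these arguments behave, the sketch below uses the built-in USArrests data. The injected missing value, the 40/10 row split, the choice tol = 0.5, and the object names (d, fit_na, fit_tol, fit, scores_new) are assumptions made purely for the example.

```r
# Formula method with a hypothetical missing value; na.exclude keeps row alignment
d <- USArrests
d$Murder[3] <- NA
fit_na <- prcomp(~ Murder + Assault + Rape, data = d,
                 scale. = TRUE, na.action = na.exclude)
nrow(fit_na$x)          # same number of rows as d
anyNA(fit_na$x[3, ])    # the excluded row is padded with NA

# tol drops components whose sdev is <= tol * sdev of the first component
fit_tol <- prcomp(USArrests, scale. = TRUE, tol = 0.5)
ncol(fit_tol$rotation)  # fewer columns than variables

# predict() projects new rows; column names must match the fitted data
fit <- prcomp(USArrests[1:40, ], scale. = TRUE)
scores_new <- predict(fit, newdata = USArrests[41:50, ])
dim(scores_new)
```

With na.omit in place of na.exclude, the row containing the missing value would simply be absent from the scores rather than padded.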
The calculation is done by a singular value decomposition of the
(centered and possibly scaled) data matrix, not by using
eigen on the covariance matrix. This
is generally the preferred method for numerical accuracy. The
plot method produces a scree plot.
Unlike princomp, variances are computed with the usual
divisor N - 1.
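The relationship between the SVD route, the eigen route, and the divisor can be checked numerically. The following is a minimal sketch on USArrests, not a reproduction of the internal code; the names X, n, sdev_svd, sdev_eig, fit and pc are chosen for the example.

```r
# Center the data as prcomp does by default (center = TRUE, scale. = FALSE)
X <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)
n <- nrow(X)

# SVD route: singular values divided by sqrt(N - 1)
sdev_svd <- svd(X)$d / sqrt(n - 1)

# Eigen route on the covariance matrix, shown only for comparison
sdev_eig <- sqrt(eigen(cov(X), symmetric = TRUE)$values)

fit <- prcomp(USArrests)
all.equal(sdev_svd, fit$sdev)
all.equal(sdev_eig, fit$sdev)

# princomp divides by N instead, so its sdev is smaller by sqrt((N - 1) / N)
pc <- princomp(USArrests)
all.equal(unname(pc$sdev) * sqrt(n / (n - 1)), fit$sdev)
```

On well-conditioned data such as this the two routes agree to numerical tolerance; the accuracy advantage of the SVD shows up mainly when the covariance matrix is close to singular.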
Note that scale. = TRUE cannot be used if there are zero or
constant (for center = TRUE) variables.
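A small sketch of this restriction follows, assuming a made-up data frame d with one constant column. Filtering on var(v) > 0 is one possible workaround, not something the function does for you.

```r
# Hypothetical data with one constant column
d <- data.frame(a = c(1, 2, 3, 4), b = c(5, 5, 5, 5))

# Scaling a constant column to unit variance is impossible, so prcomp stops
res <- tryCatch(prcomp(d, scale. = TRUE),
                error = function(e) conditionMessage(e))
res

# One workaround: drop zero-variance columns before fitting
keep <- vapply(d, function(v) var(v) > 0, logical(1))
fit <- prcomp(d[, keep, drop = FALSE], scale. = TRUE)
fit$sdev
```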
prcomp returns a list with class "prcomp" containing the following components:
- sdev: the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).
- rotation: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). The function princomp returns this in the element loadings.
- x: if retx is true, the value of the rotated data (the centred (and scaled if requested) data multiplied by the rotation matrix) is returned. Hence, cov(x) is the diagonal matrix diag(sdev^2). For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action.
- center, scale: the centering and scaling used, or FALSE.
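The relationships among these components can be verified directly. This is a minimal check on USArrests; the names fit and Z are chosen for the example.

```r
fit <- prcomp(USArrests, scale. = TRUE)

# Scores are the centered, scaled data multiplied by the rotation matrix
Z <- scale(USArrests, center = fit$center, scale = fit$scale)
all.equal(Z %*% fit$rotation, fit$x, check.attributes = FALSE)

# The scores are uncorrelated, with variances equal to sdev^2
all.equal(cov(fit$x), diag(fit$sdev^2), check.attributes = FALSE)
```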
The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.
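When results need to be compared across programs or builds, one option is to impose a sign convention after fitting. The sketch below assumes a convention in which the largest-magnitude loading of each component is made positive. That convention and the names flip, rotation_std and scores_std are illustrative choices, not part of prcomp.

```r
fit <- prcomp(USArrests, scale. = TRUE)

# Sign of the largest-magnitude loading in each column of the rotation matrix
flip <- apply(fit$rotation, 2, function(v) sign(v[which.max(abs(v))]))

# Flip loadings and scores together so that their product is unchanged
rotation_std <- sweep(fit$rotation, 2, flip, `*`)
scores_std   <- sweep(fit$x, 2, flip, `*`)
```

Flipping a column of the rotation matrix together with the matching column of scores leaves the reconstructed data unchanged.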
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Mardia, K. V., J. T. Kent, and J. M. Bibby (1979) Multivariate Analysis, London: Academic Press.
Venables, W. N. and B. D. Ripley (2002) Modern Applied Statistics with S, Springer-Verlag.
## signs are random
require(graphics)

## the variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
prcomp(USArrests)  # inappropriate
prcomp(USArrests, scale. = TRUE)
prcomp(~ Murder + Assault + Rape, data = USArrests, scale. = TRUE)
plot(prcomp(USArrests))
summary(prcomp(USArrests, scale. = TRUE))
biplot(prcomp(USArrests, scale. = TRUE))