Performs a principal components analysis on the given data matrix
and returns the results as an object of class `prcomp`.

prcomp(x, …)

# S3 method for formula
prcomp(formula, data = NULL, subset, na.action, …)

# S3 method for default
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
tol = NULL, rank. = NULL, …)

# S3 method for prcomp
predict(object, newdata, …)

formula

a formula with no response variable, referring only to numeric variables.

data

an optional data frame (or similar: see `model.frame`) containing the variables in the formula `formula`. By default the variables are taken from `environment(formula)`.

subset

an optional vector used to select rows (observations) of the data matrix `x`.

na.action

…

arguments passed to or from other methods. If `x` is a formula, one might specify `scale.` or `tol`.

x

a numeric or complex matrix (or data frame) which provides the data for the principal components analysis.

retx

a logical value indicating whether the rotated variables should be returned.

center

a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of `x` can be supplied. The value is passed to `scale`.

scale.

a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is `FALSE` for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of `x` can be supplied. The value is passed to `scale`.

tol

a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to `tol` times the standard deviation of the first component.) With the default null setting, no components are omitted (unless `rank.` is specified as less than `min(dim(x))`). Other settings for `tol` could be `tol = 0` or `tol = sqrt(.Machine$double.eps)`, which would omit essentially constant components.

rank.

optionally, a number specifying the maximal rank, i.e., the maximal number of principal components to be used. It can be set as an alternative or in addition to `tol`, and is useful notably when the desired rank is considerably smaller than the dimensions of the matrix.

object

an object of a class inheriting from `"prcomp"`.

newdata

An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, `newdata` must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
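
The `newdata` rules can be illustrated with a small sketch. The split of the built-in `USArrests` data into `train` and `test` is an arbitrary choice made for illustration, and `manual` is a hypothetical name for a hand-computed check; the sketch assumes that prediction amounts to applying the stored centering and scaling and then the rotation.

```r
# Fit on the first 40 rows of USArrests, then project the remaining 10 rows.
# (The 40/10 split is arbitrary and only for illustration.)
train <- USArrests[1:40, ]
test  <- USArrests[41:50, ]

fit <- prcomp(train, scale. = TRUE)

# newdata carries the same column names as the data used for the fit
scores_new <- predict(fit, newdata = test)
dim(scores_new)

# Hand-computed equivalent: centre and scale with the training values,
# then multiply by the rotation matrix
manual <- scale(test, center = fit$center, scale = fit$scale) %*% fit$rotation
all.equal(scores_new, manual, check.attributes = FALSE)

# With newdata omitted, the stored scores are returned
identical(predict(fit), fit$x)
```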

`prcomp` returns a list with class `"prcomp"` containing the following components:

sdev

the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).

rotation

the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). The function `princomp` returns this in the element `loadings`.

x

if `retx` is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the `rotation` matrix) is returned. Hence, `cov(x)` is the diagonal matrix `diag(sdev^2)`. For the formula method, `napredict()` is applied to handle the treatment of values omitted by the `na.action`.

center, scale

the centering and scaling used, or `FALSE`.
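
The relationships among the returned components can be checked numerically. The following is a minimal sketch on the built-in `USArrests` data; the names `pc`, `ev` and `scores` are chosen here for illustration only.

```r
pc <- prcomp(USArrests, scale. = TRUE)

# With scaling, the squared standard deviations match the eigenvalues
# of the correlation matrix
ev <- eigen(cor(USArrests), symmetric = TRUE)$values
all.equal(pc$sdev^2, ev)

# The scores are the centred and scaled data multiplied by the rotation matrix
scores <- scale(USArrests, center = pc$center, scale = pc$scale) %*% pc$rotation
all.equal(pc$x, scores, check.attributes = FALSE)

# The covariance matrix of the scores is diagonal, with sdev^2 on the diagonal
all.equal(cov(pc$x), diag(pc$sdev^2), check.attributes = FALSE)
```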

The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using `eigen` on the covariance matrix. This is generally the preferred method for numerical accuracy. The `print` method for these objects prints the results in a nice format and the `plot` method produces a scree plot.
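
As a rough sketch of the computation described above (not the function's actual source), the standard deviations and loadings can be recovered directly from `svd` applied to the centered and scaled data. The names `X`, `s` and `sdev_svd` are illustrative, and the loadings are compared up to sign because the signs of singular vectors are arbitrary.

```r
# Centre and scale the data, then take its singular value decomposition
X <- scale(as.matrix(USArrests), center = TRUE, scale = TRUE)
s <- svd(X, nu = 0)                   # only the right singular vectors are needed

# Singular values become standard deviations after dividing by sqrt(N - 1)
sdev_svd <- s$d / sqrt(nrow(X) - 1)

pc <- prcomp(USArrests, scale. = TRUE)
all.equal(sdev_svd, pc$sdev)

# The right singular vectors are the loadings (compared up to sign)
all.equal(abs(s$v), abs(unname(pc$rotation)))
```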

Unlike `princomp`, variances are computed with the usual divisor \(N - 1\).
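
The difference in divisors can be seen by comparing the two functions on the same data. This is a small sketch, assuming `princomp` uses the divisor \(N\); the names `pc`, `pm` and `n` are illustrative.

```r
pc <- prcomp(USArrests)     # variances use the divisor N - 1
pm <- princomp(USArrests)   # variances use the divisor N
n  <- nrow(USArrests)

# The standard deviations differ by the factor sqrt((N - 1) / N)
all.equal(unname(pm$sdev), pc$sdev * sqrt((n - 1) / n))
```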

Note that `scale = TRUE` cannot be used if there are zero or constant (for `center = TRUE`) variables.
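
The restriction can be demonstrated with a constant column. In this sketch the column `const` is a hypothetical addition to `USArrests`, and dropping zero-variance columns before the fit is one possible workaround, not a recommendation from the original text.

```r
# Add a hypothetical constant column to the data
d <- cbind(USArrests, const = 1)

# Scaling a constant column fails; capture the error message instead of stopping
res <- tryCatch(prcomp(d, scale. = TRUE),
                error = function(e) conditionMessage(e))
res

# One possible workaround: drop zero-variance columns before scaling
keep <- vapply(d, function(v) var(v) > 0, logical(1))
pc <- prcomp(d[, keep], scale. = TRUE)
ncol(pc$rotation)
```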

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
*The New S Language*.
Wadsworth & Brooks/Cole.

Mardia, K. V., J. T. Kent, and J. M. Bibby (1979)
*Multivariate Analysis*, London: Academic Press.

Venables, W. N. and B. D. Ripley (2002)
*Modern Applied Statistics with S*, Springer-Verlag.

C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
all.equal(S, crossprod(C))

set.seed(17)
X <- matrix(rnorm(32000), 1000, 32)
Z <- X %*% C  ## ==>  cov(Z) ~=  C'C = S
all.equal(cov(Z), S, tol = 0.08)

pZ <- prcomp(Z, tol = 0.1)
summary(pZ) # only ~14 PCs (out of 32)
## or choose only 3 PCs more directly:
pz3 <- prcomp(Z, rank. = 3)
summary(pz3) # same numbers as the first 3 above
stopifnot(ncol(pZ$rotation) == 14, ncol(pz3$rotation) == 3,
          all.equal(pz3$sdev, pZ$sdev, tol = 1e-15)) # exactly equal typically

## signs are random
require(graphics)

## the variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
prcomp(USArrests)  # inappropriate
prcomp(USArrests, scale = TRUE)
prcomp(~ Murder + Assault + Rape, data = USArrests, scale = TRUE)
plot(prcomp(USArrests))
summary(prcomp(USArrests, scale = TRUE))
biplot(prcomp(USArrests, scale = TRUE))