# prcomp

##### Principal Components Analysis

Performs a principal components analysis on the given data matrix
and returns the results as an object of class `prcomp`.

- Keywords: multivariate

##### Usage

```
prcomp(x, ...)

# S3 method for formula
prcomp(formula, data = NULL, subset, na.action, ...)

# S3 method for default
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
       tol = NULL, rank. = NULL, ...)

# S3 method for prcomp
predict(object, newdata, ...)
```
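As a minimal sketch (assuming only base R and the built-in `USArrests` data set; the object names `pc_default`, `pc_formula`, `arrests` and `pc_na` are illustrative), the formula and default methods should give the same components for the same numeric variables, and `na.action = na.exclude` keeps the scores aligned with the original rows:

```r
# Default method on a numeric data frame.
pc_default <- prcomp(USArrests[, c("Murder", "Assault", "Rape")], scale. = TRUE)

# Formula method on the same three variables; scale. is passed on via ...
pc_formula <- prcomp(~ Murder + Assault + Rape, data = USArrests, scale. = TRUE)
all.equal(pc_default$sdev, pc_formula$sdev)

# Introduce one missing value; na.exclude pads the scores back to full length.
arrests <- USArrests
arrests$Murder[1] <- NA
pc_na <- prcomp(~ Murder + Assault + Rape, data = arrests,
                na.action = na.exclude, scale. = TRUE)
dim(pc_na$x)   # 50 rows; row 1 is all NA
```

With the factory-fresh `na.omit` instead, the scores would have only the 49 complete rows.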

##### Arguments

- formula
a formula with no response variable, referring only to numeric variables.

- data
an optional data frame (or similar: see `model.frame`) containing the variables in the formula `formula`. By default the variables are taken from `environment(formula)`.

- subset
an optional vector used to select rows (observations) of the data matrix `x`.

- na.action
a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The ‘factory-fresh’ default is `na.omit`.

- ...
arguments passed to or from other methods. If `x` is a formula, one might specify `scale.` or `tol`.

- x
a numeric or complex matrix (or data frame) which provides the data for the principal components analysis.

- retx
a logical value indicating whether the rotated variables should be returned.

- center
a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of `x` can be supplied. The value is passed to `scale`.

- scale.
a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is `FALSE` for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of `x` can be supplied. The value is passed to `scale`.

- tol
a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to `tol` times the standard deviation of the first component.) With the default null setting, no components are omitted (unless `rank.` is specified less than `min(dim(x))`). Other settings for `tol` could be `tol = 0` or `tol = sqrt(.Machine$double.eps)`, which would omit essentially constant components.

- rank.
optionally, a number specifying the maximal rank, i.e., the maximal number of principal components to be used. It can be set as an alternative or in addition to `tol`, and is useful notably when the desired rank is considerably smaller than the dimensions of the matrix.

- object
an object of class inheriting from `"prcomp"`.

- newdata
an optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, `newdata` must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
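A small sketch of how `center`, `scale.` and `newdata` fit together, assuming the built-in `USArrests` data; treating its first five rows as "new" observations is purely illustrative. Supplying the column means and standard deviations as vectors should reproduce `scale. = TRUE`, and `predict` should match centring and scaling with the stored `center` and `scale` and then multiplying by `rotation`:

```r
pc <- prcomp(USArrests, scale. = TRUE)

# Explicit centering and scaling vectors should reproduce scale. = TRUE.
pc_vec <- prcomp(USArrests, center = colMeans(USArrests),
                 scale. = apply(USArrests, 2, sd))
all.equal(pc$sdev, pc_vec$sdev)

# Treat the first five states as "new" observations (illustrative only).
new_obs <- USArrests[1:5, ]
scores_predict <- predict(pc, newdata = new_obs)

# Manual equivalent: reuse the stored centre and scale, then rotate.
scores_manual <- scale(as.matrix(new_obs), center = pc$center,
                       scale = pc$scale) %*% pc$rotation
all.equal(scores_predict, scores_manual, check.attributes = FALSE)
```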

##### Details

The calculation is done by a singular value decomposition of the
(centered and possibly scaled) data matrix, not by using `eigen`
on the covariance matrix. This is generally the preferred method for
numerical accuracy. The `print` method for these objects prints the
results in a nice format and the `plot` method produces a scree plot.
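To make the SVD/eigen relationship concrete, here is a minimal check (assuming the built-in `USArrests` data, scaled): the singular values of the centered, scaled data divided by \(\sqrt{N - 1}\) should equal `sdev`, and the eigenvalues of the correlation matrix should equal `sdev^2`, with loadings agreeing up to sign.

```r
X  <- as.matrix(USArrests)
pc <- prcomp(X, scale. = TRUE)
n  <- nrow(X)

# Singular values of the centered, scaled data, rescaled by sqrt(n - 1).
sv <- svd(scale(X, center = TRUE, scale = TRUE))
all.equal(pc$sdev, sv$d / sqrt(n - 1))

# Eigen-decomposition of the correlation matrix gives the same variances ...
ev <- eigen(cor(X), symmetric = TRUE)
all.equal(pc$sdev^2, ev$values)

# ... and the same loadings up to the sign of each column.
all.equal(abs(unname(pc$rotation)), abs(ev$vectors))
```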

Unlike `princomp`, variances are computed with the usual
divisor \(N - 1\).
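Since `princomp` uses divisor \(N\) by default, the two sets of standard deviations should differ only by the constant factor \(\sqrt{N / (N - 1)}\). A short sketch, assuming the unscaled `USArrests` data:

```r
X  <- as.matrix(USArrests)
n  <- nrow(X)
p1 <- prcomp(X)     # divisor n - 1
p2 <- princomp(X)   # divisor n

# The standard deviations differ by the constant factor sqrt(n / (n - 1)).
all.equal(p1$sdev, unname(p2$sdev) * sqrt(n / (n - 1)))
```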

Note that `scale. = TRUE` cannot be used if there are zero or
constant (for `center = TRUE`) variables.
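A minimal illustration on simulated data (the matrix `X` and its constant column `const` are made up for this sketch): after centering, the constant column is all zeros and cannot be rescaled to unit variance, so `prcomp` stops with an error, while the unscaled fit succeeds.

```r
set.seed(1)
X <- cbind(a = rnorm(10), b = rnorm(10), const = 1)  # third column is constant

# After centering, the constant column is all zeros and cannot be rescaled.
res <- tryCatch(prcomp(X, scale. = TRUE),
                error = function(e) conditionMessage(e))
res

# Without scaling the fit succeeds; the last component has (near) zero spread.
pc0 <- prcomp(X)
pc0$sdev
```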

##### Value

`prcomp` returns a list with class `"prcomp"` containing the following components:

- sdev
the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).

- rotation
the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). The function `princomp` returns this in the element `loadings`.

- x
if `retx` is true, the value of the rotated data (the centred (and scaled if requested) data multiplied by the `rotation` matrix) is returned. Hence, `cov(x)` is the diagonal matrix `diag(sdev^2)`. For the formula method, `napredict()` is applied to handle the treatment of values omitted by the `na.action`.

- center, scale
the centering and scaling used, or `FALSE`.
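The relationships between the returned components can be checked directly. A sketch, assuming the scaled `USArrests` fit: `x` should equal the centred and scaled data times `rotation`, and its covariance matrix should be diagonal with entries `sdev^2`.

```r
pc <- prcomp(USArrests, scale. = TRUE)
Xs <- scale(USArrests, center = pc$center, scale = pc$scale)

# x is the centred and scaled data multiplied by the rotation matrix.
all.equal(pc$x, Xs %*% pc$rotation, check.attributes = FALSE)

# cov(x) is diagonal, with the component variances sdev^2 on the diagonal.
all.equal(cov(pc$x), diag(pc$sdev^2), check.attributes = FALSE)
```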

##### Note

The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.
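Why the sign does not matter: flipping a column of `rotation` together with the matching column of `x` leaves the reconstructed data and the component standard deviations unchanged. A sketch, assuming the scaled `USArrests` fit; the flip vector is an arbitrary choice for illustration.

```r
pc <- prcomp(USArrests, scale. = TRUE)
Xs <- scale(USArrests, center = pc$center, scale = pc$scale)

# Flip the sign of PC1 in both the loadings and the scores.
flip <- c(-1, 1, 1, 1)
rot_flipped <- sweep(pc$rotation, 2, flip, `*`)
x_flipped   <- sweep(pc$x, 2, flip, `*`)

# Both versions reconstruct the same (centred, scaled) data ...
all.equal(x_flipped %*% t(rot_flipped), Xs, check.attributes = FALSE)

# ... and have identical component standard deviations.
all.equal(apply(x_flipped, 2, sd), pc$sdev, check.attributes = FALSE)
```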

##### References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
*The New S Language*.
Wadsworth & Brooks/Cole.

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979)
*Multivariate Analysis*. London: Academic Press.

Venables, W. N. and Ripley, B. D. (2002)
*Modern Applied Statistics with S*. Springer-Verlag.

##### Examples

```
library(stats)

C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
all.equal(S, crossprod(C))
set.seed(17)
X <- matrix(rnorm(32000), 1000, 32)
Z <- X %*% C ## ==> cov(Z) ~= C'C = S
all.equal(cov(Z), S, tol = 0.08)
pZ <- prcomp(Z, tol = 0.1)
summary(pZ) # only ~14 PCs (out of 32)
## or choose only 3 PCs more directly:
pz3 <- prcomp(Z, rank. = 3)
summary(pz3) # same numbers as the first 3 above
stopifnot(ncol(pZ$rotation) == 14, ncol(pz3$rotation) == 3,
          all.equal(pz3$sdev, pZ$sdev, tol = 1e-15)) # exactly equal typically

## signs are random
require(graphics)
## the variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
prcomp(USArrests) # inappropriate
prcomp(USArrests, scale = TRUE)
prcomp(~ Murder + Assault + Rape, data = USArrests, scale = TRUE)
plot(prcomp(USArrests))
summary(prcomp(USArrests, scale = TRUE))
biplot(prcomp(USArrests, scale = TRUE))
```

*Documentation reproduced from package stats, version 3.5.3, License: Part of R 3.5.3*