# princomp

##### Principal Components Analysis

`princomp` performs a principal components analysis on the given numeric data matrix and returns the results as an object of class `princomp`.

- Keywords: multivariate

##### Usage

- `princomp(x, ...)`
- S3 method for formula: `princomp(formula, data = NULL, subset, na.action, ...)`
- Default S3 method: `princomp(x, cor = FALSE, scores = TRUE, covmat = NULL, subset = rep_len(TRUE, nrow(as.matrix(x))), fix_sign = TRUE, ...)`
- S3 method for princomp: `predict(object, newdata, ...)`

##### Arguments

- formula
a formula with no response variable, referring only to numeric variables.

- data
an optional data frame (or similar: see `model.frame`) containing the variables in the formula `formula`. By default the variables are taken from `environment(formula)`.

- subset
an optional vector used to select rows (observations) of the data matrix `x`.

- na.action
a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The ‘factory-fresh’ default is `na.omit`.

- x
a numeric matrix or data frame which provides the data for the principal components analysis.

- cor
a logical value indicating whether the calculation should use the correlation matrix or the covariance matrix. (The correlation matrix can only be used if there are no constant variables.)

- scores
a logical value indicating whether the score on each principal component should be calculated.

- covmat
a covariance matrix, or a covariance list as returned by `cov.wt` (and `cov.mve` or `cov.mcd` from package MASS). If supplied, this is used rather than the covariance matrix of `x`.

- fix_sign
Should the signs of the loadings and scores be chosen so that the first element of each loading is non-negative?

- ...
arguments passed to or from other methods. If `x` is a formula one might specify `cor` or `scores`.

- object
Object of class inheriting from `"princomp"`.

- newdata
An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, `newdata` must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
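A minimal sketch of the `covmat` and `newdata` arguments on the built-in `USArrests` data. The object names `cw`, `pc_list`, `pc_mat`, `pc` and `p` are illustrative only. It assumes, as described above, that scores are computed only when `x` is supplied, and that `predict` applied to rows of the original data reproduces their scores.

```r
x <- as.matrix(USArrests)

## covmat as a covariance list from cov.wt(), together with x: scores are computed
cw <- cov.wt(x)
pc_list <- princomp(x, covmat = cw)

## covmat as a bare matrix and no x: there are no data to score
pc_mat <- princomp(covmat = cov(x))

stopifnot(!is.null(pc_list$scores), is.null(pc_mat$scores))

## predict() on rows of the original data reproduces their scores
pc <- princomp(USArrests, cor = TRUE)
p <- predict(pc, newdata = USArrests[1:5, ])
stopifnot(isTRUE(all.equal(unname(p), unname(pc$scores[1:5, ]))))
```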

##### Details

`princomp` is a generic function with `"formula"` and `"default"` methods.

The calculation is done using `eigen` on the correlation or covariance matrix, as determined by `cor`. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use `svd` on `x`, as is done in `prcomp`.
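The eigen-based computation can be checked by hand. This sketch assumes the default `cor = FALSE`, rebuilds the covariance matrix with divisor `N` itself, and compares its eigen decomposition with what `princomp` returns; `covN` and `e` are illustrative names. Loadings are compared in absolute value because the sign of each eigenvector is arbitrary.

```r
x <- as.matrix(USArrests)
n <- nrow(x)

## Covariance matrix with divisor N (cov() itself uses N - 1)
covN <- cov(x) * (n - 1) / n

## Eigen decomposition of the symmetric covariance matrix
e <- eigen(covN, symmetric = TRUE)

pc <- princomp(x)

## The component variances are the eigenvalues ...
stopifnot(isTRUE(all.equal(unname(pc$sdev^2), e$values)))
## ... and the loadings are the eigenvectors, up to the sign of each column
stopifnot(isTRUE(all.equal(abs(unname(unclass(pc$loadings))), abs(e$vectors))))
```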

Note that the default calculation uses divisor `N` (the number of observations) rather than `N - 1` for the covariance matrix.
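Because of the divisor `N`, the standard deviations from `princomp` are smaller than those from `prcomp` (which uses `N - 1`) by a factor of `sqrt((N - 1)/N)`. A quick check on `USArrests`, where `N = 50`; `pc1` and `pc2` are illustrative names:

```r
n <- nrow(USArrests)

pc1 <- princomp(USArrests)  # eigen() on the covariance matrix, divisor N
pc2 <- prcomp(USArrests)    # svd() on the centred data, divisor N - 1

## sdev from princomp = sdev from prcomp * sqrt((N - 1) / N)
stopifnot(isTRUE(all.equal(unname(pc1$sdev), pc2$sdev * sqrt((n - 1) / n))))
```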

The `print` method for these objects prints the results in a readable format and the `plot` method produces a scree plot (`screeplot`). There is also a `biplot` method.
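The scree plot and `summary` are both driven by the component variances `sdev^2`. The sketch below computes these quantities directly, without opening a graphics device. With `cor = TRUE` the variances must sum to the number of variables, because the trace of a correlation matrix equals its dimension. `vars`, `prop` and `cum` are illustrative names.

```r
pc <- princomp(USArrests, cor = TRUE)

vars <- pc$sdev^2              # heights of the bars in the scree plot
prop <- vars / sum(vars)       # "Proportion of Variance" row of summary()
cum  <- cumsum(prop)           # "Cumulative Proportion" row

stopifnot(isTRUE(all.equal(sum(vars), ncol(USArrests))),
          isTRUE(all.equal(unname(cum[length(cum)]), 1)))
```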

If `x` is a formula then the standard NA-handling is applied to the scores (if requested): see `napredict`.
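A small sketch of this NA-handling, on a copy of `USArrests` with one value set to `NA` (`dat`, `pc_ex` and `pc_om` are illustrative names). With `na.exclude`, `napredict` pads the scores back to one row per original observation; with `na.omit` the incomplete row is simply dropped.

```r
dat <- USArrests
dat[1, 2] <- NA  # Assault for the first state is now missing

pc_ex <- princomp(~ Murder + Assault + UrbanPop, data = dat,
                  na.action = na.exclude, cor = TRUE)
pc_om <- princomp(~ Murder + Assault + UrbanPop, data = dat,
                  na.action = na.omit, cor = TRUE)

## na.exclude: one row of scores per original row, the first one all NA
stopifnot(nrow(pc_ex$scores) == nrow(dat), all(is.na(pc_ex$scores[1, ])))
## na.omit: the incomplete row is dropped from the scores
stopifnot(nrow(pc_om$scores) == nrow(dat) - 1)
```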

`princomp` only handles so-called R-mode PCA, that is, feature extraction of variables. If a data matrix is supplied (possibly via a formula), it is required that there are at least as many units as variables. For Q-mode PCA use `prcomp`.
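The "at least as many units as variables" requirement can be seen on a deliberately wide matrix. The random data and the names `wide`, `fit` and `pr` are purely illustrative. `princomp` stops with an error, while `prcomp`, which works via `svd` on the data, still returns `min(n, p)` components.

```r
set.seed(1)
wide <- matrix(rnorm(6), nrow = 2, ncol = 3)  # 2 units, 3 variables

## princomp() refuses: fewer units than variables
fit <- tryCatch(princomp(wide), error = function(e) NULL)
stopifnot(is.null(fit))

## prcomp() handles the same matrix
pr <- prcomp(wide)
stopifnot(length(pr$sdev) == 2)
```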

##### Value

`princomp` returns a list with class `"princomp"` containing the following components:

- sdev
the standard deviations of the principal components.

- loadings
the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). This is of class `"loadings"`: see `loadings` for its `print` method.

- center
the means that were subtracted.

- scale
the scalings applied to each variable.

- n.obs
the number of observations.

- scores
if `scores = TRUE`, the scores of the supplied data on the principal components. These are non-null only if `x` was supplied and, when `covmat` was also supplied, only if it was a covariance list. For the formula method, `napredict()` is applied to handle the treatment of values omitted by the `na.action`.

- call
the matched call.

- na.action
information returned by the `na.action`, if relevant.
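A quick way to see how the components listed above fit together: the scores are the centred (and, with `cor = TRUE`, scaled) data multiplied by the loadings, and their variances, computed with divisor `N`, reproduce `sdev^2`. A sketch on `USArrests`; `z` is an illustrative name.

```r
pc <- princomp(USArrests, cor = TRUE)
n <- nrow(USArrests)

## Components of the returned list
stopifnot(all(c("sdev", "loadings", "center", "scale", "n.obs",
                "scores", "call") %in% names(pc)),
          pc$n.obs == n)

## center holds the column means; scale holds standard deviations with divisor N
stopifnot(isTRUE(all.equal(unname(pc$center), unname(colMeans(USArrests)))),
          isTRUE(all.equal(unname(pc$scale),
                           unname(apply(USArrests, 2, sd)) * sqrt((n - 1) / n))))

## scores = centred and scaled data %*% loadings
z <- scale(USArrests, center = pc$center, scale = pc$scale)
stopifnot(isTRUE(all.equal(unname(z %*% unclass(pc$loadings)), unname(pc$scores))))

## Score variances with divisor N equal sdev^2
stopifnot(isTRUE(all.equal(unname(apply(pc$scores, 2, var) * (n - 1) / n),
                           unname(pc$sdev^2))))
```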

##### Note

The signs of the columns of the loadings and scores are arbitrary, and so may differ between different programs for PCA, and even between different builds of R: `fix_sign = TRUE` alleviates that.
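The sign ambiguity is harmless for reconstruction: flipping the sign of a loading column together with the matching score column leaves the product of scores and transposed loadings unchanged. The sketch also checks the `fix_sign = TRUE` convention that the first element of each loading is non-negative. `L`, `S` and the sign vector `flip` are illustrative names.

```r
pc <- princomp(USArrests, cor = TRUE, fix_sign = TRUE)
L <- unclass(pc$loadings)
S <- unclass(pc$scores)

## fix_sign = TRUE: first element of every loading column is >= 0
stopifnot(all(L[1, ] >= 0))

## Flip the sign of the first component in both loadings and scores
flip <- c(-1, 1, 1, 1)
L2 <- sweep(L, 2, flip, `*`)
S2 <- sweep(S, 2, flip, `*`)

## The reconstruction S %*% t(L) is unaffected by the flip
stopifnot(isTRUE(all.equal(S2 %*% t(L2), S %*% t(L))))
```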


##### See Also

`summary.princomp`, `screeplot`, `biplot.princomp`, `prcomp`, `cor`, `cov`, `eigen`.

##### Examples

```r
library(stats)
require(graphics)

## The variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
(pc.cr <- princomp(USArrests))  # inappropriate
princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)
## Similar, but different:
## The standard deviations differ by a factor of sqrt(49/50)

summary(pc.cr <- princomp(USArrests, cor = TRUE))
loadings(pc.cr)  # note that blank entries are small but not zero
## The signs of the columns of the loadings are arbitrary
plot(pc.cr)      # shows a screeplot
biplot(pc.cr)

## Formula interface
princomp(~ ., data = USArrests, cor = TRUE)

## NA-handling
USArrests[1, 2] <- NA
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                  data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores[1:5, ]

## (Simple) Robust PCA:
## Classical:
(pc.cl <- princomp(stackloss))
## Robust (requires the recommended package MASS):
(pc.rob <- princomp(stackloss, covmat = MASS::cov.rob(stackloss)))
```

*Documentation reproduced from package stats, version 3.6.0, License: Part of R 3.6.0*