mdatools (version 0.8.2)

pca: Principal Component Analysis

Description

pca is used to build and explore a principal component analysis (PCA) model.

Usage

pca(x, ncomp = 15, center = T, scale = F, cv = NULL, exclrows = NULL,
  exclcols = NULL, x.test = NULL, alpha = 0.05, method = "svd",
  info = "")

Arguments

x

a numerical matrix with calibration data.

ncomp

maximum number of components to calculate.

center

logical, do mean centering of data or not.

scale

logical, do standardization of data or not.

cv

number of segments for random cross-validation (1 for full cross-validation).

exclrows

rows to be excluded from calculations (numbers, names or vector with logical values)

exclcols

columns to be excluded from calculations (numbers, names or vector with logical values)

x.test

a numerical matrix with test data.

alpha

significance level for calculating limit for Q residuals.

method

method to compute principal components ('svd', 'nipals').

info

a short text line with model description.
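As an illustration of what the center, scale and method arguments refer to, here is a minimal base R sketch. It is not the mdatools implementation: the simulated data and the helper nipals_first are assumptions made for this example. It standardizes a matrix and then computes the first principal component both by SVD and by a simple NIPALS iteration.

```r
# Simulated data (hypothetical): three correlated variables and one noise variable.
set.seed(42)
latent <- rnorm(30)
x <- cbind(3 * latent + rnorm(30), -2 * latent + rnorm(30),
           latent + rnorm(30), rnorm(30))

# Mean centering and standardization, as described for center and scale.
x_prep <- scale(x, center = TRUE, scale = TRUE)

# First loading vector from a singular value decomposition.
p_svd <- svd(x_prep)$v[, 1]

# First component from a basic NIPALS iteration (illustrative helper,
# not a function from mdatools).
nipals_first <- function(x, tol = 1e-12, maxiter = 1000) {
  t_old <- x[, 1]                       # start from the first column
  for (i in seq_len(maxiter)) {
    p <- crossprod(x, t_old)            # project data onto current scores
    p <- p / sqrt(sum(p^2))             # normalize loadings to unit length
    t_new <- x %*% p                    # update scores
    converged <- sum((t_new - t_old)^2) < tol * sum(t_new^2)
    t_old <- t_new
    if (converged) break
  }
  list(scores = drop(t_old), loadings = drop(p))
}

fit <- nipals_first(x_prep)

# Both approaches should give the same direction (up to sign).
agreement <- abs(sum(p_svd * fit$loadings))
print(agreement)
```

The sign of a loading vector is arbitrary, which is why the comparison uses the absolute value of the dot product.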

Value

Returns an object of pca class with the following fields:

ncomp

number of components included in the model.

ncomp.selected

selected (optimal) number of components.

loadings

matrix with loading values (nvar x ncomp).

eigenvals

vector with eigenvalues for all calculated components.

expvar

vector with explained variance for each component (in percent).

cumexpvar

vector with cumulative explained variance for each component (in percent).

T2lim

statistical limit for T2 distance.

Qlim

statistical limit for Q residuals.

info

information about the model, provided by the user when building the model.

calres

an object of class pcares with PCA results for the calibration data.

testres

an object of class pcares with PCA results for the test data, if it was provided.

cvres

an object of class pcares with PCA results for cross-validation, if this option was chosen.
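To make the loadings, eigenvals, expvar and cumexpvar fields more concrete, the following base R sketch computes analogous quantities from an SVD of mean-centered data. This is an assumption about how such values are commonly obtained, not a statement of how mdatools computes them internally. The data and variable names are invented for the example.

```r
# Simulated calibration data (hypothetical): 25 objects, 6 variables.
set.seed(1)
x <- matrix(rnorm(25 * 6), nrow = 25, ncol = 6)
ncomp <- 3

# Mean centering only (center = TRUE, scale = FALSE).
x_c <- scale(x, center = TRUE, scale = FALSE)
dec <- svd(x_c)

# Loadings: one row per variable, one column per component (nvar x ncomp).
loadings <- dec$v[, seq_len(ncomp), drop = FALSE]

# Eigenvalues of the covariance matrix, one per component.
eigenvals_all <- dec$d^2 / (nrow(x) - 1)

# Explained and cumulative explained variance, in percent.
expvar <- 100 * eigenvals_all / sum(eigenvals_all)
cumexpvar <- cumsum(expvar)

print(dim(loadings))
print(round(cumexpvar[seq_len(ncomp)], 1))
```

The percentages are taken relative to the total variance of the centered data, so the cumulative value reaches 100 only when all components are included.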

Details

By default, pca sets the number of components (ncomp) to the smallest of three values: the number of objects minus 1, the number of variables, and the default or user-provided value of ncomp. There is also a parameter for the optimal number of components (ncomp.selected). The optimal number of components is used to build the residuals plot (Q residuals vs. Hotelling T2 values), to calculate confidence limits for Q residuals, and for SIMCA classification.
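The rule for the number of components, and the two distances shown in the residuals plot, can be sketched in base R as follows. This uses the textbook definitions of Hotelling T2 and Q residuals and is not taken from the mdatools source. The data, ncomp_requested and ncomp_selected are hypothetical values chosen for the example.

```r
# Simulated data (hypothetical): fewer objects than variables.
set.seed(7)
x <- matrix(rnorm(10 * 20), nrow = 10, ncol = 20)

# Upper bound on the number of components, as described above.
ncomp_requested <- 15
ncomp <- min(nrow(x) - 1, ncol(x), ncomp_requested)
print(ncomp)  # 9, because 10 objects allow at most 9 components

# Hypothetical "optimal" number of components.
ncomp_selected <- 3

x_c <- scale(x, center = TRUE, scale = FALSE)
dec <- svd(x_c)
k <- seq_len(ncomp_selected)
loadings <- dec$v[, k, drop = FALSE]
scores <- x_c %*% loadings
eigenvals <- dec$d[k]^2 / (nrow(x) - 1)

# Hotelling T2: squared scores weighted by the inverse eigenvalues.
t2 <- rowSums(sweep(scores^2, 2, eigenvals, "/"))

# Q residuals: squared distance between each object and its reconstruction.
q <- rowSums((x_c - tcrossprod(scores, loadings))^2)

print(round(cbind(T2 = t2, Q = q), 3))
```

Objects with large T2 are extreme within the model plane, while objects with large Q are poorly described by the selected components.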

You can provide numbers, names or logical values to exclude rows or columns from calibration and validation of the PCA model. In this case the outcome, e.g. scores and loadings, will correspond to the original size of the data, but:

  1. Loadings (and all performance statistics) will be computed without the excluded objects and variables.

  2. The matrix with loadings will have zero values for the excluded variables, and the corresponding columns will be hidden.

  3. The matrix with scores will have score values calculated for the excluded objects, but the corresponding rows will be hidden.

You can see scores and loadings for hidden rows and columns by using the parameter 'show.excluded = T' in plots. If you use other packages to make plots (e.g. ggplot2), you will not be able to distinguish between hidden and normal objects.
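The numeric effect of excluding rows and columns, as described in this section, can be imitated in base R. The sketch below is only an approximation of that behaviour under stated assumptions: it does not reproduce how mdatools marks rows and columns as hidden, and all names in it (excl_rows, excl_cols and so on) are invented for the example.

```r
# Simulated data (hypothetical) with row and column names.
set.seed(1)
x <- matrix(rnorm(12 * 6), nrow = 12, ncol = 6,
            dimnames = list(paste0("obj", 1:12), paste0("var", 1:6)))
excl_rows <- c(2, 7)   # objects left out of calibration
excl_cols <- 4         # variable left out of calibration
ncomp <- 3

keep_rows <- setdiff(seq_len(nrow(x)), excl_rows)
keep_cols <- setdiff(seq_len(ncol(x)), excl_cols)

# 1. Fit the model without the excluded objects and variables.
x_cal <- x[keep_rows, keep_cols, drop = FALSE]
center <- colMeans(x_cal)
dec <- svd(sweep(x_cal, 2, center, "-"))
p <- dec$v[, seq_len(ncomp), drop = FALSE]

# 2. Full-size loadings matrix with zeros for the excluded variable.
loadings <- matrix(0, nrow = ncol(x), ncol = ncomp,
                   dimnames = list(colnames(x), paste("Comp", seq_len(ncomp))))
loadings[keep_cols, ] <- p

# 3. Scores for all objects, including the excluded ones, by projection.
x_all <- sweep(x[, keep_cols, drop = FALSE], 2, center, "-")
scores <- x_all %*% p

print(round(loadings, 3))
print(round(scores[excl_rows, , drop = FALSE], 3))
```

The excluded objects get scores because they are projected onto the model, but they have no influence on the loadings or on the centering values.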

See Also

Methods for pca objects:

plot.pca makes an overview of PCA model with four plots.
summary.pca shows some statistics for the model.
selectCompNum.pca sets the optimal number of components in the model.
predict.pca applies PCA model to new data.
plotScores.pca shows scores plot.
plotLoadings.pca shows loadings plot.
plotVariance.pca shows explained variance plot.
plotCumVariance.pca shows cumulative explained variance plot.
plotResiduals.pca shows Q vs. T2 residuals plot.

Most of the methods for plotting data are also available for PCA results (pcares) objects. Also check pca.mvreplace, which replaces missing values in a data matrix with values approximated by iterative PCA decomposition.
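The general idea of replacing missing values by iterative PCA decomposition can be sketched in base R as shown below. This is a generic illustration and should not be read as the algorithm used by pca.mvreplace. The function impute_pca and its parameters are hypothetical names introduced for this example.

```r
# Generic iterative PCA imputation (illustrative helper, not from mdatools).
impute_pca <- function(x, ncomp = 2, tol = 1e-8, maxiter = 200) {
  is_missing <- is.na(x)
  # Start by filling missing cells with the column means of observed values.
  col_means <- colMeans(x, na.rm = TRUE)
  x_filled <- x
  x_filled[is_missing] <- col_means[col(x)[is_missing]]
  k <- seq_len(ncomp)
  for (i in seq_len(maxiter)) {
    # Fit a PCA model with ncomp components to the current filled matrix.
    mu <- colMeans(x_filled)
    dec <- svd(sweep(x_filled, 2, mu, "-"))
    x_hat <- dec$u[, k, drop = FALSE] %*%
      (dec$d[k] * t(dec$v[, k, drop = FALSE]))
    x_hat <- sweep(x_hat, 2, mu, "+")
    # Update only the missing cells with the reconstructed values.
    change <- sum((x_hat[is_missing] - x_filled[is_missing])^2)
    x_filled[is_missing] <- x_hat[is_missing]
    if (change < tol) break
  }
  x_filled
}

# Simulated rank-2 data with a little noise and four missing cells.
set.seed(3)
scores_true <- matrix(rnorm(20 * 2), nrow = 20)
loadings_true <- matrix(rnorm(5 * 2), nrow = 5)
x_full <- scores_true %*% t(loadings_true) + rnorm(100, sd = 0.05)
x <- x_full
x[cbind(c(2, 5, 11, 17), c(1, 3, 4, 2))] <- NA

x_imputed <- impute_pca(x, ncomp = 2)
print(sum(is.na(x_imputed)))
```

Observed values are never modified; only the cells that were missing are overwritten at each iteration.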

Examples

# NOT RUN {
library(mdatools)
### Examples for PCA class

## 1. Make PCA model for People data with autoscaling
## and full cross-validation

data(people)
model = pca(people, scale = TRUE, cv = 1, info = 'Simple PCA model')
model = selectCompNum(model, 4)
summary(model)
plot(model, show.labels = TRUE)

## 2. Show scores and loadings plots for the model
par(mfrow = c(2, 2))
plotScores(model, comp = c(1, 3), show.labels = TRUE)
plotScores(model, comp = 2, type = 'h', show.labels = TRUE)
plotLoadings(model, comp = c(1, 3), show.labels = TRUE)
plotLoadings(model, comp = c(1, 2), type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 3. Show residuals and variance plots for the model
par(mfrow = c(2, 2))
plotVariance(model, type = 'h')
plotCumVariance(model, show.labels = TRUE, legend.position = 'bottomright')
plotResiduals(model, show.labels = TRUE)
plotResiduals(model, ncomp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))

# }