mdatools (version 0.7.0)

pca: Principal Component Analysis

Description

pca is used to build and explore a principal component analysis (PCA) model.

Usage

pca(x, ncomp = 15, center = T, scale = F, cv = NULL, x.test = NULL, alpha = 0.05, method = "svd", info = "")

Arguments

x
a numerical matrix with calibration data.
ncomp
maximum number of components to calculate.
center
logical, do mean centering of data or not.
scale
logical, do standardization of data or not.
cv
number of segments for random cross-validation (1 for full cross-validation).
x.test
a numerical matrix with test data.
alpha
significance level for calculating limit for Q residuals.
method
method to compute principal components.
info
a short text line with model description.

Value

Returns an object of class pca with the following fields:
ncomp
number of components included in the model.
ncomp.selected
selected (optimal) number of components.
loadings
matrix with loading values (nvar x ncomp).
eigenvals
vector with eigenvalues for all existing components.
expvar
vector with explained variance for each component (in percent).
cumexpvar
vector with cumulative explained variance for each component (in percent).
T2lim
statistical limit for T2 distance.
Qlim
statistical limit for Q residuals.
info
information about the model, provided by the user when building the model.
calres
an object of class pcares with PCA results for the calibration data.
testres
an object of class pcares with PCA results for the test data, if it was provided.
cvres
an object of class pcares with PCA results for cross-validation, if this option was chosen.

Details

So far only the SVD (Singular Value Decomposition) method is available; more methods are coming soon.
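The idea behind SVD-based PCA can be sketched in base R. This is a minimal illustration, not the mdatools implementation: the simulated data and all variable names are assumptions made for the example, and the eigenvalue scaling (division by the number of objects minus 1) is the usual covariance convention, which the package may or may not follow exactly.

```r
# Minimal SVD-based PCA sketch in base R (illustrative only)
set.seed(1)
x <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)   # simulated data: 20 objects, 5 variables

# Mean centering without standardization (like center = TRUE, scale = FALSE)
xc <- scale(x, center = TRUE, scale = FALSE)

# Singular value decomposition of the centered data: xc = U D V'
s <- svd(xc)

ncomp <- 3
loadings <- s$v[, seq_len(ncomp), drop = FALSE]   # nvar x ncomp
scores <- xc %*% loadings                         # nobj x ncomp

# Eigenvalues of the covariance matrix and explained variance in percent
eigenvals <- s$d^2 / (nrow(x) - 1)
expvar <- 100 * eigenvals / sum(eigenvals)
cumexpvar <- cumsum(expvar)
```

The loadings are the right singular vectors, and the scores are the projections of the centered data onto them.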

By default, pca sets the number of components (ncomp) to the minimum of three values: the number of objects minus 1, the number of variables, and the default or user-provided value. In addition, there is a parameter for the optimal number of components (ncomp.selected). The optimal number of components is used to build the residuals plot (Q residuals vs. Hotelling T2 values), to calculate confidence limits for Q residuals, and for SIMCA classification.
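The rule for the number of components and the two distances used in the residuals plot can be illustrated with base R. This is a sketch under assumptions: it uses simulated data, the textbook definitions of Q (sum of squared residuals per object) and Hotelling T2 (squared scores divided by eigenvalues, summed over components), and variable names chosen for the example. It does not reproduce the package internals.

```r
set.seed(7)
x <- matrix(rnorm(10 * 4), nrow = 10, ncol = 4)   # 10 objects, 4 variables

# Effective number of components: min(nobj - 1, nvar, requested value)
ncomp <- min(nrow(x) - 1, ncol(x), 15)            # min(9, 4, 15)

# PCA via SVD on mean-centered data
xc <- scale(x, center = TRUE, scale = FALSE)
s <- svd(xc)
loadings <- s$v[, seq_len(ncomp), drop = FALSE]
scores <- xc %*% loadings
eigenvals <- s$d[seq_len(ncomp)]^2 / (nrow(x) - 1)

# Distances for a selected (optimal) number of components
ncomp_selected <- 2
k <- seq_len(ncomp_selected)

# Q residuals: squared distance from each object to the model subspace
resid <- xc - scores[, k, drop = FALSE] %*% t(loadings[, k, drop = FALSE])
Q <- rowSums(resid^2)

# Hotelling T2: squared scores normalized by eigenvalues, summed over components
T2 <- rowSums(sweep(scores[, k, drop = FALSE]^2, 2, eigenvals[k], "/"))
```

Changing ncomp_selected changes both distances, which is why the selected number of components affects the residuals plot and the limits derived from it.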

If the data contain missing values (NA), pca uses an iterative algorithm to replace them with the most probable values. The algorithm is implemented in the function pca.mvreplace, and the same center and scale options are used. You can also run this step manually before calling pca to experiment with extra options.
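To give a feeling for how iterative PCA replacement of missing values can work, here is a generic base R sketch. It is not the pca.mvreplace algorithm: the function name impute_pca, its arguments, the starting values (column means) and the stopping rule are all assumptions made for this illustration.

```r
# Hypothetical helper: generic iterative PCA imputation (not pca.mvreplace)
impute_pca <- function(x, ncomp = 2, center = TRUE, scale = FALSE,
                       max_iter = 100, tol = 1e-8) {
  miss <- is.na(x)

  # Start by filling missing cells with column means of the observed values
  col_means <- colMeans(x, na.rm = TRUE)
  x[miss] <- col_means[col(x)[miss]]

  k <- seq_len(ncomp)
  for (i in seq_len(max_iter)) {
    # Preprocess with the current filled-in values
    xs <- scale(x, center = center, scale = scale)
    mu <- attr(xs, "scaled:center")
    sdv <- attr(xs, "scaled:scale")

    # Low-rank reconstruction from the first ncomp components
    s <- svd(xs)
    xhat <- s$u[, k, drop = FALSE] %*% diag(s$d[k], nrow = ncomp) %*%
      t(s$v[, k, drop = FALSE])

    # Undo preprocessing to get back to the original units
    if (!is.null(sdv)) xhat <- sweep(xhat, 2, sdv, "*")
    if (!is.null(mu)) xhat <- sweep(xhat, 2, mu, "+")

    # Update only the missing cells and stop when they no longer change
    delta <- sum((x[miss] - xhat[miss])^2)
    x[miss] <- xhat[miss]
    if (delta < tol) break
  }
  x
}

# Usage on simulated rank-2 data with two missing cells
set.seed(42)
x_full <- matrix(rnorm(30 * 2), 30, 2) %*% matrix(rnorm(2 * 6), 2, 6)
x_na <- x_full
x_na[2, 3] <- NA
x_na[10, 5] <- NA
x_imp <- impute_pca(x_na, ncomp = 2)
```

Only the cells that were NA are modified; the observed values are left untouched.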

See Also

Methods for pca objects:
plot.pca
makes an overview of a PCA model with four plots.
summary.pca
shows some statistics for the model.
selectCompNum.pca
sets the optimal number of components in the model.
predict.pca
applies a PCA model to new data.
plotScores.pca
shows scores plot.
plotLoadings.pca
shows loadings plot.
plotVariance.pca
shows explained variance plot.
plotCumVariance.pca
shows cumulative explained variance plot.
plotResiduals.pca
shows Q vs. T2 residuals plot.
Most of the methods for plotting data are also available for PCA results (pcares) objects. Also check pca.mvreplace, which replaces missing values in a data matrix with values approximated by iterative PCA decomposition.

Examples

library(mdatools)
### Examples for PCA class

## 1. Make PCA model for People data with autoscaling
## and full cross-validation

data(people)
model = pca(people, scale = TRUE, cv = 1, info = 'Simple PCA model')
model = selectCompNum(model, 4)
summary(model)
plot(model, show.labels = TRUE)

## 2. Add missing values, make a new model and show plots
peoplemv = people
peoplemv[2, 7] = NA
peoplemv[6, 2] = NA
peoplemv[10, 4] = NA
peoplemv[22, 12] = NA

modelmv = pca(peoplemv, scale = TRUE, info = 'Model with missing values')
modelmv = selectCompNum(modelmv, 4)
summary(modelmv)
plot(modelmv, show.labels = TRUE)

## 3. Show scores and loadings plots for the model
par(mfrow = c(2, 2))
plotScores(model, comp = c(1, 3), show.labels = TRUE)
plotScores(model, comp = 2, type = 'h', show.labels = TRUE)
plotLoadings(model, comp = c(1, 3), show.labels = TRUE)
plotLoadings(model, comp = c(1, 2), type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 4. Show residuals and variance plots for the model
par(mfrow = c(2, 2))
plotVariance(model, type = 'h')
plotCumVariance(model, show.labels = TRUE, legend.position = 'bottomright')
plotResiduals(model, show.labels = TRUE)
plotResiduals(model, ncomp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))