
mdatools (version 0.5.3)

pca: Principal Component Analysis

Description

pca is used to build and explore a principal component analysis (PCA) model.

Usage

pca(x, ncomp = 15, center = T, scale = F, cv = NULL, x.test = NULL, 
   alpha = 0.05, method = 'svd', info = '')

Arguments

x
a numerical matrix with calibration data.
ncomp
maximum number of components to calculate.
center
logical, whether to mean center the data.
scale
logical, whether to standardize the data.
cv
number of segments for random cross-validation (1 for full cross-validation).
x.test
a numerical matrix with test data.
alpha
significance level for calculating limit for Q2 residuals.
method
method to compute principal components.
info
a short text line with model description.

Value

Returns an object of pca class with the following fields:

  • ncomp: number of components included in the model.
  • ncomp.selected: selected (optimal) number of components.
  • loadings: matrix with loading values (nvar x ncomp).
  • eigenvals: vector with eigenvalues for all existing components.
  • expvar: vector with explained variance for each component (in percent).
  • cumexpvar: vector with cumulative explained variance for each component (in percent).
  • T2lim: statistical limit for T2 distance.
  • Q2lim: statistical limit for Q2 distance.
  • info: information about the model, provided by the user when building the model.
  • calres: an object of class pcares with PCA results for the calibration data.
  • testres: an object of class pcares with PCA results for the test data, if it was provided.
  • cvres: an object of class pcares with PCA results for cross-validation, if this option was chosen.
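
The T2 and Q2 limits above apply to two per-object distances. The sketch below shows how such distances are conventionally computed from a centered data matrix, using base R only. It assumes the textbook definitions (Hotelling T2 as the sum of squared scores divided by the eigenvalues, Q2 as the sum of squared residuals); the package's own code may differ in detail, and the data and the choice of two components are made up for illustration.

```r
# Hypothetical data: 25 objects, 6 variables
set.seed(7)
x <- matrix(rnorm(25 * 6), nrow = 25, ncol = 6)

# Mean-center and decompose
xc <- scale(x, center = TRUE, scale = FALSE)
s <- svd(xc)

a <- 2                                   # assumed number of selected components
P <- s$v[, 1:a, drop = FALSE]            # loadings for the first a components
scores <- xc %*% P                       # scores for the first a components
eig <- s$d[1:a]^2 / (nrow(x) - 1)        # eigenvalues of those components

# Hotelling T2: squared scores weighted by the inverse eigenvalues
T2 <- rowSums(sweep(scores^2, 2, eig, "/"))

# Q2: squared norm of the part of each object the model does not explain
E <- xc - scores %*% t(P)
Q2 <- rowSums(E^2)
```

Objects with a large T2 lie far from the center within the model plane, while objects with a large Q2 are poorly described by the selected components.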

Details

So far only the SVD (Singular Value Decomposition) method is available; more are coming soon.
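
As a rough illustration of what an SVD-based PCA computes, the base R sketch below derives loadings, scores, eigenvalues, and explained variance from a mean-centered matrix. It is not the package's internal code; the random data and all variable names are assumptions chosen to mirror the field names listed under Value.

```r
# Hypothetical data: 20 objects, 5 variables
set.seed(1)
x <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Mean-center the columns (as with center = TRUE, scale = FALSE)
xc <- scale(x, center = TRUE, scale = FALSE)

# Singular value decomposition: xc = U D V'
s <- svd(xc)

loadings <- s$v                          # nvar x ncomp
scores <- xc %*% loadings                # nobj x ncomp, same as U %*% D
eigenvals <- s$d^2 / (nrow(x) - 1)       # variance along each component
expvar <- 100 * eigenvals / sum(eigenvals)
cumexpvar <- cumsum(expvar)

print(round(cumexpvar, 1))
```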

By default, pca sets the number of components (ncomp) to the minimum of three values: the number of objects minus 1, the number of variables, and the default or user-provided value of ncomp. In addition, there is a parameter for the optimal number of components (ncomp.selected). The optimal number of components is used to build the residuals plot (Q2 residuals vs. Hotelling T2 values), to calculate confidence limits for Q2 residuals, and for SIMCA classification.
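
The rule for the number of components can be written as a one-line helper. This is a minimal sketch of the rule as described here; limit_ncomp is a hypothetical name, not a function exported by the package.

```r
# Hypothetical helper: cap the requested number of components
# by the number of objects minus 1 and the number of variables
limit_ncomp <- function(x, ncomp = 15) {
  min(nrow(x) - 1, ncol(x), ncomp)
}

wide <- matrix(0, nrow = 5, ncol = 12)     # few objects, many variables
tall <- matrix(0, nrow = 100, ncol = 50)   # more than 15 of each

print(limit_ncomp(wide))   # limited by the number of objects minus 1
print(limit_ncomp(tall))   # limited by the requested value
```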

If the data contains missing values (NA), pca uses an iterative algorithm to replace them with the most probable values. The algorithm is implemented in the function pca.mvreplace and uses the same center and scale options. You can also run this step manually before calling pca to experiment with its extra options.
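
To give an idea of how iterative PCA imputation works in general, here is a simplified base R sketch. It is not the pca.mvreplace implementation: the function name impute_pca, its arguments, and the stopping rule are all assumptions made for illustration. The idea is to start from column means, fit a low-rank PCA model, replace the missing cells with the model's reconstruction, and repeat until the filled-in values stop changing.

```r
# Hypothetical helper illustrating iterative PCA imputation
impute_pca <- function(x, ncomp = 2, maxiter = 100, tol = 1e-8) {
  miss <- is.na(x)

  # Initial guess: fill each gap with its column mean
  col_means <- colMeans(x, na.rm = TRUE)
  x[miss] <- col_means[col(x)[miss]]

  for (i in seq_len(maxiter)) {
    # Fit a PCA model with ncomp components to the current filled matrix
    mu <- colMeans(x)
    xc <- sweep(x, 2, mu)
    s <- svd(xc, nu = ncomp, nv = ncomp)

    # Low-rank reconstruction, shifted back to the original scale
    xhat <- s$u %*% diag(s$d[seq_len(ncomp)], ncomp) %*% t(s$v)
    xhat <- sweep(xhat, 2, mu, "+")

    # Update only the missing cells and check how much they moved
    change <- sum((x[miss] - xhat[miss])^2)
    x[miss] <- xhat[miss]
    if (change < tol) break
  }
  x
}

# Made-up data with one column that depends on two others
set.seed(42)
x <- matrix(rnorm(30 * 4), nrow = 30, ncol = 4)
x[, 4] <- x[, 1] + x[, 2]

xmv <- x
xmv[2, 4] <- NA
xmv[6, 2] <- NA

xfilled <- impute_pca(xmv, ncomp = 2)
```

Observed values are never modified; only the cells that were NA are updated on each pass.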

See Also

Methods for pca objects:

  • plot.pca: makes an overview of the PCA model with four plots.
  • summary.pca: shows some statistics for the model.
  • selectCompNum.pca: sets the optimal number of components in the model.
  • predict.pca: applies the PCA model to new data.
  • plotScores.pca: shows the scores plot.
  • plotLoadings.pca: shows the loadings plot.
  • plotVariance.pca: shows the explained variance plot.
  • plotCumVariance.pca: shows the cumulative explained variance plot.
  • plotResiduals.pca: shows the Q2 vs. T2 residuals plot.

Most of the methods for plotting data are also available for PCA results (pcares) objects.

Other methods implemented in pca:

  • pca.mvreplace: replaces missing values in a data matrix with approximated values using iterative PCA decomposition.

Examples

### Examples for PCA class

## 1. Make PCA model for People data with autoscaling
## and full cross-validation

library(mdatools)

data(people)
model = pca(people, scale = TRUE, cv = 1, info = 'Simple PCA model')
model = selectCompNum(model, 4)
summary(model)
plot(model, show.labels = TRUE)

## 2. Add missing values, make a new model and show plots
peoplemv = people
peoplemv[2, 7] = NA
peoplemv[6, 2] = NA
peoplemv[10, 4] = NA
peoplemv[22, 12] = NA

modelmv = pca(peoplemv, scale = TRUE, info = 'Model with missing values')
modelmv = selectCompNum(modelmv, 4)
summary(modelmv)
plot(modelmv, show.labels = TRUE)

## 3. Show scores and loadings plots for the model
par(mfrow = c(2, 2))
plotScores(model, comp = c(1, 3), show.labels = TRUE)
plotScores(model, comp = 2, type = 'h', show.labels = TRUE)
plotLoadings(model, comp = c(1, 3), show.labels = TRUE)
plotLoadings(model, comp = c(1, 2), type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 4. Show residuals and variance plots for the model
par(mfrow = c(2, 2))
plotVariance(model, type = 'h')
plotCumVariance(model, show.labels = TRUE, legend.position = 'bottomright')
plotResiduals(model, show.labels = TRUE)
plotResiduals(model, ncomp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))
