pca: Principal Component Analysis

Description

pca is used to build and explore a principal component analysis (PCA) model.

Usage

pca(x, ncomp = 15, center = T, scale = F, cv = NULL, x.test = NULL, alpha = 0.05, method = "svd", info = "")

Arguments

x

a numerical matrix with calibration data.

ncomp

maximum number of components to calculate.

center

logical, whether to mean-center the data.

scale

logical, whether to standardize the data.

cv

number of segments for random cross-validation (1 for full cross-validation).

x.test

a numerical matrix with test data.

alpha

significance level used to calculate the statistical limit for Q residuals.

method

method used to compute principal components (currently only "svd" is available, see Details).

info

a short text line with model description.
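The center, scale and cv arguments can be illustrated with a short base-R sketch. It assumes that centering and standardization are applied column-wise (as base scale() does) and that cv = 1 means one object per segment (leave-one-out); the helpers preprocess_sketch and cv_segments_sketch are hypothetical and are not part of mdatools.

```r
# Hypothetical helpers illustrating the assumed meaning of center, scale and cv.
# This is not mdatools code.

# Column-wise mean centering and (optionally) standardization
preprocess_sketch <- function(x, center = TRUE, scale = FALSE) {
  base::scale(x, center = center, scale = scale)
}

# Randomly assign each object (row) to one of `cv` segments;
# cv = 1 is treated as full cross-validation (one object per segment)
cv_segments_sketch <- function(nobj, cv) {
  nseg <- if (cv == 1) nobj else cv
  sample(rep(seq_len(nseg), length.out = nobj))
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)
xp <- preprocess_sketch(x, center = TRUE, scale = TRUE)
round(colMeans(xp), 10)                # column means are zero after centering
apply(xp, 2, sd)                       # standard deviations are one after scaling
table(cv_segments_sketch(10, cv = 5))  # two objects in each of five segments
```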

Value

ncomp: number of components included in the model.
ncomp.selected: selected (optimal) number of components.
loadings: matrix with loading values (nvar x ncomp).
eigenvals: vector with eigenvalues for all calculated components.
expvar: vector with explained variance for each component (in percent).
cumexpvar: vector with cumulative explained variance for each component (in percent).
T2lim: statistical limit for the Hotelling T2 distance.
Qlim: statistical limit for Q residuals.
info: information about the model, provided by the user when building the model.
calres: an object of class pcares with PCA results for the calibration data.
testres: an object of class pcares with PCA results for the test data, if it was provided.
cvres: an object of class pcares with PCA results for cross-validation, if this option was chosen.
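How eigenvals, expvar and cumexpvar relate can be sketched in base R. The sketch assumes that the eigenvalues are those of the covariance matrix of the preprocessed data and that explained variance is each component's share of the total variance; it is an illustration, not the mdatools implementation.

```r
# Base-R sketch (not mdatools code) relating eigenvalues and explained variance
set.seed(2)
x <- scale(matrix(rnorm(60), nrow = 15, ncol = 4), center = TRUE, scale = FALSE)

s <- svd(x)
# Covariance eigenvalues: squared singular values divided by (n - 1)
eigenvals <- s$d^2 / (nrow(x) - 1)
# Explained variance per component, in percent of the total variance
expvar <- 100 * eigenvals / sum(eigenvals)
cumexpvar <- cumsum(expvar)

all.equal(eigenvals, eigen(cov(x))$values)  # TRUE
cumexpvar[length(cumexpvar)]                # 100 when all components are kept
```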

Details

So far only the SVD (Singular Value Decomposition) method is available; more methods are planned.
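A minimal base-R sketch of SVD-based PCA is shown below; pca_svd_sketch is a hypothetical helper, not the internal code of mdatools. With X = U D V', the loadings are the first ncomp columns of V and the scores are U D, which equals X V.

```r
# Hypothetical helper: PCA of already preprocessed data via SVD
pca_svd_sketch <- function(x, ncomp) {
  s <- svd(x, nu = ncomp, nv = ncomp)
  loadings <- s$v                                       # nvar x ncomp
  scores <- s$u %*% diag(s$d[seq_len(ncomp)], ncomp)    # nobj x ncomp
  list(scores = scores, loadings = loadings)
}

set.seed(3)
x <- scale(matrix(rnorm(50), nrow = 10, ncol = 5), center = TRUE, scale = FALSE)
m <- pca_svd_sketch(x, ncomp = 2)
# Scores are the projections of the data onto the loadings
all.equal(m$scores, x %*% m$loadings)  # TRUE
```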

By default, pca sets the number of components (ncomp) to the smallest of three values: the number of objects minus 1, the number of variables, and the default or user-provided value of ncomp. In addition, there is a parameter for selecting the optimal number of components (ncomp.selected). The optimal number of components is used to build the residuals plot (Q residuals vs. Hotelling T2 values), to calculate confidence limits for Q residuals, and for SIMCA classification.
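The default ncomp rule and the two quantities shown on the residuals plot can be sketched in base R. The helper default_ncomp_sketch is hypothetical, and the Q and T2 formulas are the standard textbook definitions, assumed here rather than taken from mdatools; the statistical limits T2lim and Qlim are not computed.

```r
# Hypothetical helper implementing the default ncomp rule described above
default_ncomp_sketch <- function(x, ncomp = 15) {
  min(nrow(x) - 1, ncol(x), ncomp)
}

set.seed(4)
x <- scale(matrix(rnorm(120), nrow = 20, ncol = 6))  # autoscaled data
ncomp <- default_ncomp_sketch(x)                     # min(19, 6, 15) = 6
ncomp.selected <- 2

s <- svd(x)
a <- seq_len(ncomp.selected)
scores <- x %*% s$v[, a, drop = FALSE]
eigenvals <- s$d[a]^2 / (nrow(x) - 1)

# Hotelling T2: squared scores divided by the variance of each component
T2 <- rowSums(sweep(scores^2, 2, eigenvals, "/"))
# Q residual: squared distance between each object and its projection
resid <- x - scores %*% t(s$v[, a, drop = FALSE])
Q <- rowSums(resid^2)
```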

If the data contain missing values (NA), pca uses an iterative algorithm to replace them with the most probable values. The algorithm is implemented in the function pca.mvreplace and uses the same center and scale options as the model. You can also run this step manually before calling pca to experiment with its additional options.
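The idea of iterative replacement can be sketched as follows. This is a common PCA-based imputation scheme assumed for illustration, NOT the code of pca.mvreplace; impute_pca_sketch and its arguments (ncomp, tol, maxit) are hypothetical. Missing cells start at column means and are then repeatedly replaced by their low-rank PCA reconstruction until they stop changing.

```r
# Hypothetical helper: iterative PCA-based replacement of missing values
impute_pca_sketch <- function(x, ncomp = 2, center = TRUE, scale = FALSE,
                              tol = 1e-6, maxit = 100) {
  miss <- is.na(x)
  if (!any(miss)) return(x)
  # Initial guess: column means of the observed values
  col_means <- colMeans(x, na.rm = TRUE)
  x[miss] <- col_means[col(x)[miss]]
  for (i in seq_len(maxit)) {
    xs <- base::scale(x, center = center, scale = scale)
    s <- svd(xs, nu = ncomp, nv = ncomp)
    # Low-rank reconstruction in preprocessed units
    xhat <- s$u %*% diag(s$d[seq_len(ncomp)], ncomp) %*% t(s$v)
    # Undo standardization and centering (attributes are absent if not applied)
    sc <- attr(xs, "scaled:scale")
    ce <- attr(xs, "scaled:center")
    if (!is.null(sc)) xhat <- sweep(xhat, 2, sc, "*")
    if (!is.null(ce)) xhat <- sweep(xhat, 2, ce, "+")
    # Update only the missing cells and stop when they no longer change
    change <- max(abs(x[miss] - xhat[miss]))
    x[miss] <- xhat[miss]
    if (change < tol) break
  }
  x
}

set.seed(5)
# Approximately rank-2 data with two cells set to NA
x <- matrix(rnorm(30), 15, 2) %*% matrix(rnorm(10), 2, 5) +
  matrix(rnorm(75, sd = 0.01), 15, 5)
xmv <- x
xmv[2, 3] <- NA
xmv[7, 1] <- NA
ximp <- impute_pca_sketch(xmv, ncomp = 2)
```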

Examples


library(mdatools)
### Examples for PCA class

## 1. Make PCA model for People data with autoscaling
## and full cross-validation

data(people)
model = pca(people, scale = TRUE, cv = 1, info = 'Simple PCA model')
model = selectCompNum(model, 4)
summary(model)
plot(model, show.labels = TRUE)

## 2. Add missing values, make a new model and show plots
peoplemv = people
peoplemv[2, 7] = NA
peoplemv[6, 2] = NA
peoplemv[10, 4] = NA
peoplemv[22, 12] = NA

modelmv = pca(peoplemv, scale = TRUE, info = 'Model with missing values')
modelmv = selectCompNum(modelmv, 4)
summary(modelmv)
plot(modelmv, show.labels = TRUE)

## 3. Show scores and loadings plots for the model
par(mfrow = c(2, 2))
plotScores(model, comp = c(1, 3), show.labels = TRUE)
plotScores(model, comp = 2, type = 'h', show.labels = TRUE)
plotLoadings(model, comp = c(1, 3), show.labels = TRUE)
plotLoadings(model, comp = c(1, 2), type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 4. Show residuals and variance plots for the model
par(mfrow = c(2, 2))
plotVariance(model, type = 'h')
plotCumVariance(model, show.labels = TRUE, legend.position = 'bottomright')
plotResiduals(model, show.labels = TRUE)
plotResiduals(model, ncomp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))
