pca: Principal Component Analysis

Description

pca is used to build and explore a principal component analysis (PCA) model.

Usage

pca(x, ncomp = 15, center = T, scale = F, cv = NULL, x.test = NULL, 
   alpha = 0.05, method = 'svd', info = '')

Arguments

x

a numerical matrix with calibration data.

ncomp

maximum number of components to calculate.

center

logical, whether to mean-center the data.

scale

logical, whether to standardize the data.

cv

number of segments for random cross-validation (1 for full cross-validation).

x.test

a numerical matrix with test data.

alpha

significance level for calculating the statistical limit for Q2 residuals.

method

method to compute principal components.

info

a short text line with model description.

Value

Returns an object of class pca with the following fields:

ncomp

number of components included in the model.

ncomp.selected

selected (optimal) number of components.

loadings

matrix with loading values (nvar x ncomp).

eigenvals

vector with eigenvalues for all existing components.

expvar

vector with explained variance for each component (in percent).

cumexpvar

vector with cumulative explained variance for each component (in percent).

T2lim

statistical limit for T2 distance.

Q2lim

statistical limit for Q2 distance.

info

information about the model, provided by the user when building the model.

calres

an object of class pcares with PCA results for the calibration data.

testres

an object of class pcares with PCA results for the test data, if it was provided.

cvres

an object of class pcares with PCA results for cross-validation, if this option was chosen.
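
As a rough illustration of how the loadings, eigenvalues and explained variance fields relate to each other, the sketch below computes them with base R's svd on simulated, mean-centered data. It uses the textbook definitions and is an assumption about what these fields represent, not the package's own code; the variable names simply mirror the field names above.

```r
# Illustrative only: textbook SVD-based PCA quantities on simulated data.
# This is not the mdatools implementation.
set.seed(1)
x <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Mean-center the columns (corresponds to center = TRUE, scale = FALSE)
xc <- scale(x, center = TRUE, scale = FALSE)

s <- svd(xc)
loadings  <- s$v                            # nvar x ncomp matrix
eigenvals <- s$d^2 / (nrow(x) - 1)          # variance captured by each component
expvar    <- 100 * eigenvals / sum(eigenvals)  # percent per component
cumexpvar <- cumsum(expvar)                 # cumulative percent

print(round(expvar, 2))
print(round(cumexpvar, 2))
```

With all components kept, the cumulative explained variance reaches 100 percent, and the loading vectors are orthonormal.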

Details

So far only the SVD (Singular Value Decomposition) method is available; more are coming soon.

By default, pca sets the number of components (ncomp) to the minimum of three values: the number of objects minus 1, the number of variables, and the default or user-provided ncomp value. The model also stores an optimal number of components (ncomp.selected). This number is used to build the residuals plot (Q2 residuals vs. Hotelling T2 values), to calculate confidence limits for Q2 residuals, and for SIMCA classification.
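
The rule for the number of components can be sketched in a few lines of base R. The helper name effective_ncomp is hypothetical and not part of mdatools; it only restates the minimum rule described above.

```r
# Hypothetical helper (not an mdatools function): the cap on the number
# of components described in the text above.
effective_ncomp <- function(x, ncomp = 15) {
  min(nrow(x) - 1, ncol(x), ncomp)
}

x <- matrix(rnorm(32 * 12), nrow = 32, ncol = 12)

effective_ncomp(x)              # 12: limited by the number of variables
effective_ncomp(x, ncomp = 5)   # 5: limited by the requested value
effective_ncomp(x[1:8, ])       # 7: limited by the number of objects - 1
```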

If the data contain missing values (NA), pca uses an iterative algorithm to replace them with the most probable values. The algorithm is implemented in the function pca.mvreplace and uses the same center and scale options. You can also run this step manually before calling pca to experiment with its extra options.
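
To give an idea of what an iterative replacement of missing values can look like, here is a minimal base R sketch. It is a simplified stand-in and not the pca.mvreplace implementation: the helper name impute_svd, its arguments, and the convergence rule are all assumptions made for illustration, and it only mean-centers the data.

```r
# Hypothetical helper; a simplified illustration of iterative imputation
# with a low-rank SVD model. Not the algorithm used by pca.mvreplace.
impute_svd <- function(x, ncomp = 2, maxiter = 100, tol = 1e-8) {
  miss <- is.na(x)

  # Initial guess: column means of the observed values
  col_means <- colMeans(x, na.rm = TRUE)
  x[miss] <- col_means[col(x)[miss]]

  for (i in seq_len(maxiter)) {
    # Center, fit a rank-ncomp model, and reconstruct the data
    mu <- colMeans(x)
    xc <- sweep(x, 2, mu)
    s <- svd(xc, nu = ncomp, nv = ncomp)
    xhat <- s$u %*% diag(s$d[1:ncomp], ncomp) %*% t(s$v)
    xhat <- sweep(xhat, 2, mu, "+")

    # Update only the originally missing cells; stop when they settle
    change <- sum((x[miss] - xhat[miss])^2)
    x[miss] <- xhat[miss]
    if (change < tol) break
  }
  x
}

set.seed(42)
x <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6)
xmv <- x
xmv[2, 5] <- NA
xmv[10, 3] <- NA

xfilled <- impute_svd(xmv, ncomp = 2)
anyNA(xfilled)   # FALSE
```

The observed values are left untouched; only the cells that were NA are updated on each pass.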

Examples

### Examples for PCA class

## 1. Make PCA model for People data with autoscaling
## and full cross-validation

library(mdatools)

data(people)
model = pca(people, scale = TRUE, cv = 1, info = 'Simple PCA model')
model = selectCompNum(model, 4)
summary(model)
plot(model, show.labels = TRUE)

## 2. Add missing values, make a new model and show plots
peoplemv = people
peoplemv[2, 7] = NA
peoplemv[6, 2] = NA
peoplemv[10, 4] = NA
peoplemv[22, 12] = NA

modelmv = pca(peoplemv, scale = TRUE, info = 'Model with missing values')
modelmv = selectCompNum(modelmv, 4)
summary(modelmv)
plot(modelmv, show.labels = TRUE)

## 3. Show scores and loadings plots for the model
par(mfrow = c(2, 2))
plotScores(model, comp = c(1, 3), show.labels = TRUE)
plotScores(model, comp = 2, type = 'h', show.labels = TRUE)
plotLoadings(model, comp = c(1, 3), show.labels = TRUE)
plotLoadings(model, comp = c(1, 2), type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 4. Show residuals and variance plots for the model
par(mfrow = c(2, 2))
plotVariance(model, type = 'h')
plotCumVariance(model, show.labels = TRUE, legend.position = 'bottomright')
plotResiduals(model, show.labels = TRUE)
plotResiduals(model, ncomp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))
