Rdimtools (version 0.1.2)

do.pca: Principal Component Analysis

Description

do.pca performs classical principal component analysis (PCA), using the RcppArmadillo package for fast, efficient computation.

Usage

do.pca(X, ndim = "auto", cor = FALSE, preprocess = "center",
  varratio = 0.9)

Arguments

X

an (n-by-p) matrix or data frame whose rows are observations and columns represent independent variables.

ndim

an integer-valued target dimension, or "auto" to choose it using varratio.

cor

mode of eigendecomposition: FALSE decomposes the covariance matrix, and TRUE decomposes the correlation matrix.

preprocess

an option for preprocessing the data. Three methods are supported: "center", "decorrelate", or "whiten". See also aux.preprocess for more details.

varratio

a value in (0,1]. This value is used only when ndim is "auto".
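
As an illustration of what the cor argument distinguishes, the base R sketch below compares eigenvalues of the covariance and correlation matrices. It does not call do.pca, and it is only an assumption that cor = TRUE corresponds to this kind of decomposition internally. The data and the names ev_cov, ev_cor and ev_scaled are made up for the example.

```r
# Illustrative only: base R, not the Rdimtools implementation.
set.seed(1)

# Toy data whose columns have very different scales (hypothetical example).
X <- matrix(rnorm(200), nrow = 20) %*% diag(1:10)

# Eigenvalues of the sample covariance matrix (what cor = FALSE describes).
ev_cov <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues of the sample correlation matrix (what cor = TRUE describes).
ev_cor <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

# The correlation matrix is the covariance matrix of standardized columns,
# so the two decompositions below agree.
ev_scaled <- eigen(cov(scale(X)), symmetric = TRUE, only.values = TRUE)$values
print(all.equal(ev_cor, ev_scaled))

# Correlation eigenvalues sum to the number of variables.
print(sum(ev_cor))
```

With the covariance matrix, large-scale columns dominate the leading components; the correlation matrix puts every variable on the same footing.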

Value

a named list containing

Y

an (n-by-ndim) matrix whose rows are embedded observations.

vars

a vector containing the variances of the data projected onto the principal components.

projection

a (p-by-ndim) matrix whose columns are principal components.

trfinfo

a list containing information for out-of-sample prediction.
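
The relationship between Y, vars and projection can be sketched in base R as follows. This is a minimal reconstruction under the assumption of plain centering and a covariance eigendecomposition; it is not the Rdimtools source, the signs of the components may differ from what do.pca returns, and the structure of trfinfo is not used here. The names mu, Xc and Xnew are hypothetical.

```r
# Illustrative only: base R, assuming preprocess = "center" and cor = FALSE.
set.seed(1)
X <- rbind(matrix(rnorm(100), nrow = 10), matrix(rnorm(100), nrow = 10) + 10)
ndim <- 2

# Center each column by its mean.
mu <- colMeans(X)
Xc <- sweep(X, 2, mu)

# Eigendecomposition of the sample covariance matrix.
eig <- eigen(cov(X), symmetric = TRUE)

# p-by-ndim matrix whose columns are the leading principal components.
projection <- eig$vectors[, seq_len(ndim), drop = FALSE]

# n-by-ndim embedding: centered data projected onto the components.
Y <- Xc %*% projection

# Variances of the projected data equal the leading eigenvalues.
vars <- eig$values[seq_len(ndim)]
print(all.equal(apply(Y, 2, var), vars))

# Out-of-sample sketch: reuse the training mean and projection on new rows.
Xnew <- matrix(rnorm(30), nrow = 3) + 10
Ynew <- sweep(Xnew, 2, mu) %*% projection
```

New observations are embedded with the mean and projection learned from the training data, which is the kind of information an out-of-sample step needs.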

Details

Combining ndim="auto" with varratio decides the target dimension automatically, based on the cumulative sum of variance. Variance is measured by the eigenvalues of the sample covariance matrix: the target dimension is the smallest number of top eigenvalues whose cumulative share of the total variance is larger than varratio.
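
The selection rule described above can be sketched in base R. This is an assumption about the logic rather than the package's code: the helper choose_ndim is hypothetical, and it uses >= so that varratio = 1 can in principle be met, whereas the package may use a strict inequality.

```r
# Illustrative only: base R sketch of choosing ndim from varratio.
set.seed(1)
X <- rbind(matrix(rnorm(100), nrow = 10), matrix(rnorm(100), nrow = 10) + 10)

# Eigenvalues of the sample covariance; eigen() returns them in decreasing order.
evals <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values

# Cumulative share of total variance captured by the top k eigenvalues.
cumratio <- cumsum(evals) / sum(evals)

# Hypothetical helper: smallest k whose cumulative share reaches varratio.
choose_ndim <- function(cumratio, varratio = 0.9) {
  which(cumratio >= varratio)[1]
}

ndim_auto <- choose_ndim(cumratio, varratio = 0.98)
print(ndim_auto)
```

Raising varratio can only keep the chosen dimension the same or increase it, since cumratio is non-decreasing.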

References

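Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, Series 6, 2(11), 559-572.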

Examples

# NOT RUN {
# load the package and generate data: two clusters of 10 points in 10 dimensions
library(Rdimtools)
X <- rbind(matrix(rnorm(100),nrow=10),matrix(rnorm(100),nrow=10)+10)

## 1. projection using 2 principal components
output <- do.pca(X,ndim=2)
plot(output$Y[,1],output$Y[,2])

## 2. automatic detection of target dimension accounting for 98% of variance
output <- do.pca(X,ndim="auto",varratio=0.98)           # perform PCA
plot(seq_along(output$vars),output$vars,type="b")       # plot variances
# }