perturbationRpca: Recursive PCA using a rank 1 perturbation method

Description

This function recursively updates the PCA with respect to a single new data vector, using the (fast) perturbation method of Hegde et al. (2006).
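The idea behind a rank 1 perturbation update can be sketched in a few lines of R. The sketch below is a hypothetical helper (`rank1PerturbationUpdate`), not the onlinePCA source. It assumes the covariance recursion C_new = (1 - f) C + f x x^T with C = U diag(lambda) U^T and an already centered x. It applies standard first-order eigenvector perturbation theory to the small matrix (1 - f) diag(lambda) + z z^T, where z = sqrt(f) U^T x, and checks the result against an exact `eigen()` call.

```r
# Hypothetical helper, not the onlinePCA source: first-order perturbation
# update of the eigenpairs of C_new = (1 - f) * C + f * x x^T, where
# C = U diag(lambda) U^T and x is already centered. The component of x
# orthogonal to span(U) is ignored (nothing is lost when U is square).
rank1PerturbationUpdate <- function(lambda, U, x, f) {
  z <- sqrt(f) * drop(crossprod(U, x))   # sqrt(f) * x in the PC basis
  dd <- (1 - f) * lambda + z^2           # diagonal of (1 - f) Lambda + z z^T
  q <- length(lambda)
  # The off-diagonal terms z_i z_j act as the perturbation; first-order theory
  # gives V[i, j] = z_i z_j / (dd_j - dd_i) for i != j (requires distinct dd).
  V <- outer(z, z) / (matrix(dd, q, q, byrow = TRUE) - matrix(dd, q, q))
  diag(V) <- 1
  V <- sweep(V, 2, sqrt(colSums(V^2)), "/")  # rescale columns to unit length
  list(values = dd, vectors = U %*% V)
}

# Compare with an exact eigendecomposition on a small example
lambda <- 5:1
U <- diag(5)
x <- c(1, -0.5, 0.3, 0.8, -0.2)
f <- 1e-3
upd <- rank1PerturbationUpdate(lambda, U, x, f)
C_new <- (1 - f) * U %*% diag(lambda) %*% t(U) + f * tcrossprod(x)
exact <- eigen(C_new, symmetric = TRUE)
max(abs(upd$values - exact$values))        # small: error is second order in z
abs(colSums(upd$vectors * exact$vectors))  # cosines between matched vectors, near 1
```

Here the updated eigenvalues happen to stay in decreasing order, so they line up with the output of `eigen()`; in general the pairs would have to be matched or sorted first. The appeal of the perturbation approach is that it avoids a full eigendecomposition at every step.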

Usage

perturbationRpca(lambda, U, x, n, f = 1/n, center, sort = TRUE)

Value

A list with components

values: updated eigenvalues.
vectors: updated eigenvectors.

Arguments

lambda: vector of eigenvalues.
U: matrix of eigenvectors (principal components) stored in columns.
x: new data vector.
n: sample size before observing x.
f: forgetting factor: a number between 0 and 1.
center: optional centering vector for x.
sort: logical; should the eigenpairs be sorted?

Details

The forgetting factor f can be interpreted as the inverse of the number of observation vectors effectively used in the PCA: the "memory" of the PCA algorithm goes back roughly 1/f observations. For larger values of f, the PCA update gives more relative weight to the new data x and less to the current PCA (lambda, U). For nonstationary processes, f should therefore be closer to 1.
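The "memory of about 1/f observations" can be made concrete with a short calculation. This sketch assumes the exponentially weighted recursion S_new = (1 - f) S_old + f y_new, the usual way a forgetting factor is applied; `forgettingWeights` is a hypothetical helper that returns the weight carried by the observation seen k steps ago.

```r
# Assumed recursion: S_new = (1 - f) * S_old + f * y_new.
# Weight of the observation seen k steps ago (k = 0 is the newest one).
forgettingWeights <- function(f, K) f * (1 - f)^(0:K)   # hypothetical helper

f <- 0.01
K <- 5000
w <- forgettingWeights(f, K)

total <- sum(w)                               # 1 - (1 - f)^(K + 1), essentially 1
meanLag <- sum((0:K) * w) / total             # about (1 - f) / f, i.e. roughly 1/f
recentShare <- sum(w[seq_len(round(1 / f))])  # weight on the last 1/f observations
c(total = total, meanLag = meanLag, recentShare = recentShare)
```

Because (1 - f)^(1/f) is close to exp(-1) for small f, about 63% of the total weight falls on the most recent 1/f observations, and the average age of the data is about 1/f steps.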
Only one of the arguments n and f needs to be specified. If only n is given, f is set to 1/n by default (the usual PCA of the sample covariance matrix, in which all data points have equal weight). If f is specified, its value overrides any value given for n.
If sort is TRUE, the updated eigenpairs are sorted by decreasing eigenvalue. Otherwise, they are not sorted.
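Sorting eigenpairs means reordering the eigenvalues and permuting the eigenvector columns in the same way, so that each value stays attached to its vector. A minimal sketch of that operation, using a hypothetical helper `sortEigenpairs` (not part of the package):

```r
# Hypothetical helper: sort eigenpairs by decreasing eigenvalue, permuting
# the columns of the eigenvector matrix with the same index vector.
sortEigenpairs <- function(values, vectors) {
  o <- order(values, decreasing = TRUE)
  list(values = values[o], vectors = vectors[, o, drop = FALSE])
}

pairs <- sortEigenpairs(c(2, 5, 1), diag(3))
pairs$values   # 5 2 1
pairs$vectors  # columns are e2, e1, e3
```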

References

Hegde et al. (2006) Perturbation-Based Eigenvector Updates for On-Line Principal Components Analysis and Canonical Correlation Analysis. Journal of VLSI Signal Processing.

Examples


library(onlinePCA)

n <- 1e3
n0 <- 5e2
d <- 10
x <- matrix(runif(n*d), n, d)
# Var(runif) = 1/12, so scaling column j by sqrt(12*j) gives it variance j
x <- x %*% diag(sqrt(12*(1:d)))
# The eigenvalues of cov(x) are approximately equal to 1, 2, ..., d
# and the corresponding eigenvectors are approximately equal to 
# the canonical basis of R^d

## Perturbation-based recursive PCA
# Initialization: use factor 1/n0 (princomp) rather 
# than factor 1/(n0-1) (prcomp) in calculations.
# princomp always centers the data, so pca$center holds the column means
pca <- princomp(x[1:n0,])
xbar <- pca$center
pca <- list(values=pca$sdev^2, vectors=pca$loadings)

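for (i in (n0+1):n) {
	# Update the sample mean with observation i (i-1 observations seen so far)
	xbar <- updateMean(xbar, x[i,], i-1)
	# Update the eigenpairs, centering x[i,] at the updated mean
	pca <- perturbationRpca(pca$values, pca$vectors, x[i,], 
		i-1, center=xbar)
}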
