do.cca: Canonical Correlation Analysis

Description

Canonical Correlation Analysis (CCA) is similar to Partial Least Squares (PLS) except for its objective: whereas PLS maximizes the covariance between projections of the two data sets, CCA maximizes their correlation. Because correlation normalizes away the variance of each projection, CCA can give results quite different from those of PLS. Algorithmically, projection vectors are extracted from an eigendecomposition formulation combined with recursive Gram-Schmidt orthogonalization. The size of each eigenproblem depends only on the original dimensionality of the data (N or M), not on the number of observations n. For more details, see the Wikipedia entry on canonical correlation.
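For reference, the textbook forms of the two objectives are written out below. They assume column-centered data matrices \(X\) (data1) and \(Y\) (data2) with sample covariance blocks \(\Sigma_{XX}\), \(\Sigma_{YY}\), \(\Sigma_{XY}=\Sigma_{YX}^\top\), and assume \(\Sigma_{XX}\) and \(\Sigma_{YY}\) are invertible. This is the standard formulation, not necessarily the exact computation performed by do.cca.

```latex
\begin{aligned}
\text{PLS:}\quad & \max_{a,b}\; a^\top \Sigma_{XY}\, b
  \quad \text{subject to } \|a\| = \|b\| = 1,\\
\text{CCA:}\quad & \max_{a,b}\; \rho(a,b) =
  \frac{a^\top \Sigma_{XY}\, b}{\sqrt{a^\top \Sigma_{XX}\, a}\,\sqrt{b^\top \Sigma_{YY}\, b}},\\
& \Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\, a = \rho^2 a \quad (N\times N),\\
& \Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}\, b = \rho^2 b \quad (M\times M).
\end{aligned}
```

Each eigenproblem lives in one original variable space, so its size is \(N\times N\) or \(M\times M\) regardless of n, and in this formulation the eigenvalues are the squared canonical correlations \(\rho^2\).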

Usage

do.cca(data1, data2, ndim = 2)

Arguments

data1

an \((n\times N)\) data matrix whose rows are observations

data2

an \((n\times M)\) data matrix whose rows are observations; row \(i\) must describe the same observation as row \(i\) of data1.

ndim

an integer-valued target dimension.
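The argument descriptions imply two shape constraints: both matrices share the same number of rows n (rows are paired observations), and standard CCA yields at most min(N, M) canonical pairs, which bounds a meaningful ndim. The helper below, check_cca_args, is hypothetical and not part of Rdimtools; it is a minimal sketch of those checks, and whether do.cca itself enforces the ndim bound is not stated on this page.

```r
# Hypothetical helper (not part of Rdimtools): checks the shape constraints
# implied by the argument descriptions of do.cca.
check_cca_args <- function(data1, data2, ndim = 2) {
  data1 <- as.matrix(data1)
  data2 <- as.matrix(data2)
  # Rows are paired observations, so both matrices need the same n.
  if (nrow(data1) != nrow(data2)) {
    stop("data1 and data2 must have the same number of rows")
  }
  # Standard CCA yields at most min(N, M) canonical pairs.
  if (ndim < 1 || ndim != round(ndim) || ndim > min(ncol(data1), ncol(data2))) {
    stop("ndim must be an integer between 1 and min(ncol(data1), ncol(data2))")
  }
  invisible(TRUE)
}
```

For example, check_cca_args(mat1, mat2, ndim = 2) returns TRUE invisibly for a 100 x 12 and a 100 x 6 matrix, and stops with an error if ndim were set to 7.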

Value

a named list containing

Y1: an \((n\times ndim)\) matrix of projected observations from data1.
Y2: an \((n\times ndim)\) matrix of projected observations from data2.
projection1: a \((N\times ndim)\) matrix whose columns are loadings for data1.
projection2: a \((M\times ndim)\) matrix whose columns are loadings for data2.
trfinfo1: a list containing information for out-of-sample prediction for data1.
trfinfo2: a list containing information for out-of-sample prediction for data2.
eigvals: a vector of eigenvalues obtained from the iterative decomposition.
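To make the returned components concrete, the base-R sketch below computes quantities analogous to Y1, Y2, projection1, projection2, and eigvals. It is a minimal sketch under stated assumptions, not the Rdimtools implementation: the function name cca_sketch is hypothetical; it whitens each block with a Cholesky factor and solves one symmetric eigenproblem instead of the recursive Gram-Schmidt scheme described above; and it assumes n exceeds both N and M so the covariance matrices are invertible. Its eigvals are squared canonical correlations, whereas do.cca's eigvals may be scaled or defined differently.

```r
# Minimal CCA sketch in base R (hypothetical helper, not Rdimtools code).
# Assumes n > N and n > M so both covariance matrices are invertible.
cca_sketch <- function(data1, data2, ndim = 2) {
  X <- scale(as.matrix(data1), center = TRUE, scale = FALSE)
  Y <- scale(as.matrix(data2), center = TRUE, scale = FALSE)
  stopifnot(nrow(X) == nrow(Y), ndim >= 1, ndim <= min(ncol(X), ncol(Y)))
  n <- nrow(X)

  # Sample covariance blocks
  Sxx <- crossprod(X) / (n - 1)
  Syy <- crossprod(Y) / (n - 1)
  Sxy <- crossprod(X, Y) / (n - 1)

  # Whiten each block with its upper Cholesky factor (S = t(R) %*% R)
  Rx <- chol(Sxx)
  Ry <- chol(Syy)
  K <- backsolve(Rx, Sxy, transpose = TRUE) %*% solve(Ry)  # Rx^{-T} Sxy Ry^{-1}

  # Symmetric N x N eigenproblem; eigenvalues are squared canonical correlations
  e <- eigen(K %*% t(K), symmetric = TRUE)
  U <- e$vectors[, seq_len(ndim), drop = FALSE]
  rho <- sqrt(pmax(e$values[seq_len(ndim)], 0))

  # Map back to loadings in the original variable spaces
  A <- backsolve(Rx, U)                               # N x ndim
  B <- backsolve(Ry, sweep(t(K) %*% U, 2, rho, "/"))  # M x ndim

  list(Y1 = X %*% A, Y2 = Y %*% B,
       projection1 = A, projection2 = B,
       eigvals = rho^2)
}
```

With out <- cca_sketch(mat1, mat2, ndim = 2), cor(out$Y1[, 1], out$Y2[, 1]) should equal sqrt(out$eigvals[1]) up to rounding, and the columns of out$Y1 are mutually uncorrelated. Whitening first keeps the eigenproblem symmetric, which avoids spurious complex eigenvalues that a non-symmetric product of inverse covariances can produce numerically.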

References

Hotelling H (1936). "Relations Between Two Sets of Variates." Biometrika, 28(3/4), 321-377.

Examples


library(Rdimtools)
set.seed(1)  # fixed seed for reproducibility

## generate 2 normal data matrices
mat1 = matrix(rnorm(100*12), nrow=100) + 10  # 12-dim normal, mean shifted by +10
mat2 = matrix(rnorm(100*6),  nrow=100) - 10  # 6-dim normal, mean shifted by -10

## project each data matrix onto a 2-dimensional space
output = do.cca(mat1, mat2, ndim=2)

## visualize the two projections side by side
opar = par(mfrow=c(1,2))
plot(output$Y1[,1], output$Y1[,2], main="proj(mat1)")
plot(output$Y2[,1], output$Y2[,2], main="proj(mat2)")
par(opar)  # restore the previous plotting layout
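Because CCA pairs columns by correlation, one way to sanity-check a run is to compare against base R's stats::cancor, which computes canonical correlations directly. The sketch below regenerates matrices shaped like those in the example (the fixed seed is an added assumption for reproducibility) and does not call do.cca.

```r
# Base-R cross-check of canonical correlations; does not call do.cca.
set.seed(1)  # fixed seed, added for reproducibility
mat1 <- matrix(rnorm(100 * 12), nrow = 100) + 10  # 12-dim normal
mat2 <- matrix(rnorm(100 * 6), nrow = 100) - 10   # 6-dim normal

# cancor() centers both blocks and returns the canonical correlations in
# decreasing order; here 6 of them, since both matrices have full column rank
cc <- stats::cancor(mat1, mat2)
print(round(cc$cor, 3))
```

To compare, run do.cca(mat1, mat2, ndim = 2) on these same matrices, assign the result to out, and inspect diag(cor(out$Y1, out$Y2)). Whether those values coincide with cc$cor[1:2] depends on details of do.cca (such as centering or regularization) that this page does not specify.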