cca: Canonical Correlation Analysis

Description

Compute pairwise CCA-based similarities between multiple representations, summarized by either Yanai’s GCD measure (Ramsay et al., 1984) or Pillai’s trace statistic (Raghu et al., 2017).
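
As a rough illustration of what the two summaries measure for a single pair of matrices, the sketch below uses base R's stats::cancor(). It assumes one common definition of each statistic (Pillai's trace as the sum of squared canonical correlations, Yanai's GCD as that sum divided by the square root of the product of the column counts) and assumes both matrices have full column rank. The helper name cca_pair_summary is hypothetical, and the package's internal computation may differ.

```r
# Sketch only: `cca_pair_summary` is a hypothetical helper, not a package function.
cca_pair_summary <- function(X, Y, summary_type = c("yanai", "pillai")) {
  summary_type <- match.arg(summary_type)
  stopifnot(is.matrix(X), is.matrix(Y), nrow(X) == nrow(Y))

  # stats::cancor() centers the columns by default and returns the
  # canonical correlations in the `cor` element.
  rho <- stats::cancor(X, Y)$cor
  ss  <- sum(rho^2)

  if (summary_type == "pillai") {
    ss                               # assumed definition: sum of squared correlations
  } else {
    ss / sqrt(ncol(X) * ncol(Y))     # assumed definition: normalized by column counts
  }
}

set.seed(1)
A <- matrix(rnorm(50 * 3), nrow = 50)
B <- matrix(rnorm(50 * 4), nrow = 50)

# Identical inputs give canonical correlations of 1, so the normalized
# summary equals 1 up to rounding.
cca_pair_summary(A, A, "yanai")
cca_pair_summary(A, B, "pillai")
```

Under these assumed definitions the Yanai-type value lies in [0, 1], while the Pillai-type value can be as large as the smaller of the two column counts.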

Usage

cca(mats, summary_type = NULL)

Value

An \(M \times M\) symmetric matrix of CCA summary similarities.
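
The shape of the return value can be pictured with a small sketch that fills an M x M matrix from any symmetric pairwise similarity. Both pairwise_similarity and the example similarity function are hypothetical stand-ins written for illustration; they are not the package's implementation.

```r
# Sketch only: fill a symmetric M x M matrix from a pairwise similarity function.
pairwise_similarity <- function(mats, sim_fun) {
  M <- length(mats)
  S <- matrix(0, nrow = M, ncol = M)
  for (i in seq_len(M)) {
    for (j in i:M) {
      # compute each unordered pair once and mirror it across the diagonal
      S[i, j] <- S[j, i] <- sim_fun(mats[[i]], mats[[j]])
    }
  }
  S
}

# Example similarity (an assumption): sum of squared canonical correlations.
sum_sq_cancor <- function(X, Y) sum(stats::cancor(X, Y)$cor^2)

set.seed(2)
toy_mats <- list(
  matrix(rnorm(30 * 2), nrow = 30),
  matrix(rnorm(30 * 3), nrow = 30),
  matrix(rnorm(30 * 4), nrow = 30)
)
S <- pairwise_similarity(toy_mats, sum_sq_cancor)
dim(S)          # one row and one column per input matrix
isSymmetric(S)
```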

Arguments

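mats: A list of length \(M\) containing data matrices of size \((n_\mathrm{samples},\, p_k)\). All matrices must have the same number of rows, with row \(i\) of every matrix corresponding to the same sample; the number of columns \(p_k\) may differ between matrices.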
summary_type: Character scalar indicating the CCA summary statistic. One of "yanai" or "pillai". Defaults to "yanai" if NULL.
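
The argument requirements above can be checked before calling the function. The following is a minimal sketch of such checks; check_mats and resolve_summary_type are hypothetical helpers, and the package may validate its inputs differently (for example, match.arg() accepts partial matches, which the package might not).

```r
# Sketch only: hypothetical pre-flight checks for the two arguments.
check_mats <- function(mats) {
  stopifnot(is.list(mats), length(mats) >= 2L)
  stopifnot(all(vapply(mats, is.matrix, logical(1))))
  n_rows <- vapply(mats, nrow, integer(1))
  if (length(unique(n_rows)) != 1L) {
    stop("all matrices in 'mats' must have the same number of rows")
  }
  invisible(TRUE)
}

resolve_summary_type <- function(summary_type = NULL) {
  # NULL falls back to the documented default
  if (is.null(summary_type)) {
    return("yanai")
  }
  match.arg(summary_type, c("yanai", "pillai"))
}

resolve_summary_type(NULL)
resolve_summary_type("pillai")
check_mats(list(matrix(0, 5, 2), matrix(0, 5, 3)))
```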

References

Golub, G. H. and Zha, H. (1995). The canonical correlations of matrix pairs and their numerical computation.

Examples


# \donttest{
# --------------------------------------------------
# Use "iris" and "USArrests" datasets
#   1. standardize columns to reduce the effect of differing scales
#   2. add white noise to create multiple representations
#   3. generate 10 perturbations of each dataset
# --------------------------------------------------
# prepare the prototypes; iris is subsampled to 50 rows so that
# X and Y have the same number of rows (USArrests has 50)
set.seed(1)
X = as.matrix(scale(as.matrix(iris[sample(1:150, 50, replace=FALSE),1:4])))
Y = as.matrix(scale(as.matrix(USArrests)))
n = nrow(X)
p_X = ncol(X)
p_Y = ncol(Y)

# generate 10 of each by perturbation
mats = vector("list", length=20L)
for (i in 1:10){
  mats[[i]] = X + matrix(rnorm(n*p_X, sd=1), nrow=n)
}
for (j in 11:20){
  mats[[j]] = Y + matrix(rnorm(n*p_Y, sd=1), nrow=n)
}

# compute two similarities
cca_gcd = cca(mats, summary_type="yanai")
cca_trace = cca(mats, summary_type="pillai")

# visualize
opar <- par(no.readonly=TRUE)
labs <- paste0("rep ",1:20)
par(pty="s", mfrow=c(1,2))

image(cca_gcd[,20:1], axes=FALSE, main="CCA:Yanai's GCD")
axis(1, seq(0, 1, length.out=20), labels = labs, las = 2)
axis(2, at = seq(0, 1, length.out=20), labels = labs[20:1], las = 2)

image(cca_trace[,20:1], axes=FALSE, main="CCA:Pillai's Trace")
axis(1, seq(0, 1, length.out=20), labels = labs, las = 2)
axis(2, at = seq(0, 1, length.out=20), labels = labs[20:1], las = 2)
par(opar)
# }
