Non-Negative and Sparse CCA
This package implements two algorithms for canonical correlation
analysis (CCA) that are based on iterated regression
steps. By choosing the appropriate regression algorithm for each data
modality, it is possible to enforce sparsity, non-negativity or other kinds
of constraints on the projection vectors. Multiple canonical variables are
computed sequentially using a generalized deflation scheme, where the
additional correlation not explained by previous variables is maximized.
'nscancor' is used to analyze paired data from two domains, and has the same
interface as the 'cancor' function from the 'stats' package (plus some extra
parameters). 'mcancor' is appropriate for analyzing data from three or more
An R package for non-negative and sparse canonical correlation analysis (CCA).
CCA is a method for finding associations between paired data sets. For example, a health study might record the gene expression levels and a number of physiological parameters for a patient cohort. If one conjectures that the cause for the physiological symptoms has a genetic component, one could expect to find a correlation between the expression of certain genes and the strength of certain symptoms. CCA finds a pair of linear projections (called canonical vectors), one for each data modality, such that the projected values (called canonical variables) have maximum correlation. The next pair of canonical variables is found by again maximizing their correlation, under the additional constraint that the they have to be uncorrelated to all previous ones, and so on.
CCA was first introduced by Hotelling in 1936, and has many similarities to principal component analysis (PCA). Where the PCA solution is computed from the eigenvalue decomposition (EVD) of the covariance matrix of a single data set, the CCA solution is computed from the EVD of the cross-covariance matrix of the two data sets. This approach is very efficient, but one sometimes encounters the following problems during an analysis. First, if at least one of the data sets contains more features than samples (a common case for gene expression data), there exist an infinite number of trivial projections that achieve perfect correlation. Regularization of the canonical vectors is necessary to again solve a well-posed problem. Second, the projections are typically linear combinations with non-zero weights for all features, which makes an interpretation of the weights difficult. A sparse solution which only includes a small number of important features is often desirable.
This package implements a CCA algorithm called
nscancor which can
enforce appropriate constraints on the canonical vectors to address
both aforementioned problems. Enforcing a bound on the Euclidean norm
(also called the L2 norm) of the projections avoids trivial
correlations. Enforcing a bound on the L1 norm leads to sparse
solutions, where many of the weights are exactly zero. And enforcing
non-negativity of the projection weights is useful for analysing data
where only positive influence of features is deemed appropriate. The
algorithm executes iterated regression steps, and the constraints
enter via the regression functions.
nscancor is therefore modular,
and builds on the great number of regression methods that are
available, e.g. ridge regression or the elastic net. By using two
different regression functions, the proper constraints can be enforced
for each domain.
The package also provides a generalization of constrained CCA for
analyzing more than two data sets. The
mcancor algorithm is
structurally analogous to
nscancor, but it maximizes the sum of all
pairwise correlations of canonical variables. As with
specifying the regression function for each domain makes it possible
to enforce appropriate constraints on each canonical vector.
This blog post explains how to use the package and demonstrates its benefits.
Functions in nscancor
|cardinality||Cardinality of Column Vectors|
|acor||Additional Explained Correlation|
|nscancor||Non-Negative and Sparse CCA|
|mcancor||Non-Negative and Sparse Multi-Domain CCA|
|macor||Multi-Domain Additional Explained Correlation|
Last month downloads
Include our badge in your README