subselect (version 0.1-1)

gcd.coef: Computes Yanai's GCD for the Variable-Subset Selection Problem

Description

Computes Yanai's Generalized Coefficient of Determination for the similarity of the subspaces spanned by a subset of variables and a subset of the full data set's Principal Components.

Usage

gcd.coef(mat, indices, pcindices = seq(1:length(indices)))

Arguments

mat
the full data set's covariance (or correlation) matrix.
indices
a numerical vector giving the indices of the variables in the subset.
pcindices
a numerical vector of indices of Principal Components. By default, the first k PCs are chosen, where k is the cardinality of the subset of variables specified by indices.

Value

  • The value of the GCD coefficient.

Details

Computes Yanai's Generalized Coefficient of Determination for the similarity of the subspaces spanned by a subset of variables (specified by indices) and a subset of the full data set's Principal Components (specified by pcindices). Input data is expected in the form of a (co)variance or correlation matrix. If a non-square matrix is given, it is assumed to be a data matrix, and its (co)variance matrix is used as input. The number of variables (k) and the number of PCs (q) do not have to be the same.

Yanai's GCD is defined as: $$GCD = \frac{\mathrm{tr}(P_v\cdot P_c)}{\sqrt{k\cdot q}}$$ where $P_v$ and $P_c$ are the matrices of orthogonal projections on the subspaces spanned by the k-variable subset and by the q-Principal Component subset, respectively.
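The projection-based definition can be checked directly from a data matrix. The following is a minimal base-R sketch, not code from subselect: the helper names gcd_proj and proj are hypothetical, and the data are standardised on the assumption that the comparison is with a correlation-matrix input such as cor(x).

```r
# Sketch only: gcd_proj is a hypothetical helper, not a subselect function.
gcd_proj <- function(x, indices, pcindices = seq_along(indices)) {
  # Standardise the columns so that the PCs are those of cor(x) (an assumption).
  xc <- scale(x, center = TRUE, scale = TRUE)
  # Orthogonal projector onto the column space of a full-column-rank matrix m.
  proj <- function(m) m %*% solve(crossprod(m), t(m))
  # Scores of the selected Principal Components.
  loadings <- eigen(cor(x), symmetric = TRUE)$vectors[, pcindices, drop = FALSE]
  pcs <- xc %*% loadings
  Pv <- proj(xc[, indices, drop = FALSE])  # projector for the k-variable subset
  Pc <- proj(pcs)                          # projector for the q-PC subset
  sum(diag(Pv %*% Pc)) / sqrt(length(indices) * length(pcindices))
}

x <- datasets::iris3[, , 1]
gcd_proj(x, c(1, 3))
```

Forming the n x n projectors explicitly is only practical for small data sets; it is done here to mirror the formula term by term.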

This definition is equivalent to: $$GCD = \frac{1}{\sqrt{k q}} \sum\limits_{i}(r_m)_i^2$$ where $(r_m)_i$ stands for the multiple correlation between the i-th Principal Component and the k-variable subset, and the sum is carried out over the q PCs (i=1,...,q) selected.
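The multiple-correlation form can be sketched in base R by regressing each selected PC on the variable subset and summing the resulting squared multiple correlations. The helper name gcd_r2 is hypothetical and not part of subselect, and the standardisation again assumes a correlation-matrix setting.

```r
# Sketch only: gcd_r2 is a hypothetical helper, not a subselect function.
gcd_r2 <- function(x, indices, pcindices = seq_along(indices)) {
  xc <- scale(x)                           # standardised data (assumption)
  loadings <- eigen(cor(x), symmetric = TRUE)$vectors[, pcindices, drop = FALSE]
  pcs <- xc %*% loadings                   # scores of the q selected PCs
  xk <- xc[, indices, drop = FALSE]        # the k-variable subset
  # Squared multiple correlation of each PC with the variable subset.
  r2 <- apply(pcs, 2, function(pc) summary(lm(pc ~ xk))$r.squared)
  sum(r2) / sqrt(length(indices) * length(pcindices))
}

x <- datasets::iris3[, , 1]
gcd_r2(x, c(1, 3))
```

Fitting one regression per PC is slow compared with a matrix formula, but it keeps the link to $(r_m)_i^2$ explicit.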

These definitions are also equivalent to the expression used in the code, which only requires the covariance (or correlation) matrix of the data under consideration.
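One covariance-only expression that follows from the projection definition is $$\mathrm{tr}(P_v\cdot P_c) = \sum\limits_{i}\lambda_i\, a_{i,K}^t\, S_K^{-1}\, a_{i,K}$$ where $S_K$ is the submatrix of mat for the k selected variables, $\lambda_i$ is the i-th eigenvalue of mat, $a_{i,K}$ holds the entries of the i-th eigenvector that correspond to the selected variables, and the sum runs over the q selected PCs. The base-R sketch below implements this expression. It is an illustration under that derivation, not necessarily the code that gcd.coef runs, and the name gcd_cov is hypothetical.

```r
# Sketch only: gcd_cov is a hypothetical helper, not a subselect function.
gcd_cov <- function(mat, indices, pcindices = seq_along(indices)) {
  eig <- eigen(mat, symmetric = TRUE)
  lambda <- eig$values[pcindices]                       # eigenvalues of the q PCs
  A <- eig$vectors[indices, pcindices, drop = FALSE]    # k x q block of loadings
  SK_inv <- solve(mat[indices, indices, drop = FALSE])  # inverse of S_K
  # Column i of A * (SK_inv %*% A) sums to a_i' S_K^{-1} a_i.
  quad <- colSums(A * (SK_inv %*% A))
  sum(lambda * quad) / sqrt(length(indices) * length(pcindices))
}

x <- datasets::iris3[, , 1]
gcd_cov(cor(x), c(1, 3))
gcd_cov(cor(x), c(1, 3), pcindices = 1)
```

Only a p x p eigendecomposition and a k x k inverse are needed, so the original observations never enter the computation.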

References

Cadima, J. and Jolliffe, I.T. (2001), "Variable Selection and the Interpretation of Principal Subspaces", Journal of Agricultural, Biological and Environmental Statistics, Vol. 6, 62-79.

Ramsay, J.O., ten Berge, J. and Styan, G.P.H. (1984), "Matrix Correlation", Psychometrika, 49, 403-423.

Examples

data(iris3)
# First species (setosa) slice of the iris3 array
x <- iris3[, , 1]
gcd.coef(cor(x), c(1, 3))
## [1] 0.7666286
gcd.coef(cor(x), c(1, 3), pcindices = c(1, 3))
## [1] 0.584452
gcd.coef(cor(x), c(1, 3), pcindices = 1)
## [1] 0.6035127
