This function conducts supervised sparse and functional principal component analysis by fitting the SupSVD model

X = UV' + E
U = YB + F

where X is an observed primary data matrix (to be decomposed), U is a latent score matrix, V is a loading matrix, E is measurement noise, Y is an observed auxiliary supervision matrix, B is a coefficient matrix, and F is a random effect matrix.
It decomposes the primary data matrix X into low-rank components while accounting for several features: 1) potential supervision from any auxiliary data Y measured on the same samples; 2) potential smoothness of the loading vectors in V (for functional data); 3) sparsity in the supervision coefficients B and the loadings V (for variable selection).
It is a very general dimension reduction method that subsumes PCA, sparse PCA, functional PCA, supervised PCA, etc., as special cases. For details, see the 2016 JCGS paper "Supervised sparse and functional principal component analysis" by Gen Li, Haipeng Shen, and Jianhua Z. Huang.
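To make the model concrete, the following minimal R sketch simulates data from the SupSVD model; all dimensions, noise levels, and variable names here are illustrative choices, not package code.

set.seed(1)
n <- 100; p <- 50; q <- 3; r <- 2
Y <- matrix(rnorm(n * q), n, q)               # observed auxiliary supervision
B <- matrix(rnorm(q * r), q, r)               # supervision coefficients
Fmat <- matrix(rnorm(n * r, sd = 0.5), n, r)  # random effects (F in the model)
U <- Y %*% B + Fmat                           # latent scores: U = YB + F
V <- qr.Q(qr(matrix(rnorm(p * r), p, r)))     # orthonormal loadings
E <- matrix(rnorm(n * p, sd = 0.1), n, p)     # measurement noise
X <- U %*% t(V) + E                           # primary data: X = UV' + E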
SupSFPCA(
Y,
X,
r,
ind_lam = 1,
ind_alp = 1,
ind_gam = 1,
ind_Omg = 1,
Omega = 0,
max_niter = 10^3,
convg_thres = 10^-6,
vmax_niter = 10^2,
vconvg_thres = 10^-4
)
Y: n*q (column centered) auxiliary data matrix; rows are samples and columns are variables.

X: n*p (column centered) primary data matrix, which we want to decompose; rows are samples (matched with Y) and columns are variables.

r: positive scalar, the prespecified rank (r should be smaller than n and p).

ind_lam: 0 or 1 (default = 1, sparse loadings), sparsity index for the loadings.

ind_alp: 0 or 1 (default = 1, smooth loadings), smoothness index for the loadings.

ind_gam: 0 or 1 (default = 1, sparse coefficients), sparsity index for the supervision coefficients. Note: if ind_gam is set to 0, Y must have q < n to avoid overfitting; if ind_gam is set to 1, the function can handle high-dimensional supervision Y.

ind_Omg: ??

Omega: p*p symmetric positive semi-definite matrix for the smoothness penalty (the default corresponds to evenly spaced data). Note: only change this if you have unevenly spaced functional data X; see the sketch after this list.

max_niter: scalar (default = 1E3), maximum number of overall iterations.

convg_thres: positive scalar (default = 1E-6), overall convergence threshold.

vmax_niter: scalar (default = 1E2), maximum number of iterations for estimating each loading vector.

vconvg_thres: positive scalar (default = 1E-4), convergence threshold for the proximal gradient descent algorithm used to estimate each loading vector.
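For the Omega argument, one common construction (a hedged sketch, not the package's internal code) is a squared second-difference roughness penalty; the unevenly spaced version uses second divided differences over the observation grid.

p <- 50
# evenly spaced case (mirrors the stated default behaviour):
D <- diff(diag(p), differences = 2)   # (p-2) x p second-difference operator
Omega_even <- t(D) %*% D              # p x p symmetric positive semi-definite
# unevenly spaced case, with illustrative observation points t_obs:
t_obs <- sort(runif(p))
h <- diff(t_obs)
Du <- matrix(0, p - 2, p)
for (i in 1:(p - 2)) {
  Du[i, i]     <-  2 / (h[i] * (h[i] + h[i + 1]))
  Du[i, i + 1] <- -2 / (h[i] * h[i + 1])
  Du[i, i + 2] <-  2 / (h[i + 1] * (h[i] + h[i + 1]))
}
Omega_uneven <- t(Du) %*% Du          # pass this as Omega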
A list with components:

B: q*r coefficient matrix of Y on the scores of X; may be sparse if ind_gam = 1.

V: p*r loading matrix of X; each column has norm 1, but the columns are not strictly orthogonal because of the sparsity and smoothness penalties. If ind_lam = 1, V is sparse; if ind_alp = 1, each column of V is smooth.

U: n*r score matrix of X, the conditional expectation of the random scores; not strictly orthogonal.

se2: scalar, variance of the measurement error in the primary data X.

Sf: r*r diagonal covariance matrix of the random effects (see paper).
# NOT RUN {
# (load the package that provides SupSFPCA before running this example)
library(spls)
data(yeast)                       # yeast cell cycle data from the spls package
r <- 4                            # prespecified rank
ydata <- as.data.frame(yeast[1])  # auxiliary supervision data Y
xdata <- as.data.frame(yeast[2])  # primary data X to decompose
yc <- scale(ydata, center = TRUE, scale = FALSE)  # column-center Y
xc <- scale(xdata, center = TRUE, scale = FALSE)  # column-center X
fit <- SupSFPCA(yc, xc, r)
# }
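Assuming the component names listed above under the return value, a brief sketch of post-processing for the fit from the example (Xhat and Usup are illustrative names, not package output):

Xhat <- fit$U %*% t(fit$V)   # rank-r reconstruction of the primary data
Usup <- yc %*% fit$B         # part of the scores explained by Y (U = YB + F)
colSums(fit$V != 0)          # nonzero loadings per component (sparsity check)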