This function conducts supervised sparse and functional principal component analysis by fitting the SupSVD model

X = UV' + E
U = YB + F

where X is an observed primary data matrix (to be decomposed), U is a latent score matrix, V is a loading matrix, E is measurement noise, Y is an observed auxiliary supervision matrix, B is a coefficient matrix, and F is a random effect matrix.
It decomposes the primary data matrix X into low-rank components while accounting for several features: 1) potential supervision from any auxiliary data Y measured on the same samples; 2) potential smoothness of the loading vectors in V (for functional data); 3) sparsity in the supervision coefficients B and the loadings V (for variable selection).
It is a very general dimension reduction method that subsumes PCA, sparse PCA, functional PCA, supervised PCA, etc., as special cases. For details, see the 2016 JCGS paper "Supervised sparse and functional principal component analysis" by Gen Li, Haipeng Shen, and Jianhua Z. Huang.
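To make the model concrete, the following minimal R sketch simulates data from the SupSVD model; all dimensions, noise levels, and variable names here are illustrative choices, not package code.

set.seed(1)
n <- 100; p <- 50; q <- 3; r <- 2
Y <- matrix(rnorm(n * q), n, q)               # observed auxiliary supervision
B <- matrix(rnorm(q * r), q, r)               # supervision coefficients
Fmat <- matrix(rnorm(n * r, sd = 0.5), n, r)  # random effects (F in the model)
U <- Y %*% B + Fmat                           # latent scores: U = YB + F
V <- qr.Q(qr(matrix(rnorm(p * r), p, r)))     # orthonormal loadings
E <- matrix(rnorm(n * p, sd = 0.1), n, p)     # measurement noise
X <- U %*% t(V) + E                           # primary data: X = UV' + E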
SupSFPCA(
Y,
X,
r,
ind_lam = 1,
ind_alp = 1,
ind_gam = 1,
ind_Omg = 1,
Omega = 0,
max_niter = 10^3,
convg_thres = 10^-6,
vmax_niter = 10^2,
vconvg_thres = 10^-4
)
Y: n*q (column centered) auxiliary data matrix; rows are samples and columns are variables.

X: n*p (column centered) primary data matrix, which we want to decompose; rows are samples (matched with Y) and columns are variables.

r: positive scalar, the prespecified rank (r should be smaller than n and p).

ind_lam: 0 or 1 (default = 1, sparse loadings), sparsity index for the loadings.

ind_alp: 0 or 1 (default = 1, smooth loadings), smoothness index for the loadings.

ind_gam: 0 or 1 (default = 1, sparse coefficients), sparsity index for the supervision coefficients. Note: if ind_gam is set to 0, Y must have q < n to avoid overfitting; if ind_gam is set to 1, the function can handle high-dimensional supervision Y.

ind_Omg: ??

Omega: p*p symmetric positive semi-definite matrix for the smoothness penalty (the default corresponds to evenly spaced data). Note: only change this if you have unevenly spaced functional data X; see the sketch after this list.

max_niter: scalar (default = 1E3), maximum number of overall iterations.

convg_thres: positive scalar (default = 1E-6), overall convergence threshold.

vmax_niter: scalar (default = 1E2), maximum number of iterations for estimating each loading vector.

vconvg_thres: positive scalar (default = 1E-4), convergence threshold for the proximal gradient descent algorithm used to estimate each loading vector.
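For the Omega argument, one common construction (a hedged sketch, not the package's internal code) is a squared second-difference roughness penalty; the unevenly spaced version uses second divided differences over the observation grid.

p <- 50
# evenly spaced case (mirrors the stated default behaviour):
D <- diff(diag(p), differences = 2)   # (p-2) x p second-difference operator
Omega_even <- t(D) %*% D              # p x p symmetric positive semi-definite
# unevenly spaced case, with illustrative observation points t_obs:
t_obs <- sort(runif(p))
h <- diff(t_obs)
Du <- matrix(0, p - 2, p)
for (i in 1:(p - 2)) {
  Du[i, i]     <-  2 / (h[i] * (h[i] + h[i + 1]))
  Du[i, i + 1] <- -2 / (h[i] * h[i + 1])
  Du[i, i + 2] <-  2 / (h[i + 1] * (h[i] + h[i + 1]))
}
Omega_uneven <- t(Du) %*% Du          # pass this as Omega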
A list with components:

B: q*r coefficient matrix of Y on the scores of X; may be sparse if ind_gam = 1.

V: p*r loading matrix of X; each column has norm 1, but the columns are not strictly orthogonal because of the sparsity and smoothness penalties. If ind_lam = 1, V is sparse; if ind_alp = 1, each column of V is smooth.

U: n*r score matrix of X, the conditional expectation of the random scores; not strictly orthogonal.

se2: scalar, variance of the measurement error in the primary data X.

Sf: r*r diagonal covariance matrix of the random effects (see paper).
# NOT RUN {
# (load the package that provides SupSFPCA before running this example)
library(spls)
data(yeast)                       # yeast cell cycle data from the spls package
r <- 4                            # prespecified rank
ydata <- as.data.frame(yeast[1])  # auxiliary supervision data Y
xdata <- as.data.frame(yeast[2])  # primary data X to decompose
yc <- scale(ydata, center = TRUE, scale = FALSE)  # column-center Y
xc <- scale(xdata, center = TRUE, scale = FALSE)  # column-center X
fit <- SupSFPCA(yc, xc, r)
# }
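Assuming the component names listed above under the return value, a brief sketch of post-processing for the fit from the example (Xhat and Usup are illustrative names, not package output):

Xhat <- fit$U %*% t(fit$V)   # rank-r reconstruction of the primary data
Usup <- yc %*% fit$B         # part of the scores explained by Y (U = YB + F)
colSums(fit$V != 0)          # nonzero loadings per component (sparsity check)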