
FRegSigCom (version 0.3.0)

cv.msof.hd: Cross-validation for high-dimensional linear multivariate scalar-on-function regression

Description

This function is used to perform cross-validation and to build the final model, using the signal compression approach, for the following linear multivariate scalar-on-function regression model with a large number of functional predictors: $$\bold{Y}= \bold{\mu}+\sum_{i=1}^p\int_{a_i}^{b_i}X_i(s)\bold{\beta}_i(s)ds+\bold{\epsilon}$$ where \(\bold{Y}\) is an \(m\)-dimensional multivariate response variable and \(\bold{\mu}\) is the \(m\)-dimensional intercept vector. The \(\{X_i(s),1\le i\le p\}\) are \(p\) functional predictors and \(\{\bold{\beta}_i(s),1\le i\le p\}\) are the corresponding \(m\)-dimensional vectors of coefficient functions, where \(p\) is a positive integer. \(\bold{\epsilon}\) is the random noise vector.

We require that all the sample curves of each functional predictor be observed on a common dense grid of time points, although the grid may differ across predictors.
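As a minimal sketch of the expected data layout, the following simulated objects are purely illustrative (the names n, p, m.obs, X, t.x.list and Y are chosen here for the example; only the shapes matter):

library(FRegSigCom)
set.seed(1)
n <- 50                # sample size
p <- 3                 # number of functional predictors
m.obs <- c(30, 40, 50) # grid sizes; the grid may differ across predictors
X <- list(); t.x.list <- list()
for (i in 1:p) {
  t.x.list[[i]] <- seq(0, 1, length.out = m.obs[i])  # common grid for predictor i
  X[[i]] <- matrix(rnorm(n * m.obs[i]), n, m.obs[i]) # n sample curves on that grid
}
Y <- matrix(rnorm(n * 2), n, 2) # two scalar responses
# fit <- cv.msof.hd(X, Y, t.x.list)  # fit once the real data are in this layout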

Usage

cv.msof.hd(X, Y, t.x.list, n.basis = 25, K.cv = 5, upper.comp = 10,
           thresh = 0.02)

Arguments

X

a list of length \(p\), the number of functional predictors. Its \(i\)-th element is the \(n\times m_i\) data matrix for the \(i\)-th functional predictor \(X_i(s)\), where \(n\) is the sample size and \(m_i\) is the number of observation time points for \(X_i(s)\).

Y

an \(n\times m\) data matrix for the response \(\bold{Y}\), or an \(n\)-dimensional vector if there is only one scalar response, where \(n\) is the sample size and \(m\) is the number of scalar response variables.

t.x.list

a list of length \(p\). Its \(i\)-th element is the vector of observation time points of the \(i\)-th functional predictor \(X_i(s)\), \(1\le i\le p\).

n.basis

the number of basis functions used for estimating the vector of functions \(\psi_{ik}(s)\) (see the reference for details). Default is 25.

K.cv

the number of CV folds. Default is 5.

upper.comp

the upper bound for the maximum number of components to be calculated. Default is 10.

thresh

a number between 0 and 1 used to determine the maximum number of components to calculate. This maximum lies between 1 and "upper.comp" above. The optimal number of components is then chosen between 1 and this maximum, together with the other tuning parameters, by cross-validation. A smaller thresh value leads to a larger maximum number of components and a longer running time; a larger thresh value requires less running time but may miss some important components and lead to a larger prediction error (see the illustrative calls below). Default is 0.02.
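For example, assuming data objects X, Y and t.x.list as described above, the trade-off can be steered as in this sketch (the values shown are illustrative, not recommendations):

# a finer search: more candidate components, smaller threshold, longer run time
fit1 <- cv.msof.hd(X, Y, t.x.list, upper.comp = 20, thresh = 0.005)
# a coarser, faster search that may miss some components
fit2 <- cv.msof.hd(X, Y, t.x.list, thresh = 0.05)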

Value

An object of the "cv.msof.hd" class, which is used by the function pred.msof.hd for prediction. It includes the following components:

opt.K

the optimal number of components selected by cross-validation.

opt.tau

the optimal value of the tuning parameter \(\tau\).

opt.lambda

the optimal value of the tuning parameter \(\lambda\).

opt.eta

the optimal value of the tuning parameter \(\eta\).

min.error

minimum CV error.

errors

a list of CV errors.

...

for internal use
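As a quick illustration, the selected values can be read off the returned object; the component names follow the list above (fit.cv here is the fitted object from the Examples below):

fit.cv$opt.K      # optimal number of components
fit.cv$opt.tau    # optimal value of tau
fit.cv$opt.lambda # optimal value of lambda
fit.cv$opt.eta    # optimal value of eta
fit.cv$min.error  # minimum CV error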

Details

We use the decomposition \(\bold{\beta}_i(s)=\sum_{k=1}^K \alpha_{ki}(s) \bold{w}_k\), \(1\le i \le p\), based on the KL expansion of \(\sum_{i=1}^p \int X_i(s)\bold{\beta}_i(s)ds\). Let \(\bold{Y}_{\ell}=(Y_{\ell,1},...,Y_{\ell,m})^T\) and \(\bold{X}_{\ell}(s)=(X_{\ell,1}(s), ...,X_{\ell,p}(s))^T\), \(1 \le \ell \le n\), denote \(n\) independent samples. We estimate \(\bold{\alpha}_k(s)=(\alpha_{k1}(s),...,\alpha_{kp}(s))^T\) for each \(k\) by solving the penalized generalized functional eigenvalue problem $$\max_{\bold{\alpha}} \frac{\int\int \bold{\alpha}(s)^T\hat{\bold{B}}(s,s')\bold{\alpha}(s')dsds'}{\int\int \bold{\alpha}(s)^T\hat{\bold{\Sigma}}(s,s')\bold{\alpha}(s')dsds'+P(\bold{\alpha})}$$ $$ {\rm{s.t.}} \quad \int\int \bold{\alpha}(s)^T \hat{\bold{\Sigma}}(s,s')\bold{\alpha}(s')dsds'=1$$ $$ {\rm{and}} \quad \int\int \bold{\alpha}(s)^T \hat{\bold{\Sigma}}(s,s')\bold{\alpha}_{k'}(s')dsds'=0 \quad {\rm{for}} \quad k'<k,$$ where \(\hat{\bold{B}}(s,s')=\sum_{\ell=1}^n\sum_{\ell'=1}^n \{\bold{X}_{\ell}(s)-\bar{\bold{X}}(s)\}\{\bold{Y}_{\ell}-\bar{\bold{Y}}\}^T\{\bold{Y}_{\ell'}-\bar{\bold{Y}}\} \{\bold{X}_{\ell'}(s')-\bar{\bold{X}}(s')\}^T/n^2\), \(\hat{\bold{\Sigma}}(s,s')=\sum_{\ell=1}^n \{\bold{X}_{\ell}(s)-\bar{\bold{X}}(s)\}\{\bold{X}_{\ell}(s')-\bar{\bold{X}}(s')\}^T/n\), and the penalty is $$P(\bold{\alpha})=\tau \{(1-\lambda)\sum_{i=1}^p\|\alpha_i\|_{\eta}^2 + \lambda (\sum_{i=1}^p \|\alpha_i\|_{\eta})^2\},$$ where \(\|\alpha_i\|_{\eta}^2=\|\alpha_i\|^2+\eta\|\alpha_i''\|^2\). Then we estimate the vectors \(\{\bold{w}_{k}, 1\le k\le K\}\) by regressing \(\{\bold{Y}_{\ell}\}\) on the scores \(\{\hat{z}_{\ell,1},..., \hat{z}_{\ell,K}\}\) using the least squares method, where \(\hat{z}_{\ell,k}= \int \{\bold{X}_{\ell}(s)-\bar{\bold{X}}(s)\}^T\hat{\bold{\alpha}}_{k}(s)ds\).
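To make the final regression step concrete, the sketch below approximates the scores \(\hat{z}_{\ell,k}\) by a Riemann sum on a uniform grid and regresses the responses on them. All object names are hypothetical (X1 is an \(n\times m_1\) data matrix for a single predictor observed on grid t.x, and alpha.hat is an \(m_1\times K\) matrix holding \(\hat{\alpha}_{k1}(s)\) evaluated on that grid); the package carries out these steps internally.

Xc <- sweep(X1, 2, colMeans(X1))            # center the curves: X_l(s) - Xbar(s)
h  <- diff(range(t.x)) / (length(t.x) - 1)  # spacing of the uniform grid
Z  <- (Xc %*% alpha.hat) * h                # n x K score matrix; Z[l,k] ~ z_{l,k}
fit <- lm(Y ~ Z)                            # multivariate least squares of Y on scores
W.hat  <- coef(fit)[-1, , drop = FALSE]     # row k estimates w_k^T
mu.hat <- coef(fit)[1, ]                    # estimated intercept vector mu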

References

Ruiyan Luo and Xin Qi (Submitted)

Examples

# NOT RUN {

#########################################################################
# Example:  
#########################################################################

#toy example

ptm <- proc.time()
library(FRegSigCom)
data(air)
# build a 3-dimensional scalar response by averaging the first element of
# "air" over three consecutive blocks of 8 observation time points
Y <- matrix(0, nrow(air[[1]]), 3)
Y[,1] <- apply(air[[1]][, 1:8], 1, mean)
Y[,2] <- apply(air[[1]][, (1:8)+8], 1, mean)
Y[,3] <- apply(air[[1]][, (1:8)+16], 1, mean)

# the remaining elements of "air" are the functional predictors
X <- list()
for (i in 1:(length(air)-1)) {
  X[[i]] <- air[[i+1]]
}
 
set.seed(1)   # for a reproducible training/test split
ntrain <- 100 # in the paper, 80 observations are used as training data
t.x <- seq(0, 1, length.out = ncol(X[[1]])) # common observation grid
train.index <- sample(1:nrow(X[[1]]), ntrain)
X.train <- X.test <- list()
t.x.list <- list()
for (i in 1:length(X)) {
  t.x.list[[i]] <- t.x
  X.train[[i]] <- X[[i]][train.index, ]
  X.test[[i]]  <- X[[i]][-train.index, ]
}
Y.train <- Y[train.index, ]
Y.test  <- Y[-train.index, ]

fit.cv <- cv.msof.hd(X.train, Y.train, t.x.list, upper.comp = 5)
# in practice, use the default values (or larger) for
# "upper.comp" and "n.basis".

Y.pred <- pred.msof.hd(fit.cv, X.test)
pred.error <- mean((Y.pred - Y.test)^2)  # mean squared prediction error
cat("pred.error =", pred.error, "\n")

print(proc.time() - ptm)


# }
