FRegSigCom (version 0.3.0)

cv.hd: Cross-validation for sparse linear function-on-function regression with a large number of functional predictors

Description

Conduct cross-validation and build the final model for the following function-on-function regression model: $$Y(t)= \mu(t)+\sum_{i=1}^p\int_{a_i}^{b_i}X_i(s)\beta_i(s,t)ds+\varepsilon(t),$$ where \(\mu(t)\) is the intercept function, \(\{X_i(s),1\le i\le p\}\) are the \(p\) functional predictors, \(\{\beta_i(s,t),1\le i\le p\}\) are the corresponding coefficient functions, and \(\varepsilon(t)\) is the noise function. The number of predictors \(p\) can be much larger than the sample size.

We require that all sample curves of each functional predictor are observed at a common dense set of time points, although this set may differ across functional predictors. All sample curves of the response must be observed at a common dense set of time points.
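
For example, with \(n=20\) samples, \(p=2\) functional predictors observed at 30 and 40 time points, and a response observed at 50 time points, the inputs could be laid out as below (a minimal sketch with made-up data; the object names and dimensions are illustrative only and not part of the package):

n <- 20
t.x.list <- list(seq(0, 1, length.out = 30),  # observation points of X_1(s)
                 seq(0, 1, length.out = 40))  # observation points of X_2(s)
t.y <- seq(0, 1, length.out = 50)             # observation points of Y(t)
X <- list(matrix(rnorm(n * 30), n, 30),       # n x m_1 data matrix for X_1(s)
          matrix(rnorm(n * 40), n, 40))       # n x m_2 data matrix for X_2(s)
Y <- matrix(rnorm(n * 50), n, 50)             # n x m data matrix for Y(t)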

Usage

cv.hd(X, Y, t.x.list, t.y, K.cv = 5, s.n.basis = 25, t.n.basis = 50,
       thresh = 0.01)

Arguments

X

a list of length \(p\). Its \(i\)-th element is the \(n\times m_i\) data matrix for the \(i\)-th functional predictor \(X_i(s)\), where \(n\) is the sample size and \(m_i\) is the number of observation points for \(X_i(s)\).

Y

the \(n\times m\) data matrix for the functional response \(Y(t)\), where \(n\) is the sample size and \(m\) is the number of observation points for \(Y(t)\).

t.x.list

a list of length \(p\). Its \(i\)-th element is the vector of observation points of the \(i\)-th functional predictor \(X_i(s)\), \(1\le i\le p\).

t.y

the vector of observation points of the functional response \(Y(t)\).

K.cv

the number of CV folds. Default is 5.

s.n.basis

the number of B-spline basis functions used for estimating the functions \(\psi_{ik}(s)\). Default is 25.

t.n.basis

the number of B-spline basis functions used for estimating the functions \(w_{k}(t)\). Default is 50.

thresh

a number between 0 and 1 specifying the minimum proportion of variation that each selected component must explain relative to all selected components. This determines the upper bound on the number of components considered in CV; the optimal number of components is then chosen by CV from 1, 2, ..., up to this upper bound. A smaller thresh value leads to more candidate components and a longer running time; a larger thresh value requires less running time but may miss some important components and lead to a larger prediction error. Default is 0.01.

Value

An object of the ``cv.hd'' class, which is used by the function pred.hd for prediction and by getcoef.hd for extracting the estimated coefficient functions.

errors

a list of CV errors.

min.error

minimum CV error.

opt.index

index of the optimal tuning parameters.

params.set

the set of candidate values of the tuning parameters used in CV: a matrix with 4 columns corresponding to \(\tau\), \(\lambda\), \(\eta\), and \(\kappa\), respectively.

opt.K

optimal number of components to be selected.

opt.tau

optimal value of \(\tau\), the tuning parameter of the simultaneous sparse-smooth penalty on \(\{\psi_{ik}(s), 1\le i \le p \}\).

opt.lambda

optimal value for \(\lambda\), the sparsity parameter for \(\{\psi_{ik}(s), 1\le i \le p \}\).

opt.eta

optimal value for \(\eta\), the smoothness parameter for \(\psi_{ik}(s)\).

opt.kappa

optimal value for \(\kappa\), the smoothness parameter for \(w_k(t)\).

maxK.ret

a list for internal use.

t.x.list

the input t.x.list.

t.y

the input t.y.

y.params

a list for internal use.
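
For instance, after a call such as fit.cv <- cv.hd(X, Y, t.x.list, t.y), the selected tuning values listed above can be inspected directly from the returned object (a minimal sketch; it assumes the ``cv.hd'' object behaves like a named list, and the row-indexing of params.set by opt.index is an assumption):

fit.cv$opt.K       # optimal number of components K
fit.cv$opt.tau     # optimal tau for the sparse-smooth penalty
fit.cv$opt.lambda  # optimal sparsity parameter lambda
fit.cv$opt.eta     # optimal smoothness parameter eta for psi_ik(s)
fit.cv$opt.kappa   # optimal smoothness parameter kappa for w_k(t)
fit.cv$min.error   # minimum CV error
fit.cv$params.set[fit.cv$opt.index, ]  # candidate (tau, lambda, eta, kappa) at the optimum (assumed indexing)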

Details

This method estimates a special decomposition of the coefficient functions induced by the KL expansion of the signal function: $$\beta_i(s,t)=\sum_{k=1}^\infty\psi_{ik}(s)w_k(t), \quad 1\le i\le p.$$ We first estimate \(\bold{\psi}_k=(\psi_{1k}(s),...,\psi_{pk}(s))\) for each \(0<k<K\), by solving the generalized functional eigenvalue problem $$\max_{\bold{\psi}} \frac{\int\int \bold{\psi}(s)^T\hat{B}(s,s')\bold{\psi}(s')dsds'}{\int\int \bold{\psi}(s)^T\hat{\Sigma}(s,s')\bold{\psi}(s')dsds'+P(\bold{\psi})}$$ $$ {\rm{s.t.}} \quad \int\int \bold{\psi}(s)^T \hat{\Sigma}(s,s')\bold{\psi}(s')dsds'=1$$ $$ {\rm{and}} \quad \int\int \bold{\psi}(s)^T \hat{\Sigma}(s,s')\bold{\psi}_{k'}(s')dsds'=0 \quad {\rm{for}} \quad k'<k$$ with the simultaneous sparse and smooth penalty $$P(\bold{\psi}(s))=\tau \{(1-\lambda) \sum_{i=1}^p||\psi_{i}||_\eta^2 + \lambda (\sum_{i=1}^p ||\psi_{i}||_\eta)^2\},$$ where \(||\psi_{i}||_\eta^2=||\psi_i||^2+\eta||\psi_i''||^2\). Here \(\hat{B}\) and \(\hat{\Sigma}\) are \(p\times p\) matrices of functions whose \((i,j)\) elements are, respectively, \(\hat{B}_{ij}(s,s')=\sum_{\ell=1}^n\sum_{\ell'=1}^n \{x_{\ell,i}(s)-\bar{x}_i(s)\}\int\{y_{\ell}(t)-\bar{y}(t)\}\{y_{\ell'}(t)-\bar{y}(t)\}dt \{x_{\ell',j}(s')-\bar{x}_j(s')\}/n^2\) and \(\hat{\Sigma}_{ij}(s,s')=\sum_{\ell=1}^n \{x_{\ell,i}(s)-\bar{x}_i(s)\}\{x_{\ell,j}(s')-\bar{x}_j(s')\}/n\), where \((x_{\ell,1},...,x_{\ell,p},y_{\ell})\) denotes the \(\ell\)-th sample.

Then we estimate \(\{w_{k}(t), 0<k<K\}\) by regressing \(\{y_{\ell}(t)\}\) on \(\{\hat{z}_{\ell,1},\ldots, \hat{z}_{\ell,K}\}\) with penalty \(\kappa \sum_{k=0}^K \|w''_k\|^2\) tuned by the smoothness parameter \(\kappa\). Here \(\hat{z}_{\ell,k}= \sum_{i=1}^p \int (x_{\ell,i}(s)-\bar{x}_i(s))\hat{\psi}_{ik}(s)ds\). The number of selected components \(K\) is chosen by cross-validation.
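
As an illustration of the component scores, \(\hat{z}_{\ell,k}\) can be approximated on the observation grid by a Riemann sum (a sketch only, not the package's internal code; psi.hat, K, and an equally spaced grid are assumptions made for the illustration):

# z.hat[l, k] approximates sum_i int (x_{l,i}(s) - xbar_i(s)) psi_hat_{ik}(s) ds
# X: list of n x m_i data matrices; psi.hat[[i]]: assumed m_i x K matrix of estimated psi_{ik}(s)
z.hat <- matrix(0, n, K)
for (i in seq_along(X)) {
  Xc <- scale(X[[i]], center = TRUE, scale = FALSE)  # center the i-th predictor curves
  ds <- diff(t.x.list[[i]])[1]                       # grid spacing (assumed equally spaced)
  z.hat <- z.hat + (Xc %*% psi.hat[[i]]) * ds        # Riemann-sum approximation of the integral
}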

References

Xin Qi and Ruiyan Luo (2018). Function-on-function regression with thousands of predictive curves. Journal of Multivariate Analysis 163: 51-66. https://doi.org/10.1016/j.jmva.2017.10.002

Examples

# NOT RUN {
#########################################
#toy example using the air quality data with p=1
#########################################
data(air)
t.x=seq(0,1,length=24)
t.y=seq(0,1,length=24)
air.cv=cv.hd(X=list(air[[2]][1:20,]), Y=air[[1]][1:20,], list(t.x), t.y,
             K.cv=2, s.n.basis = 8, t.n.basis = 8)
air.pred=pred.hd(air.cv, list(air[[2]][1:2,]))
predict.error=mean((air.pred-air[[1]][1:2,])^2)
print(c("predict error=", predict.error))


# }
# NOT RUN {
#########################################
#example with simulated data and p=200
#########################################

rm(list=ls())
ptm <- proc.time()
library(MASS)

n.curves=200 # total number of predictor curves.

t.x=seq(0,1,length.out=80) # observation points for X_i(s).
t.y=seq(0,1,length.out=100) # observation points for Y(t).
ntrain <- 100 # number of observations in training data
nnew <- 50 # number of new observations
ntot <- ntrain+nnew
########################################
##generate the predictor curves using cos() basis functions.
############################################
W <- list()
for(j in 1:(n.curves+5))
{
  W[[j]] <- 0
  for(k in 1:30)
    W[[j]] <- W[[j]] + rnorm(ntot) %*% t(cos(k*pi*t.x))/k
}
X=lapply(1:n.curves,
function(k){(W[[k]]+W[[k+1]]+W[[k+2]]+W[[k+3]]+W[[k+4]]
             +W[[k+5]])/k})
########################################
##generate the coefficient functions. Only the first five are nonzero.
############################################
mu=sin(2*pi*t.y)
B.coef=list()
B.coef[[1]]=sapply(1:length(t.y), function(k){4*t.x^2*t.y[k]})
B.coef[[2]]=sapply(1:length(t.y), function(k){4*sin(pi*t.x)*cos(pi*t.y[k])})
B.coef[[3]]=sapply(1:length(t.y), function(k){4*exp(-((t.x-0.5)^2+(t.y[k]-0.5)^2)/2)})
B.coef[[4]]=sapply(1:length(t.y), function(k){4*t.x/(1+t.y[k]^2)})
B.coef[[5]]=sapply(1:length(t.y), function(k){4*log(1+t.x)*sqrt(t.y[k])})
########################################
##generate the response curves
############################################
Y=0
for(j in 1:5){
Y<- Y+ (X[[j]] %*% B.coef[[j]])/length(t.x)
}
E=sqrt(0.01)*matrix(rnorm(ntot*length(t.y)), ntot, length(t.y))
Y=rep(1,ntot)%*%t(mu)+Y+E
X.train=lapply(1:n.curves, function(k){X[[k]][(1:ntrain),]})
X.new=lapply(1:n.curves, function(k){X[[k]][-(1:ntrain),]})
Y.train <- Y[1:ntrain,]
Y.new <- Y[-(1:ntrain),]
E.new=E[-(1:ntrain),]

########################################
##fit the model using the sparse function-on-function method
############################################
t.x.list=lapply(1:n.curves, function(k){t.x})
fit.cv <- cv.hd(X.train, Y.train, t.x.list, t.y, K.cv=5, s.n.basis=25, t.n.basis=40)
Y.pred=pred.hd(fit.cv, X.new)
predict.error=mean((Y.pred-Y.new)^2)
est.error=mean((Y.pred-Y.new+E.new)^2)
print(c("predict error=", predict.error))
print(c("estimation error=", est.error))
est.coefficient=getcoef.hd(fit.cv)
mu.est=est.coefficient[[1]]
beta.est=est.coefficient[[2]]
# }
