FRegSigCom (version 0.3.0)

cv.sof.spike: Cross-validation for linear scalar-on-function regression for highly densely observed spiky functional data

Description

This function performs cross-validation and builds the final model for highly densely observed spiky data, using the signal compression approach, for the following linear scalar-on-function regression model: $$Y= \mu+\sum_{i=1}^p\int_{a_i}^{b_i}X_i(s)\beta_i(s)ds+\varepsilon,$$ where \(\mu\) is the intercept, \(\{X_i(s),1\le i\le p\}\) are the \(p\) functional predictors and \(\{\beta_i(s),1\le i\le p\}\) are their corresponding coefficient functions, \(p\) is a positive integer, and \(\varepsilon\) is the random noise.

We require that all the sample curves of each functional predictor are observed on a common dense grid of time points, but the grid may differ across predictors.

Usage

cv.sof.spike(X, Y, t.x, K.cv = 10, upper.level = 10)

Arguments

X

a list of length \(p\), the number of functional predictors. Its \(i\)-th element is the \(n\times m_i\) data matrix for the \(i\)-th functional predictor \(X_i(s)\), where \(n\) is the sample size and \(m_i\) is the number of observation time points for \(X_i(s)\).

Y

an \(n\) dimensional vector of the observed values for the response, where \(n\) is the sample size.

t.x

a list of length \(p\). Its \(i\)-th element is the vector of observation time points of the \(i\)-th functional predictor \(X_i(s)\), \(1\le i\le p\).

K.cv

the number of CV folds. Default is 10.

upper.level

the upper bound for the maximum wavelet resolution level. The optimal maximum resolution level is chosen between 1 and upper.level, together with the other tuning parameters, by cross-validation.
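The required layout of X, Y, and t.x can be illustrated with simulated data. This is a hypothetical sketch using base R only; the sample size, grid sizes, and signal below are arbitrary choices for illustration, not values from the package:

```r
# Simulate p = 2 functional predictors observed on different dense grids
set.seed(1)
n <- 50                 # sample size
m <- c(256, 128)        # number of observation points for each predictor
t.x <- lapply(m, function(mi) seq(0, 1, length.out = mi))

# Each element of X is an n x m_i matrix: one row per sample curve
X <- lapply(t.x, function(tt) {
  outer(rnorm(n), sin(2 * pi * tt)) +
    matrix(rnorm(n * length(tt), sd = 0.1), nrow = n)
})

# Scalar response: here driven (arbitrarily) by the first predictor
Y <- rowMeans(X[[1]]) + rnorm(n, sd = 0.1)

# The lists X and t.x and the vector Y now match the arguments
# expected by cv.sof.spike(X, Y, t.x)
str(X, max.level = 1)
```

With these objects in hand, the call would be `fit.cv <- cv.sof.spike(X, Y, t.x)`, as shown in the Examples section below with real data.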

Value

An object of the "cv.sof.spike" class, which is passed to the function pred.sof.spike for prediction.

mu

the estimated intercept.

coef

a list of \(p\) vectors, where the \(i\)-th vector contains the estimated values of the slope coefficient function \(\beta_i(s)\) at the observation time points in t.x.

...

the optimal tuning parameters selected by cross-validation.

Details

This method uses a wavelet basis to expand \(X_i(s)\) and \(\beta_i(s)\) (\(1 \le i \le p\)), and estimates the expansion coefficients of the \(\beta_i(s)\)'s by a penalized least squares method with penalty $$\lambda\sum_{i=1}^p \{\sum_{j=0}^{J_1}\{2^{-2\alpha e^{-(j-\tau)/\alpha}}2^{2\alpha j}||b_{ij}||^2+ \kappa ||b_i||^2\}\},$$ where \(b_{ij}\) denotes the vector of wavelet coefficients of \(\beta_i(s)\) at the \(j\)-th resolution level, and \(b_{i}\) is the vector concatenating all the \(b_{ij}\) (\(0\le j \le J_1\)).

References

Xin Qi and Ruiyan Luo (manuscript). Functional regression for highly densely observed functional data with novel regularity.

Examples

# NOT RUN {

##########################################################################
# Example: scalar-on-function for highly-densely observed curves
##########################################################################


ptm <- proc.time()
library(FRegSigCom)
data(Pork)
X <- Pork$X
Y <- Pork$Y
ntrain <- 40 # in the paper, 80 observations are used as training data
# observation time points of x(t), rescaled to the range [0, 1]
t.x.list <- list(seq(0, 1, length.out = dim(X)[2]))
train.index <- sample(1:dim(X)[1], ntrain)
X.train <- X.test <- list()

X.train[[1]] <- X[train.index, ]
X.test[[1]] <- X[-train.index, ]
Y.train <- Y[train.index]
Y.test <- Y[-train.index]

fit.cv <- cv.sof.spike(X.train, Y.train, t.x.list)
Y.pred <- pred.sof.spike(fit.cv, X.test)
pred.error <- mean((Y.pred - Y.test)^2)

cat("pred.error =", pred.error, "\n")

print(proc.time() - ptm)


# }