step.ff.interaction: Stepwise variable selection procedure for FOF regression models with two-way interactions

Description

This function runs a stepwise procedure to select main effects, two-way interactions and quadratic effects for the following family of linear function-on-function interaction models. Let $\{X_i(s),1\le i\le p\}$ be $p$ potential functional predictors. The family of models is given by

$$Y(t)= \mu(t)+\sum_{i \in M}\int_{a_i}^{b_i} X_i(s)\beta_i(s,t)\,ds+\sum_{(i,j) \in I}\int_{a_i}^{b_i}\int_{a_j}^{b_j} X_i(u)X_j(v)\gamma_{ij}(u,v,t)\,du\,dv+\epsilon(t),$$

where $\mu(t)$ is the intercept function, $\{\beta_i(s,t),i\in M\}$ and $\{\gamma_{ij}(u,v,t),(i,j)\in I\}$ are the corresponding coefficient functions, and $\epsilon(t)$ is the noise function. The index set $M$ of the main effects is a subset of $\{1,\dots,p\}$, and the index set $I$ of the interactions and quadratic effects is a subset of the collection of all possible pairs $\{(i,j), 1\le i\le j\le p\}$; a pair with $i=j$ corresponds to a quadratic effect. The models considered at each step must satisfy the hierarchy principle: if the interaction $X_iX_j$ is included in the model, then both main effects $X_i$ and $X_j$ are included as well. Once the final model is selected, this function also fits it.
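The hierarchy principle stated above can be checked mechanically for any candidate pair of index sets. The following is a minimal sketch in base R; `satisfies_hierarchy` is a hypothetical helper written for illustration and is not part of FRegSigCom.

```r
# Hypothetical helper (not part of FRegSigCom): returns TRUE when every
# predictor index appearing in an interaction or quadratic pair is also
# present in the set of main effects.
satisfies_hierarchy <- function(main, inter) {
  all(as.vector(inter) %in% main)
}

main  <- c(1, 3)                      # index set M
inter <- rbind(c(1, 3), c(3, 3))      # index set I: X1*X3 and X3^2

satisfies_hierarchy(main, inter)            # TRUE
satisfies_hierarchy(main, rbind(c(1, 2)))   # FALSE: X2 is not a main effect
```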

Usage

step.ff.interaction(X, Y, t.x, t.y, adaptive=FALSE, s.n.basis=40, t.n.basis=40,
          inter.n.basis=20, basis.type.x="Bspline", basis.type.y="Bspline",
          K.cv=5, upper.comp=8, thresh=0.01)

Arguments

X

a list of $p$ potential functional predictors. Its $i$-th element is the $n\times m_i$ data matrix for the $i$-th potential functional predictor $X_i(s)$, where $n$ is the sample size and $m_i$ is the number of observation time points for $X_i(s)$.

Y

the $n\times m$ data matrix for the functional response $Y(t)$, where $n$ is the sample size and $m$ is the number of observation time points for $Y(t)$.

t.x

a list of length $p$. Its $i$-th element is the vector of observation time points of the $i$-th functional predictor $X_i(s)$, $1\le i\le p$.

t.y

the vector of observation time points of the functional response $Y(t)$.

adaptive

a logical value indicating whether to use an adaptive penalty, which applies different smoothness tuning parameters to different target functions. Default is FALSE.

s.n.basis

the number of basis functions used for estimating the functions $\psi_{ik}(s)$ (see details in cv.ff.interaction). Default is 40.

t.n.basis

the number of basis functions used for estimating the functions $w_{k}(t)$. Default is 40.

inter.n.basis

the number of one-dimensional basis functions used to construct the tensor product basis functions for estimating the functions $\phi_{ijk}(u,v)$. Default is 20.

basis.type.x

the type of basis functions $\psi_{ik}(s)$. Only "Bspline" (default) and "Fourier" are supported.

basis.type.y

the type of basis functions $w_{k}(t)$. Only "Bspline" (default) and "Fourier" are supported.

K.cv

the number of CV folds. Default is 5.

upper.comp

the upper bound for the maximum number of components to be calculated. Default is 8.

thresh

a number between 0 and 1 used to determine the maximum number of components to calculate. This maximum lies between 1 and upper.comp above. The optimal number of components is chosen between 1 and this maximum, together with the other tuning parameters, by cross-validation. A smaller thresh value leads to a larger maximum number of components and a longer running time. A larger thresh value reduces the running time, but may miss some important components and lead to a larger prediction error. Default is 0.01.
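The shapes expected for X, Y, t.x and t.y can be sketched with simulated data. This is an illustration only: the white-noise matrices are placeholders rather than realistic curves, the names `n`, `m.x` and `m.y` are introduced here, and observing every curve on $[0,1]$ is an assumption borrowed from the example at the end of this page.

```r
set.seed(1)
n   <- 30            # sample size (same for every predictor and the response)
m.x <- c(50, 40)     # m_i: number of observation points for each of p = 2 predictors
m.y <- 60            # m: number of observation points for the response

# t.x: list of length p; t.y: a single vector
t.x <- lapply(m.x, function(m) seq(0, 1, length.out = m))
t.y <- seq(0, 1, length.out = m.y)

# X: list of p matrices, the i-th being n x m_i; Y: n x m matrix
X <- lapply(m.x, function(m) matrix(rnorm(n * m), nrow = n, ncol = m))
Y <- matrix(rnorm(n * m.y), nrow = n, ncol = m.y)

# Consistency checks implied by the argument descriptions
stopifnot(
  length(X) == length(t.x),
  all(sapply(X, nrow) == nrow(Y)),
  all(sapply(X, ncol) == lengths(t.x)),
  ncol(Y) == length(t.y)
)
```

Running checks of this kind before calling step.ff.interaction may help catch mismatched dimensions early.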

Value

An object of class "step.ff.interaction", which can be passed to pred.ff.interaction for prediction and to getcoef.ff.interaction for extracting the estimated coefficient functions. It contains the following components.

opt.main.effects

a vector of indices of the selected main effects.

opt.interaction.effects

a matrix of two columns. Each row specifies the indices of a selected two-way interaction or quadratic effect.

fitted_model

a list containing the output of the fitted selected model; for internal use only.

y_penalty_inv

a list for internal use.

the input X.

the input Y.

x.smooth.params

a list for internal use.

y.smooth.params

a list for internal use.
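The two selection components can be turned into readable term labels. The sketch below uses a hypothetical selection result rather than real output, and it assumes that a quadratic effect appears as a row with two equal indices, in line with the index set $\{(i,j), 1\le i\le j\le p\}$ in the Description.

```r
# Hypothetical values standing in for fit$opt.main.effects and
# fit$opt.interaction.effects; not produced by an actual fit.
opt.main.effects        <- c(1, 2)
opt.interaction.effects <- rbind(c(1, 2), c(2, 2))

# Assumption: a row (i, i) denotes the quadratic effect of X_i.
i <- opt.interaction.effects[, 1]
j <- opt.interaction.effects[, 2]
inter.labels <- ifelse(i == j, paste0("X", i, "^2"), paste0("X", i, ":X", j))

main.labels <- paste0("X", opt.main.effects)
c(main.labels, inter.labels)   # "X1" "X2" "X1:X2" "X2^2"
```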

References

Ruiyan Luo and Xin Qi (2018) Interaction model and model selection for function-on-function regression, Journal of Computational and Graphical Statistics. https://doi.org/10.1080/10618600.2018.1514310

Examples

# NOT RUN {
library(FRegSigCom)
data(ocean)

Y=ocean$Salinity
X=list()
X[[1]]=ocean$Potential.density
X[[2]]=ocean$Temperature
X[[3]]=ocean$Oxygen
n.curves=length(X)
ntot=dim(Y)[1]
ntrain=50
ntest=ntot-ntrain
X.uncent=X
for(i in 1:n.curves){
  X[[i]]=scale(X.uncent[[i]],center=TRUE, scale=FALSE)
}
lengthX=dim(X[[1]])[2]
lengthY=dim(Y)[2]
t.x=seq(0,1,length=lengthX)
t.y=seq(0,1,length=lengthY)
I.train=sample(1:ntot, ntrain)
X.train=list()
X.test=list()
t.x.all=list()
for(j in 1:n.curves){
  X.train[[j]]=X[[j]][I.train,]
  X.test[[j]]=X[[j]][-I.train,]
  t.x.all[[j]]=t.x
}
Y.train=Y[I.train, ]
Y.test=Y[-I.train, ]


###############################
#model selection
###############################

fit.step=step.ff.interaction(X.train, Y.train, t.x.all, t.y)
Y.pred=pred.ff.interaction(fit.step,  X.test)
error.selected=mean((Y.pred-Y.test)^2)
print(c("error.selected=", error.selected))
#coef.obj=getcoef.ff.interaction(fit.step)
#str(coef.obj)
# }