fregre.gsam.vs: Variable Selection using Functional Additive Models

Description

Computes a functional GAM between the functional covariates $(X^1(t_1),\dots,X^{q}(t_q))$ (and the non-functional covariates $(Z^1,\dots,Z^p)$) and a scalar response $Y$, selecting which of the potential covariates enter the model.

This function extends the functional generalized linear regression model fregre.glm: $E[Y|X,Z]$ is related to the linear predictor $\eta$ through a link function $g(\cdot)$, with integrated smoothness estimation of the smooth functions $f(\cdot)$.

$$E[Y|X,Z]=g^{-1}(\eta),\qquad \eta=\alpha+\sum_{i=1}^{p}f_{i}(Z^{i})+\sum_{k=1}^{q}\sum_{j=1}^{k_q}f_{j}^{k}(\xi_j^k)$$

where $\xi_j^k$ is the coefficient of the $j$-th basis function in the expansion of $X^k$ (with a principal component basis, $\xi_j^k$ is the score of the $j$-th functional PC of $X^k$).
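
As a rough illustration of the scores $\xi_j^k$ and of the additive predictor, the base-R sketch below (not the package's implementation) discretizes hypothetical curves on a common grid, computes their PC scores with prcomp(), and fits an identity-link additive model. Natural-spline terms from the bundled splines package stand in for the s() terms of gam; the simulated data, grid and degrees of freedom are all assumptions.

```r
library(splines)  # ships with R; ns() builds natural cubic spline bases

set.seed(1)
tt <- seq(0, 1, length.out = 50)  # hypothetical common grid t
n  <- 30
# Hypothetical functional covariate: each row is one discretized curve X(t)
X <- t(replicate(n, rnorm(1) * sin(2 * pi * tt) +
                    rnorm(1, sd = 0.5) * cos(2 * pi * tt)))

# Scores xi_1, xi_2 of the first two (discretized) principal components
pc <- prcomp(X, center = TRUE)
xi <- pc$x[, 1:2]
colnames(xi) <- c("xi1", "xi2")

# Hypothetical response: nonlinear in xi1, unrelated to xi2
y <- 2 + xi[, "xi1"]^2 + rnorm(n, sd = 0.1)

# eta = alpha + f_1(xi1) + f_2(xi2); identity link, so E[Y|X] = eta
fit <- lm(y ~ ns(xi1, df = 3) + ns(xi2, df = 3), data = data.frame(y, xi))
eta <- fitted(fit)
```

With a non-identity link the fitted means would be $g^{-1}(\eta)$, e.g. plogis(eta) for a logit link.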

The smooth functions $f(\cdot)$ are specified, as in gam, with the smooth terms s and te on the right-hand side of the formula, so that the linear predictor depends on smooth functions of the predictors (or on linear functionals of them, such as $Z\beta$ and $\big<X(t),\beta\big>$ in fregre.glm).

Usage

fregre.gsam.vs(data, y, x, alpha, type.basis = "pc", ncomp, kbs,
               dcor.min = .1, par.model, xydist, trace = FALSE,
               CV = TRUE, ncomp.fix = FALSE, smooth = TRUE)

Arguments

data

List containing the variables in the model. The "df" element is a data.frame with the response and the scalar covariates (numeric and factor variables are allowed). Functional covariates of class fdata or fd are included as further elements of the data list.

y

Character string with the name of the scalar response variable.

x

Character string vector with the names of the potential scalar and functional covariates.

alpha

Significance level used to test the null hypothesis of independence between a covariate X and the residual e. By default, 0.05.

type.basis

Type of basis used, by default principal component basis "pc".

ncomp

Maximum number of basis elements (used only for functional covariates).

kbs

the dimension of the basis used to represent the smooth term. The default depends on the number of variables that the smooth is a function of.

dcor.min

Lower threshold for a variable X to be considered: X is discarded if its distance correlation with the residual e satisfies $R(X,e) < \text{dcor.min}$.
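
The base-R sketch below illustrates how dcor.min and alpha can interact in one screening step: compute the distance correlation $R(X,e)$ of every candidate with the current residual, discard candidates below dcor.min, and run a permutation test of independence at level alpha on the strongest remaining one. The helper names (dcor_stat, perm_pvalue), the simulated candidates and residuals, and the number of permutations are hypothetical; this is not the package's code, and its exact sequence of steps may differ.

```r
# Distance correlation R(x, y) from double-centred Euclidean distance
# matrices; x and y may be vectors or matrices (rows = observations).
dcor_stat <- function(x, y) {
  dc <- function(v) {
    d <- as.matrix(dist(v))
    sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
  }
  A <- dc(x)
  B <- dc(y)
  r2 <- mean(A * B) / sqrt(mean(A * A) * mean(B * B))
  sqrt(max(r2, 0))  # guard against tiny negative rounding error
}

# Permutation p-value for H0: x independent of e (hypothetical helper)
perm_pvalue <- function(x, e, B = 199) {
  obs  <- dcor_stat(x, e)
  perm <- replicate(B, dcor_stat(x, sample(e)))
  (1 + sum(perm >= obs)) / (B + 1)
}

set.seed(3)
n <- 60
cand <- list(z1 = rnorm(n), z2 = rnorm(n),
             X  = matrix(rnorm(n * 10), n))  # hypothetical discretized curves
e <- cand$z1^2 + 0.5 * cand$z1 + rnorm(n, sd = 0.1)  # hypothetical residuals

dcor.min <- 0.1
alpha    <- 0.05
r    <- sapply(cand, dcor_stat, y = e)  # R(X, e) for every candidate
keep <- r[r >= dcor.min]                # discard candidates with R(X, e) < dcor.min
best <- names(which.max(keep))          # strongest remaining candidate
p    <- perm_pvalue(cand[[best]], e)
add  <- p < alpha                       # independence rejected: include it
```

Distance correlation is zero only under independence, so it also picks up the nonlinear dependence of e on z1 that a Pearson correlation could understate.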

par.model

Model parameters.

xydist

List with, for each variable (all potential covariates and the response), the matrix of pairwise distances between its observations.
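
A minimal sketch of such a list, built with base R only. It assumes that the list elements are named after the variables, and it approximates the $L_2$ distance between curves observed on an equispaced grid by the Euclidean distance times $\sqrt{h}$ (rectangle rule); the data, the names and the choice of metric are hypothetical, and the function may expect a different metric or layout.

```r
set.seed(1)
n  <- 20
tt <- seq(0, 1, length.out = 40)           # hypothetical equispaced grid
dd <- data.frame(Fat = rnorm(n), z = runif(n))
X  <- matrix(rnorm(n * length(tt)), n)     # hypothetical curves, one per row
h  <- tt[2] - tt[1]                        # grid step

xydist <- list(
  Fat = as.matrix(dist(dd$Fat)),           # response
  z   = as.matrix(dist(dd$z)),             # scalar covariate
  x   = as.matrix(dist(X)) * sqrt(h)       # approx. L2 distance between curves
)
```

Precomputing these n x n matrices once avoids recomputing them at every step of the selection algorithm.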

trace

Logical; if TRUE, tracing information is printed during the call (useful for debugging).

CV

If TRUE, cross-validation (CV) is done.
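
For readers unfamiliar with the idea, the sketch below computes a generic k-fold cross-validated mean squared prediction error for an lm() fit. It is only an illustration: the helper name cv_mse, the number of folds and the simulated data are hypothetical, and the CV scheme used internally by fregre.gsam.vs may differ.

```r
# k-fold cross-validated mean squared prediction error of an lm() fit
cv_mse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG stream
  folds  <- sample(rep(seq_len(k), length.out = nrow(data)))
  resp   <- all.vars(formula)[1]
  sq_err <- numeric(nrow(data))
  for (f in seq_len(k)) {
    test <- folds == f
    fit  <- lm(formula, data = data[!test, , drop = FALSE])  # train on k-1 folds
    pred <- predict(fit, newdata = data[test, , drop = FALSE])
    sq_err[test] <- (data[[resp]][test] - pred)^2            # held-out errors
  }
  mean(sq_err)
}

set.seed(4)
d <- data.frame(z = runif(100))
d$y <- 1 + 2 * d$z + rnorm(100, sd = 0.3)
cv_mse(y ~ z, d)
```

Every observation is predicted exactly once by a model that did not see it, so the result estimates out-of-sample error rather than in-sample fit.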

ncomp.fix

If TRUE, the number of basis elements is fixed at ncomp. If FALSE, the function selects the number of PC components, up to ncomp.

smooth

If TRUE, a smooth estimate is made for all covariates included in the model (except factors). Each selected variable then enters the model either linearly or through a smooth term; if both fits are equivalent, the linear term is kept.
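
One simple way to decide between a linear and a smooth term is sketched below: fit both as nested lm() models, using a natural-spline basis from the bundled splines package as the smooth term, and keep the linear term unless an F-test rejects it at level alpha. The helper choose_term, the degrees of freedom and the test are assumptions; fregre.gsam.vs judges "equivalent" models with its own criterion, which this sketch does not claim to reproduce.

```r
library(splines)  # ships with R

# Hypothetical rule: keep the linear term unless a natural-spline term fits
# significantly better (F-test on nested lm() fits at level alpha).
choose_term <- function(y, z, df = 4, alpha = 0.05) {
  lin <- lm(y ~ z)                 # linear term
  smo <- lm(y ~ ns(z, df = df))    # smooth term; its span contains linear functions
  p <- anova(lin, smo)[2, "Pr(>F)"]
  if (is.na(p) || p > alpha) "linear" else "smooth"
}

set.seed(2)
z <- runif(200)
choose_term(1 + 2 * z + rnorm(200, sd = 0.1), z)       # typically "linear"
choose_term(sin(2 * pi * z) + rnorm(200, sd = 0.1), z)  # "smooth"
```

Preferring the linear term when the evidence is weak keeps the model simpler and easier to interpret.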

Value

Returns a list with the following elements:

model

Object corresponding to the estimated additive model using the selected variables; same output as the fregre.gsam function.

gof

The goodness of fit at each step of the variable selection (VS) algorithm.

i.predictor

Vector with 1 if the variable is selected, 0 otherwise.

dcor

The value of the distance correlation between each potential covariate and the model residual at each step.

References

Febrero-Bande, M., González-Manteiga, W. and Oviedo de la Fuente, M. (2018). Variable selection in functional additive regression models. Computational Statistics, 1-19. DOI: https://doi.org/10.1007/s00180-018-0844-5

Examples


library(fda.usc)
data(tecator)
x <- tecator$absorp.fdata
x1 <- fdata.deriv(x)
x2 <- fdata.deriv(x, nderiv = 2)
y <- tecator$y$Fat
set.seed(1)  # xcat0 is a pure-noise factor; fix the seed for reproducibility
xcat0 <- cut(rnorm(length(y)), 4)
xcat1 <- cut(tecator$y$Protein, 4)
xcat2 <- cut(tecator$y$Water, 4)
ind <- 1:129
dat <- data.frame("Fat" = y, x1$data, xcat0, xcat1, xcat2)
ldat <- list("df" = dat[ind, ], "x" = x[ind, ], "x1" = x1[ind, ], "x2" = x2[ind, ])
# 3 functionals (x, x1, x2), 3 factors (xcat0, xcat1, xcat2)
# and 100 scalars (impact points of x1)

# Time consuming
res.gam1 <- fregre.gsam.vs(data = ldat, y = "Fat")
summary(res.gam1$model)
res.gam1$i.predictor

covar <- c("xcat0", "xcat1", "xcat2", "x", "x1", "x2")
res.gam2 <- fregre.gsam.vs(data = ldat, y = "Fat", x = covar)
summary(res.gam2$model)
res.gam2$i.predictor

# Prediction as with fregre.gsam()
newldat <- list("df" = dat[-ind, ], "x" = x[-ind, ], "x1" = x1[-ind, ],
                "x2" = x2[-ind, ])
pred.gam1 <- predict.fregre.gsam(res.gam1$model, newldat)
pred.gam2 <- predict.fregre.gsam(res.gam2$model, newldat)
plot(dat[-ind, "Fat"], pred.gam1, xlab = "Observed Fat", ylab = "Predicted Fat")
points(dat[-ind, "Fat"], pred.gam2, col = 2)
