flm.test: Goodness-of-fit test for the Functional Linear Model with scalar response

Description

The function flm.test tests the composite null hypothesis of a Functional Linear Model with scalar response (FLM), $$H_0:\,Y=\big\langle X,\beta\big\rangle+\epsilon,$$ versus a general alternative. If $\beta=\beta_0$ is provided, then the simple hypothesis $H_0:\,Y=\big\langle X,\beta_0\big\rangle+\epsilon$ is tested. The null hypothesis is tested with a Projected Cramer-von Mises test (see Details).
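
For reference, and assuming the usual inner product of $L^2[0,T]$ (the space named in Details), the inner product of the functional covariate $X$ and the functional parameter $\beta$ in the FLM can be written as $$\big\langle X,\beta\big\rangle=\int_0^T X(t)\,\beta(t)\,dt,$$ so the model regresses the scalar $Y$ on a weighted integral of the curve $X$, with weight function $\beta$.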

Usage

flm.test (X.fdata, Y, beta0.fdata = NULL, B = 5000, est.method = "pls",
          p = NULL, type.basis = "bspline", verbose = TRUE,
          plot.it = TRUE, B.plot = 100, G = 200, ...)

Arguments

X.fdata

Functional covariate for the FLM. The object must be in the class fdata.

Y

Scalar response for the FLM. Must be a vector with the same number of elements as functions are in X.fdata.

beta0.fdata

Functional parameter for the simple null hypothesis, in the fdata class. Recall that the argvals and rangeval arguments of beta0.fdata must be the same as those of X.fda

B

Number of bootstrap replicates to calibrate the distribution of the test statistic. B=5000 replicates are recommended for carrying out the test, although for exploratory analysis (not inferential), an acceptable less time-consuming op

est.method

Estimation method for the unknown parameter $\beta$, only used in the composite case. Mainly, there are two options: specify the number of basis elements for the estimated $\beta$ by p or optimally select p by a data-driven crite

p

Number of elements of the basis considered. If it is not given, an optimal p will be chosen using a specific criterion (see est.method and type.basis arguments).

type.basis

Type of basis used to represent the functional process. Depending on the hypothesis it will have a different interpretation:

Simple hypothesis. One of these options:
- "bspline": If p is given, the functional process i

Value

An object with class "htest" whose underlying structure is a list containing the following components:
statistic: The value of the test statistic.
boot.statistics: A vector of length B with the values of the bootstrap test statistics.
p.value: The p-value of the test.
method: The method used.
B: The number of bootstrap replicates used.
type.basis: The type of basis used.
beta.est: The estimated functional parameter $\beta$ in the composite hypothesis. For the simple hypothesis, the given beta0.fdata.
p: The number of basis elements passed or automatically chosen.
ord: The optimal order for PC and PLS given by fregre.pc.cv and fregre.pls.cv. For other methods it is set to 1:p.
data.name: The character string "Y=<X,b>+e"
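
The components statistic, boot.statistics and p.value are related in the usual bootstrap way. The following is a minimal sketch in base R, assuming the p-value is the proportion of bootstrap statistics that exceed the observed one; the exact rule used inside flm.test (for instance, strict versus non-strict inequality) is not stated on this page. All numbers and variable names are made up for illustration and do not come from a real call to flm.test.

```r
## Hypothetical stand-ins for the components of the returned object
set.seed(42)
B <- 1000
boot_statistics <- rchisq(B, df = 3)  # plays the role of $boot.statistics
statistic <- 2.5                      # plays the role of $statistic

## Bootstrap p-value (assumed convention): share of bootstrap statistics
## larger than the observed statistic
p_value <- mean(boot_statistics > statistic)
p_value
```

With a fitted object, the same comparison can be visualised as in the Tecator example: a density of boot.statistics with a vertical line at statistic.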

Details

The Functional Linear Model with scalar response (FLM) is defined as $Y=\big\langle X,\beta\big\rangle+\epsilon$, for a functional process $X$ such that $E[X(t)]=0$ and $E[X(t)\epsilon]=0$ for all $t$, and for a scalar variable $Y$ such that $E[Y]=0$. The test therefore assumes that Y and X.fdata are centred, and it centres them automatically. Bear in mind that when you apply the test to Y and X.fdata, you are actually applying it to Y-mean(Y) and fdata.cen(X.fdata)$Xcen.

The test statistic is the Cramer-von Mises norm of the Residual Marked empirical Process based on Projections (RMPP) $R_n(u,\gamma)$ defined in Garcia-Portugues et al. (2012). Expressing this process in a $p$-truncated basis of the space $L^2[0,T]$ leads to the $p$-multivariate process $R_{n,p}\big(u,\gamma^{(p)}\big)$, whose Cramer-von Mises norm is easily computed.

When $p$ is not provided, an appropriate $p$ to represent the functional process $X$ is chosen through the estimation of $\beta$ in the composite hypothesis. For the simple hypothesis, as $\beta$ is not estimated, the choice of $p$ depends only on the functional process $X$. As the result of the test may change for different values of $p$, we recommend using an automatic criterion to select $p$ instead of providing a fixed one.

The distribution of the test statistic is approximated by a wild bootstrap on the residuals, using the golden section bootstrap.

Finally, the graph shown if plot.it=TRUE represents the observed trajectory, and the bootstrap trajectories under the null, of the RMPP integrated over the projections: $$R_n(u)\approx\frac{1}{G}\sum_{g=1}^G R_n(u,\gamma_g),$$ where the $\gamma_g$ are simulated as Gaussian processes. This gives a graphical idea of how far the observed trajectory is from the null hypothesis. For further details see Garcia-Portugues et al. (2012).
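
The wild bootstrap step can be illustrated in base R. The sketch below assumes the commonly used golden section two-point distribution for the bootstrap weights (values $(1\mp\sqrt{5})/2$ with probabilities $(5\pm\sqrt{5})/10$), which has mean 0 and variance 1. It is not taken from the source code of flm.test; the residuals are simulated stand-ins and all variable names are hypothetical.

```r
## Golden section two-point distribution (assumed form of the weights)
gs_values <- c((1 - sqrt(5)) / 2, (1 + sqrt(5)) / 2)
gs_probs  <- c((5 + sqrt(5)) / 10, (5 - sqrt(5)) / 10)

## Mean 0 and variance 1, so perturbed residuals keep their first two moments
sum(gs_values * gs_probs)    # approximately 0
sum(gs_values^2 * gs_probs)  # approximately 1

## One wild bootstrap resample: multiply each residual by an independent weight
set.seed(1)
n <- 100
e_hat  <- rnorm(n, sd = 0.1)  # stand-in for the residuals of the fitted FLM
v_star <- sample(gs_values, size = n, replace = TRUE, prob = gs_probs)
e_star <- e_hat * v_star      # bootstrap residuals
```

Under this scheme, each bootstrap replicate would rebuild a response from the fitted values plus e_star and recompute the statistic, which is repeated B times to obtain the bootstrap distribution.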

References

Escanciano, J. C. (2006). A consistent diagnostic test for regression models using projections. Econometric Theory, 22, 1030-1051. http://dx.doi.org/10.1017/S0266466606060506

Garcia-Portugues, E., Gonzalez-Manteiga, W. and Febrero-Bande, M. (2012). A goodness-of-fit test for the functional linear model with scalar response. http://arxiv.org/abs/1205.6167

Examples

## Simulated example ##

X=rproc2fdata(n=100,t=seq(0,1,l=101),sigma="OU")
beta0=fdata(mdata=cos(2*pi*seq(0,1,l=101))-(seq(0,1,l=101)-0.5)^2+
rnorm(101,sd=0.05),argvals=seq(0,1,l=101),rangeval=c(0,1))
Y=inprod.fdata(X,beta0)+rnorm(100,sd=0.1)

dev.new(width=21,height=7)
par(mfrow=c(1,3))
plot(X,main="X")
plot(beta0,main="beta0")
plot(density(Y),main="Density of Y",xlab="Y",ylab="Density")
rug(Y)

# Composite hypothesis: do not reject FLM
# pcvm.sim=flm.test(X,Y,B=50,B.plot=50,G=100,plot.it=TRUE)
# pcvm.sim
# flm.test(X,Y,B=5000)

# Estimated beta
# dev.new()
# plot(pcvm.sim$beta.est)

# Simple hypothesis: do not reject beta=beta0
# flm.test(X,Y,beta0.fdata=beta0,B=50,B.plot=50,G=100)
# flm.test(X,Y,beta0.fdata=beta0,B=5000) 


## AEMET dataset ##

# data(aemet)

## Remove the 5% of the curves with the lowest depth (i.e. 4 curves)
# dev.new()
# res.FM=depth.FM(aemet$temp,draw=TRUE)
# qu=quantile(res.FM$dep,prob=0.05)
# l=which(res.FM$dep<=qu)
# lines(aemet$temp[l],col=3)
# aemet$df$name[l]

## Data without outliers 
# wind.speed=apply(aemet$wind.speed$data,1,mean)[-l]
# temp=aemet$temp[-l]

## Exploratory analysis: do not reject the FLM
# pcvm.aemet=flm.test(temp,wind.speed,est.method="pls",B=100,B.plot=50,G=100)
# pcvm.aemet

## Estimated beta
# dev.new()
# plot(pcvm.aemet$beta.est,lwd=2,col=2)

## B=5000 for more precision on calibration of the test: again, do not reject the FLM
# flm.test(temp,wind.speed,est.method="pls",B=5000) 

## Simple hypothesis: rejection of beta0=0? Limiting p-value...
# dat=rep(0,length(temp$argvals))
# flm.test(temp,wind.speed, beta0.fdata=fdata(mdata=dat,argvals=temp$argvals,
# rangeval=temp$rangeval),B=100)
# flm.test(temp,wind.speed, beta0.fdata=fdata(mdata=dat,argvals=temp$argvals,
# rangeval=temp$rangeval),B=5000) 


## Tecator dataset ##

# data(tecator)
# names(tecator)
# absorp=tecator$absorp.fdata
# ind=1:129 # or ind=1:215
# x=absorp[ind,]
# y=tecator$y$Fat[ind]
# tt=absorp[["argvals"]]

## Exploratory analysis for composite hypothesis with automatic choice of p
# pcvm.tecat=flm.test(x,y,B=100,B.plot=50,G=100)
# pcvm.tecat

## B=5000 for more precision on calibration of the test: also reject the FLM
# flm.test(x,y,B=5000) 

## Plot of the estimated functional parameters
# plot(pcvm.tecat$beta.est,lwd=2,col=2)
# for(i in 1:100) lines(pcvm.tecat$boot.beta.est[[i]])
# lines(pcvm.tecat$beta.est,lwd=2,col=2)
# legend("topright",legend=c("Estimated","Bootstrap"),col=2:1,lwd=2) # colours match the lines drawn above

## Distribution of the PCvM statistic
# plot(density(pcvm.tecat$boot.statistics),lwd=2,xlim=c(0,10),
# main="PCvM distribution", xlab="PCvM*",ylab="Density")
# rug(pcvm.tecat$boot.statistics)
# abline(v=pcvm.tecat$statistic,col=2,lwd=2)
# legend("top",legend=c("PCvM observed"),lwd=2,col=2)

## Simple hypothesis: fixed p
# dat=rep(0,length(x$argvals))
# flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
# rangeval=x$rangeval),B=100,p=11)

## Simple hypothesis, automatic choice of p
# flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
# rangeval=x$rangeval),B=100)
# flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
# rangeval=x$rangeval),B=5000)
