The function `flm.test` tests the composite null hypothesis of
a Functional Linear Model with scalar response (FLM),
$$H_0:\,Y=\big<X,\beta\big>+\epsilon,$$ versus
a general alternative. If \(\beta=\beta_0\) is provided, then the
simple hypothesis \(H_0:\,Y=\big<X,\beta_0\big>+\epsilon\) is tested.
The null hypothesis is tested with a Projected Cramér-von Mises statistic (see Details).

```
flm.test(
  X.fdata,
  Y,
  beta0.fdata = NULL,
  B = 5000,
  est.method = "pls",
  p = NULL,
  type.basis = "bspline",
  verbose = TRUE,
  plot.it = TRUE,
  B.plot = 100,
  G = 200,
  ...
)
```

X.fdata

Functional covariate for the FLM. The object must be of class `fdata`.

Y

Scalar response for the FLM. Must be a vector with as many elements as there are functions in `X.fdata`.

beta0.fdata

Functional parameter for the simple null hypothesis, in the `fdata` class.
The `argvals` and `rangeval` arguments of `beta0.fdata` must be the same as those of `X.fdata`.
One way to ensure this, for example for \(\beta_0=0\) (the simple null hypothesis of no interaction), is
`beta0.fdata=fdata(mdata=rep(0,length(X.fdata$argvals)),argvals=X.fdata$argvals,rangeval=X.fdata$rangeval)`.
If `beta0.fdata=NULL` (default), the function tests the composite null hypothesis.

B

Number of bootstrap replicates used to calibrate the distribution of the test statistic.
`B=5000` replicates are recommended for carrying out the test, although for exploratory analysis
(**not inferential**) `B=500` is an acceptable, less time-consuming option.

est.method

Estimation method for the unknown parameter \(\beta\), only used in the composite case. There are two main options: specify the number of basis elements for the estimated \(\beta\) via `p`, or select `p` optimally by a data-driven criterion (see Details for discussion). It must be one of the following methods:

- `"pc"`: If `p`, the number of basis elements, is given, then \(\beta\) is estimated by `fregre.pc`. Otherwise, an optimal `p` is chosen using `fregre.pc.cv` and the `"SICc"` criterion.
- `"pls"`: If `p` is given, \(\beta\) is estimated by `fregre.pls`. Otherwise, an optimal `p` is chosen using `fregre.pls.cv` and the `"SICc"` criterion. This is the default, as it has been checked empirically that it provides a good balance between the performance of the test and the estimation of \(\beta\).
- `"basis"`: If `p` is given, \(\beta\) is estimated by `fregre.basis`. Otherwise, an optimal `p` is chosen using `fregre.basis.cv` and the `"GCV.S"` criterion. In these functions, the same basis is used for the arguments `basis.x` and `basis.b`. The type of basis is given by the argument `type.basis` and must be one of the types supported by `create.basis`. Further arguments for `create.basis` (except `rangeval`, which is taken from `X.fdata`) can be passed through `...`.

p

Number of basis elements considered. If it is not given, an optimal `p` is chosen using a specific criterion (see the `est.method` and `type.basis` arguments).

type.basis

Type of basis used to represent the functional process. Its interpretation depends on the hypothesis:

- Simple hypothesis. One of these options:
    - `"bspline"`: If `p` is given, the functional process is expressed in a basis of `p` B-splines. If not, an optimal `p` is chosen by `optim.basis`, using the `"GCV.S"` criterion.
    - `"fourier"`: If `p` is given, the functional process is expressed in a basis of `p` Fourier functions. If not, an optimal `p` is chosen by `optim.basis`, using the `"GCV.S"` criterion.
    - `"pc"`: `p` must be given. Expresses the functional process in a basis of `p` principal components (PC).
    - `"pls"`: `p` must be given. Expresses the functional process in a basis of `p` partial least squares (PLS) components.

  Although other bases supported by `create.basis` are also possible, `"bspline"` and `"fourier"` are recommended. Other bases may cause incompatibilities.
- Composite hypothesis. This argument is only used when `est.method="basis"` and, in this case, specifies the type of basis used in the basis estimation method of the functional parameter. Again, `"bspline"` and `"fourier"` are recommended, as other bases may cause incompatibilities.

verbose

Whether to show information about computing progress.

plot.it

Whether to show a graph of the observed trajectory, and the bootstrap trajectories under the null composite hypothesis, of the
process \(R_n(\cdot)\) (see Details). Note that if `plot.it=TRUE`, the function takes more time to run.

B.plot

Number of bootstrap trajectories to show in the resulting plot of the test.
As the trajectories shown are the first `B.plot` of `B`, `B.plot` must be less than or equal to `B`.

G

Number of projections used to compute the trajectories of the process \(R_n(\cdot)\) by Monte Carlo.

...

Further arguments passed to `create.basis`.
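As a rough sketch of how `est.method` and `p` combine in the composite case, the calls below assume that the `fda.usc` package is installed and use a small simulated data set similar to the one in the examples. The object names (`tt`, `beta`, `res.pls`, `res.pc`) are illustrative, and `B=50` is used only to keep the run short, so the resulting p-values are exploratory.

```r
library(fda.usc)

set.seed(1)
tt <- seq(0, 1, length.out = 101)

# Simulated functional covariate and a response generated from an FLM
X <- rproc2fdata(n = 50, t = tt, sigma = "OU")
beta <- fdata(mdata = cos(2 * pi * tt), argvals = tt, rangeval = c(0, 1))
Y <- drop(inprod.fdata(X, beta)) + rnorm(50, sd = 0.1)

# Composite hypothesis, PLS estimation, p selected automatically
res.pls <- flm.test(X, Y, est.method = "pls", B = 50,
                    plot.it = FALSE, verbose = FALSE)

# Composite hypothesis, PC estimation with a fixed p
res.pc <- flm.test(X, Y, est.method = "pc", p = 3, B = 50,
                   plot.it = FALSE, verbose = FALSE)

res.pls$p.value
res.pc$p.value
```

Setting `plot.it = FALSE` and `verbose = FALSE` keeps the calls quiet and avoids the extra time spent on plotting.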

An object of class `"htest"` whose underlying structure is a list containing
the following components:

- `statistic`: The value of the test statistic.
- `boot.statistics`: A vector of length `B` with the values of the bootstrap test statistics.
- `p.value`: The p-value of the test.
- `method`: The method used.
- `B`: The number of bootstrap replicates used.
- `type.basis`: The type of basis used.
- `beta.est`: The estimated functional parameter \(\beta\) in the composite hypothesis. For the simple hypothesis, the given `beta0.fdata`.
- `p`: The number of basis elements passed or automatically chosen.
- `ord`: The optimal order for PC and PLS given by `fregre.pc.cv` and `fregre.pls.cv`. For other methods it is set to `1:p`.
- `data.name`: The character string "Y=<X,b>+e".
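The components `statistic` and `boot.statistics` are enough to recompute a bootstrap p-value by hand. The base R sketch below assumes the common convention that the p-value is the proportion of bootstrap statistics exceeding the observed one (the exact rule used internally is not stated here) and uses made-up values in place of a real `flm.test` result. It also computes a rough Monte Carlo standard error for such a proportion, which is one way to see why `B=5000` is preferred over `B=500` for inference.

```r
set.seed(42)

# Hypothetical stand-ins for res$statistic and res$boot.statistics
statistic <- 1.8
boot.statistics <- rchisq(5000, df = 1)

# Proportion of bootstrap statistics above the observed one (assumed convention)
p.value <- mean(boot.statistics > statistic)

# Approximate Monte Carlo standard error of a bootstrap p-value near 0.05
mc.se <- function(p, B) sqrt(p * (1 - p) / B)
se.500 <- mc.se(0.05, 500)
se.5000 <- mc.se(0.05, 5000)

print(p.value)
print(c(B500 = se.500, B5000 = se.5000))
```

Multiplying `B` by 10 shrinks the standard error by a factor of `sqrt(10)`, which matters most when the p-value is close to the significance level.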

The Functional Linear Model with scalar response (FLM) is defined as
\(Y=\big<X,\beta\big>+\epsilon\), for a functional process
\(X\) such that \(E[X(t)]=0\) and \(E[X(t)\epsilon]=0\)
for all \(t\), and for a scalar variable \(Y\) such that \(E[Y]=0\).
The test therefore assumes that `Y` and `X.fdata` are **centred** and centres them automatically.
So bear in mind that when you apply the test to `Y` and `X.fdata`,
you are actually applying it to `Y-mean(Y)` and `fdata.cen(X.fdata)$Xcen`.
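To make the centring step concrete, here is a minimal base R sketch. It assumes the curves are stored as rows of a plain matrix (a stand-in for the data of an `fdata` object, not the `fdata.cen` implementation itself) and subtracts the pointwise mean curve, alongside the usual centring of the response.

```r
set.seed(1)

n <- 5  # number of curves (hypothetical)
m <- 4  # number of grid points (hypothetical)

# Rows are discretized curves, columns are grid points
X <- matrix(rnorm(n * m, mean = 3), nrow = n)
Y <- rnorm(n, mean = 10)

# Centre the response and subtract the pointwise mean curve from each row
Y.cen <- Y - mean(Y)
X.cen <- sweep(X, 2, colMeans(X))

mean(Y.cen)      # numerically zero
colMeans(X.cen)  # numerically zero at every grid point
```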
The test statistic corresponds to the Cramér-von Mises norm of the *Residual Marked
empirical Process based on Projections* (RMPP) \(R_n(u,\gamma)\) defined in
Garcia-Portugues *et al.* (2014).
The expression of this process in a \(p\)-truncated basis of the space \(L^2[0,T]\)
leads to the \(p\)-multivariate process \(R_{n,p}\big(u,\gamma^{(p)}\big)\),
whose Cramér-von Mises norm is computed.
When \(p\) is not provided, an appropriate \(p\) to represent the functional process \(X\)
is chosen via the estimation of \(\beta\) for the composite
hypothesis. For the simple hypothesis, as no estimation of \(\beta\) is done, the choice
of \(p\) depends only on the functional process \(X\). As the result of the test may
change for different values of \(p\), we recommend using an automatic criterion to select \(p\)
instead of providing a fixed one.
The distribution of the test statistic is approximated by a wild bootstrap resampling on the
residuals, using the *golden section bootstrap*.
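The golden section bootstrap is usually described as a wild bootstrap whose multipliers follow a two-point distribution built from the golden ratio, with mean 0 and second and third moments equal to 1. The base R sketch below assumes that this standard definition is the one meant here. The fitted values and residuals are made up, and the package's own `rwild` function (listed among the related functions) is not used.

```r
set.seed(7)

phi <- (1 + sqrt(5)) / 2                            # golden ratio
values <- c(1 - phi, phi)                           # the two possible multipliers
probs <- c((5 + sqrt(5)) / 10, (5 - sqrt(5)) / 10)  # their probabilities

# Theoretical moments of the multiplier distribution
m1 <- sum(values * probs)    # mean
m2 <- sum(values^2 * probs)  # second moment
m3 <- sum(values^3 * probs)  # third moment

# Hypothetical fitted values and residuals from a model fitted under H0
n <- 200
fitted <- rnorm(n)
e <- rnorm(n, sd = 0.1)

# One wild bootstrap resample: perturb each residual by an independent multiplier
V <- sample(values, size = n, replace = TRUE, prob = probs)
e.star <- e * V
Y.star <- fitted + e.star

c(mean = m1, second = m2, third = m3)
```

In a full calibration this resampling would be repeated `B` times, recomputing the test statistic on each bootstrap sample.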
Finally, the graph shown if `plot.it=TRUE` represents the observed trajectory, and the
bootstrap trajectories under the null, of the RMPP process *integrated on the projections*:
$$R_n(u)\approx\frac{1}{G}\sum_{g=1}^G R_n(u,\gamma_g),$$
where \(\gamma_g\) are simulated as Gaussian processes. This gives a graphical idea of
how *distant* the observed trajectory is from the null hypothesis.
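
A minimal base R sketch of the Monte Carlo average above, assuming the residual marked empirical process has the form \(R_n(u,\gamma)=n^{-1/2}\sum_{i=1}^n \hat\epsilon_i 1\{\langle X_i,\gamma\rangle\le u\}\) as in the cited references. The curves, the residuals, the common grid of `u` values and the Brownian-motion-like directions are all illustrative stand-ins, not what `flm.test` uses internally.

```r
set.seed(123)

n <- 100  # number of curves (hypothetical)
m <- 101  # grid size (hypothetical)
G <- 200  # number of random projections
tt <- seq(0, 1, length.out = m)
dt <- tt[2] - tt[1]

# Stand-ins: centred discretized curves (rows) and residuals under H0
X <- t(apply(matrix(rnorm(n * m, sd = sqrt(dt)), n, m), 1, cumsum))
X <- sweep(X, 2, colMeans(X))
e <- rnorm(n, sd = 0.1)

# R_n(u, gamma) = n^(-1/2) * sum_i e_i * 1{<X_i, gamma> <= u}
Rn.proj <- function(u, gamma) {
  proj <- drop(X %*% gamma) * dt  # Riemann approximation of <X_i, gamma>
  sapply(u, function(uu) sum(e * (proj <= uu))) / sqrt(n)
}

# Random Gaussian directions gamma_g, one per column (an assumed choice)
gammas <- replicate(G, cumsum(rnorm(m, sd = sqrt(dt))))

# Average over the G projections on a common grid of u values
u <- seq(-1, 1, length.out = 50)
Rn.u <- rowMeans(apply(gammas, 2, function(g) Rn.proj(u, g)))

plot(u, Rn.u, type = "l", main = "Integrated process (sketch)")
```

Bootstrap trajectories would be obtained in the same way after replacing `e` with resampled residuals.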

Escanciano, J. C. (2006). A consistent diagnostic test for regression models using projections. Econometric Theory, 22, 1030-1051. http://dx.doi.org/10.1017/S0266466606060506

Garcia-Portugues, E., Gonzalez-Manteiga, W. and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3), 761-778. http://dx.doi.org/10.1080/10618600.2013.812519

`Adot`, `PCvM.statistic`, `rwild`, `flm.Ftest`, `dfv.test`, `fregre.pc`, `fregre.pls`, `fregre.basis`, `fregre.pc.cv`, `fregre.pls.cv`, `fregre.basis.cv`, `optim.basis`, `create.basis`

    # NOT RUN {
    # Simulated example #
    X=rproc2fdata(n=100,t=seq(0,1,l=101),sigma="OU")
    beta0=fdata(mdata=cos(2*pi*seq(0,1,l=101))-(seq(0,1,l=101)-0.5)^2+
                rnorm(101,sd=0.05),argvals=seq(0,1,l=101),rangeval=c(0,1))
    Y=inprod.fdata(X,beta0)+rnorm(100,sd=0.1)

    dev.new(width=21,height=7)
    par(mfrow=c(1,3))
    plot(X,main="X")
    plot(beta0,main="beta0")
    plot(density(Y),main="Density of Y",xlab="Y",ylab="Density")
    rug(Y)
    # }
    # NOT RUN {
    # Composite hypothesis: do not reject FLM
    pcvm.sim=flm.test(X,Y,B=50,B.plot=50,G=100,plot.it=TRUE)
    pcvm.sim
    flm.test(X,Y,B=5000)

    # Estimated beta
    dev.new()
    plot(pcvm.sim$beta.est)

    # Simple hypothesis: do not reject beta=beta0
    flm.test(X,Y,beta0.fdata=beta0,B=50,B.plot=50,G=100)
    flm.test(X,Y,beta0.fdata=beta0,B=5000)

    # AEMET dataset #
    data(aemet)

    # Remove the 5% of the curves with less depth (i.e. 4 curves)
    dev.new()
    res.FM=depth.FM(aemet$temp,draw=TRUE)
    qu=quantile(res.FM$dep,prob=0.05)
    l=which(res.FM$dep<=qu)
    lines(aemet$temp[l],col=3)
    aemet$df$name[l]

    # Data without outliers
    wind.speed=apply(aemet$wind.speed$data,1,mean)[-l]
    temp=aemet$temp[-l]

    # Exploratory analysis: accept the FLM
    pcvm.aemet=flm.test(temp,wind.speed,est.method="pls",B=100,B.plot=50,G=100)
    pcvm.aemet

    # Estimated beta
    dev.new()
    plot(pcvm.aemet$beta.est,lwd=2,col=2)

    # B=5000 for more precision on calibration of the test: also accept the FLM
    flm.test(temp,wind.speed,est.method="pls",B=5000)

    # Simple hypothesis: rejection of beta0=0? Limiting p-value...
    dat=rep(0,length(temp$argvals))
    flm.test(temp,wind.speed,
             beta0.fdata=fdata(mdata=dat,argvals=temp$argvals,
                               rangeval=temp$rangeval),B=100)
    flm.test(temp,wind.speed,
             beta0.fdata=fdata(mdata=dat,argvals=temp$argvals,
                               rangeval=temp$rangeval),B=5000)

    # Tecator dataset #
    data(tecator)
    names(tecator)
    absorp=tecator$absorp.fdata
    ind=1:129 # or ind=1:215
    x=absorp[ind,]
    y=tecator$y$Fat[ind]
    tt=absorp[["argvals"]]

    # Exploratory analysis for composite hypothesis with automatic choose of p
    pcvm.tecat=flm.test(x,y,B=100,B.plot=50,G=100)
    pcvm.tecat

    # B=5000 for more precision on calibration of the test: also reject the FLM
    flm.test(x,y,B=5000)

    # Distribution of the PCvM statistic
    plot(density(pcvm.tecat$boot.statistics),lwd=2,xlim=c(0,10),
         main="PCvM distribution", xlab="PCvM*",ylab="Density")
    rug(pcvm.tecat$boot.statistics)
    abline(v=pcvm.tecat$statistic,col=2,lwd=2)
    legend("top",legend=c("PCvM observed"),lwd=2,col=2)

    # Simple hypothesis: fixed p
    dat=rep(0,length(x$argvals))
    flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
                                   rangeval=x$rangeval),B=100,p=11)

    # Simple hypothesis, automatic choose of p
    flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
                                   rangeval=x$rangeval),B=100)
    flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
                                   rangeval=x$rangeval),B=5000)
    # }