Implements additive regression for functional and scalar covariates and
functional responses. This function is a wrapper for `mgcv`

's
`gam`

and its siblings to fit models of the general form
\(E(Y_i(t)) = g(\mu(t) + \int X_i(s)\beta(s,t)ds + f(z_{1i}, t) +
f(z_{2i}) + z_{3i} \beta_3(t) + \dots )\) with a functional (but not
necessarily continuous) response \(Y(t)\), response function \(g\),
(optional) smooth intercept \(\mu(t)\), (multiple) functional covariates
\(X(t)\) and scalar covariates \(z_1\), \(z_2\), etc.

```
pffr(
formula,
yind,
data = NULL,
ydata = NULL,
algorithm = NA,
method = "REML",
tensortype = c("ti", "t2"),
bs.yindex = list(bs = "ps", k = 5, m = c(2, 1)),
bs.int = list(bs = "ps", k = 20, m = c(2, 1)),
...
)
```

formula

yind

a vector with length equal to the number of columns of the matrix
of functional responses giving the vector of evaluation points \((t_1,
\dots ,t_{G})\). If not supplied, `yind`

is set to
`1:ncol(<response>)`

.

data

an (optional) `data.frame`

containing the data. Can also be
a named list for regular data. Functional covariates have to be supplied as
<no. of observations> by <no. of evaluations> matrices, i.e. each row is
one functional observation.

ydata

an (optional) `data.frame`

supplying functional responses
that are not observed on a regular grid. See Details.

algorithm

method

tensortype

bs.yindex

a named (!) list giving the parameters for spline bases on
the index of the functional response. Defaults to ```
list(bs="ps", k=5,
m=c(2, 1))
```

, i.e. 5 cubic B-splines bases with first order difference
penalty.

bs.int

a named (!) list giving the parameters for the spline basis for
the global functional intercept. Defaults to ```
list(bs="ps", k=20,
m=c(2, 1))
```

, i.e. 20 cubic B-splines bases with first order difference
penalty.

A fitted `pffr`

-object, which is a
`gam`

-object with some additional information in an
`pffr`

-entry. If `algorithm`

is `"gamm"`

or `"gamm4"`

,
only the `$gam`

part of the returned list is modified in this way.
Available methods/functions to postprocess fitted models:
`summary.pffr`

, `plot.pffr`

,
`coef.pffr`

, `fitted.pffr`

,
`residuals.pffr`

, `predict.pffr`

,
`model.matrix.pffr`

, `qq.pffr`

,
`pffr.check`

. If `algorithm`

is `"jagam"`

, only
the location of the model file and the usual
`jagam`

-object are returned, you have to run the sampler
yourself.

The routine can estimate

linear functional effects of scalar (numeric or factor) covariates that vary smoothly over \(t\) (e.g. \(z_{1i} \beta_1(t)\), specified as

`~z1`

),nonlinear, and possibly multivariate functional effects of (one or multiple) scalar covariates \(z\) that vary smoothly over the index \(t\) of \(Y(t)\) (e.g. \(f(z_{2i}, t)\), specified in the

`formula`

simply as`~s(z2)`

)(nonlinear) effects of scalar covariates that are constant over \(t\) (e.g. \(f(z_{3i})\), specified as

`~c(s(z3))`

, or \(\beta_3 z_{3i}\), specified as`~c(z3)`

),function-on-function regression terms (e.g. \(\int X_i(s)\beta(s,t)ds\), specified as

`~ff(X, yindex=t, xindex=s)`

, see`ff`

). Terms given by`sff`

and`ffpc`

provide nonlinear and FPC-based effects of functional covariates, respectively.concurrent effects of functional covariates

`X`

measured on the same grid as the response are specified as follows:`~s(x)`

for a smooth, index-varying effect \(f(X(t),t)\),`~x`

for a linear index-varying effect \(X(t)\beta(t)\),`~c(s(x))`

for a constant nonlinear effect \(f(X(t))\),`~c(x)`

for a constant linear effect \(X(t)\beta\).Smooth functional random intercepts \(b_{0g(i)}(t)\) for a grouping variable

`g`

with levels \(g(i)\) can be specified via`~s(g, bs="re")`

), functional random slopes \(u_i b_{1g(i)}(t)\) in a numeric variable`u`

via`~s(g, u, bs="re")`

). Scheipl, Staicu, Greven (2013) contains code examples for modeling correlated functional random intercepts using`mrf`

-terms.

Use the `c()`

-notation to denote
model terms that are constant over the index of the functional response.

Internally, univariate smooth terms without a `c()`

-wrapper are
expanded into bivariate smooth terms in the original covariate and the
index of the functional response. Bivariate smooth terms (`s(), te()`

or `t2()`

) without a `c()`

-wrapper are expanded into trivariate
smooth terms in the original covariates and the index of the functional
response. Linear terms for scalar covariates or categorical covariates are
expanded into varying coefficient terms, varying smoothly over the index of
the functional response. For factor variables, a separate smooth function
with its own smoothing parameter is estimated for each level of the
factor. The marginal spline basis used for the index of the the
functional response is specified via the *global* argument
`bs.yindex`

. If necessary, this can be overriden for any specific term
by supplying a `bs.yindex`

-argument to that term in the formula, e.g.
`~s(x, bs.yindex=list(bs="tp", k=7))`

would yield a tensor product
spline over `x`

and the index of the response in which the marginal
basis for the index of the response are 7 cubic thin-plate spline functions
(overriding the global default for the basis and penalty on the index of
the response given by the *global* `bs.yindex`

-argument). Use
`~-1 + c(1) + ...`

to specify a model with only a constant and no
functional intercept.

The functional covariates have to be supplied as a \(n\) by <no. of
evaluations> matrices, i.e. each row is one functional observation. For
data on a regular grid, the functional response is supplied in the same
format, i.e. as a matrix-valued entry in `data`

, which can contain
missing values.

If the functional responses are *sparse or irregular* (i.e., not
evaluated on the same evaluation points across all observations), the
`ydata`

-argument can be used to specify the responses: `ydata`

must be a `data.frame`

with 3 columns called ```
'.obs', '.index',
'.value'
```

which specify which curve the point belongs to
(`'.obs'`

=\(i\)), at which \(t\) it was observed
(`'.index'`

=\(t\)), and the observed value
(`'.value'`

=\(Y_i(t)\)). Note that the vector of unique sorted
entries in `ydata$.obs`

must be equal to `rownames(data)`

to
ensure the correct association of entries in `ydata`

to the
corresponding rows of `data`

. For both regular and irregular
functional responses, the model is then fitted with the data in long
format, i.e., for data on a grid the rows of the matrix of the functional
response evaluations \(Y_i(t)\) are stacked into one long vector and the
covariates are expanded/repeated correspondingly. This means the models get
quite big fairly fast, since the effective number of rows in the design
matrix is number of observations times number of evaluations of \(Y(t)\)
per observation.

Note that `pffr`

does not use `mgcv`

's default identifiability
constraints (i.e., \(\sum_{i,t} \hat f(z_i, x_i, t) = 0\) or
\(\sum_{i,t} \hat f(x_i, t) = 0\)) for tensor product terms whose
marginals include the index \(t\) of the functional response. Instead,
\(\sum_i \hat f(z_i, x_i, t) = 0\) for all \(t\) is enforced, so that
effects varying over \(t\) can be interpreted as local deviations from
the global functional intercept. This is achieved by using
`ti`

-terms with a suitably modified `mc`

-argument.
Note that this is not possible if `algorithm='gamm4'`

since only
`t2`

-type terms can then be used and these modified constraints are
not available for `t2`

. We recommend using centered scalar covariates
for terms like \(z \beta(t)\) (`~z`

) and centered functional
covariates with \(\sum_i X_i(t) = 0\) for all \(t\) in `ff`

-terms
so that the global functional intercept can be interpreted as the global
mean function.

The `family`

-argument can be used to specify all of the response
distributions and link functions described in
`family.mgcv`

. Note that `family = "gaulss"`

is
treated in a special way: Users can supply the formula for the variance by
supplying a special argument `varformula`

, but this is not modified in
the way that the `formula`

-argument is but handed over to the fitter
directly, so this is for expert use only. If `varformula`

is not
given, `pffr`

will use the parameters from argument `bs.int`

to
define a spline basis along the index of the response, i.e., a smooth
variance function over $t$ for responses $Y(t)$.

Ivanescu, A., Staicu, A.-M., Scheipl, F. and Greven, S. (2015). Penalized function-on-function regression. Computational Statistics, 30(2):539--568. https://biostats.bepress.com/jhubiostat/paper254/

Scheipl, F., Staicu, A.-M. and Greven, S. (2015). Functional Additive Mixed Models. Journal of Computational & Graphical Statistics, 24(2): 477--501. https://arxiv.org/abs/1207.5947

F. Scheipl, J. Gertheiss, S. Greven (2016): Generalized Functional Additive Mixed Models, Electronic Journal of Statistics, 10(1), 1455--1492. https://projecteuclid.org/euclid.ejs/1464710238/

`smooth.terms`

for details of `mgcv`

syntax
and available spline bases and penalties.

# NOT RUN { ############################################################################### # univariate model: # Y(t) = f(t) + \int X1(s)\beta(s,t)ds + eps set.seed(2121) data1 <- pffrSim(scenario="ff", n=40) t <- attr(data1, "yindex") s <- attr(data1, "xindex") m1 <- pffr(Y ~ ff(X1, xind=s), yind=t, data=data1) summary(m1) plot(m1, pers=TRUE, pages=1) # } # NOT RUN { ############################################################################### # multivariate model: # E(Y(t)) = \beta_0(t) + \int X1(s)\beta_1(s,t)ds + xlin \beta_3(t) + # f_1(xte1, xte2) + f_2(xsmoo, t) + \beta_4 xconst data2 <- pffrSim(scenario="all", n=200) t <- attr(data2, "yindex") s <- attr(data2, "xindex") m2 <- pffr(Y ~ ff(X1, xind=s) + #linear function-on-function xlin + #varying coefficient term c(te(xte1, xte2)) + #bivariate smooth term in xte1 & xte2, const. over Y-index s(xsmoo) + #smooth effect of xsmoo varying over Y-index c(xconst), # linear effect of xconst constant over Y-index yind=t, data=data2) summary(m2) plot(m2, pers=TRUE) str(coef(m2)) # convenience functions: preddata <- pffrSim(scenario="all", n=20) str(predict(m2, newdata=preddata)) str(predict(m2, type="terms")) cm2 <- coef(m2) cm2$pterms str(cm2$smterms, 2) str(cm2$smterms[["s(xsmoo)"]]$coef) ############################################################################# # sparse data (80% missing on a regular grid): set.seed(88182004) data3 <- pffrSim(scenario=c("int", "smoo"), n=100, propmissing=0.8) t <- attr(data3, "yindex") m3.sparse <- pffr(Y ~ s(xsmoo), data=data3$data, ydata=data3$ydata, yind=t) summary(m3.sparse) plot(m3.sparse, pers=TRUE, pages=1) # }