# pffr

##### Penalized function-on-function regression

Implements additive regression for functional and scalar
covariates and functional responses. This function is a
wrapper for `mgcv`

's `gam`

and its
siblings to fit models of the general form
$E(Y_i(t)) = g(\mu(t) + \int X_i(s)\beta(s,t)ds +
f(z_{1i}, t) + f(z_{2i}) + z_{3i} \beta_3(t) + \dots$
with a functional (but not necessarily continuous)
response $Y(t)$, response function $g$,
(optional) smooth intercept $\mu(t)$, (multiple)
functional covariates $X(t)$ and scalar covariates
$z_1$, $z_2$, etc.

##### Usage

```
pffr(formula, yind, data = NULL, ydata = NULL,
algorithm = NA, method = "REML",
tensortype = c("te", "t2"),
bs.yindex = list(bs = "ps", k = 5, m = c(2, 1)),
bs.int = list(bs = "ps", k = 20, m = c(2, 1)), ...)
```

##### Arguments

- formula
- a formula with special terms as for
`gam`

, with additional special terms`ff()`

and`c()`

- yind
- a vector with length equal to the number of
columns of the matrix of functional responses giving the
vector of evaluation points $(t_1, \dots ,t_{G})$.
If not supplied,
`yind`

is`1:ncol(`

.) - algorithm
- the name of the function used to
estimate the model. Defaults to
`gam`

if the matrix of functional responses has less than`2e5`

data points and to`bam`

- data
- an (optional)
`data.frame`

or a named list containing the data. The functional response and functional covariates have to be supplied as n bymatrices, i.e. each row is one functional observation. The model is then - ydata
- an (optional)
`data.frame`

supplying functional responses that are not observed on a regular grid. See Details. - method
- Defaults to
`"REML"`

-estimation, including of unknown scale. See`gam`

for details. - bs.yindex
- a named (!) list giving the parameters
for spline bases on the index of the functional response.
Defaults to
`list(bs="ps", k=5, m=c(2, 1))`

, i.e. 5 cubic B-splines bases with first order difference penalty. - bs.int
- a named (!) list giving the parameters for
the spline basis for the global functional intercept.
Defaults to
`list(bs="ps", k=20, m=c(2, 1))`

, i.e. 20 cubic B-splines bases with first order difference penalty. - tensortype
- which typ of tenor product splines to
use. One of "
`te`

" or "`t2`

", defaults to`te`

- ...
- additional arguments that are valid for
`gam`

or`bam`

.`weights, subset, offset`

are not yet implemented!

##### Value

- a fitted
`pffr`

-object, which is a`gam`

-object with some additional information in an`pffr`

-entry. If`algorithm`

is`"gamm"`

or`"gamm4"`

, only the`$gam`

part of the returned list is modified in this way.

##### Details

The routine can estimate

- (nonlinear,
and possibly multivariate) effects of (one or multiple)
scalar covariates$z$that vary smoothly over the
index$t$of$Y(t)$(e.g.$f(z_{1i}, t)$,
specified in the
`formula`

simply as`~s(z1)`

), - (nonlinear) effects of scalar covariates that are
constant over$t$(e.g.$f(z_{2i})$, specified as
`~c(s(z2))`

, or$\beta_2 z_{2i}$, specified as`~c(z2)`

), - linear functional effects of scalar
(numeric or factor) covariates that vary smoothly over$t$(e.g.$z_{3i} \beta_3(t)$, specified as
`~z3`

), - function-on-function regression terms
(e.g.$\int X_i(s)\beta(s,t)ds$, specified as
`~ff(X, yindex=t, xindex=s)`

, see`ff`

).

`c()`

-notation to denote model terms that
are constant over the index of the functional
response.
Internally, univariate smooth terms without a
`c()`

-wrapper are expanded into bivariate smooth
terms in the original covariate and the index of the
functional response. Bivariate smooth terms (```
s(),
te()
```

or `t2()`

) without a `c()`

-wrapper are
expanded into trivariate smooth terms in the original
covariates and the index of the functional response.
Linear terms for scalar covariates or categorical
covariates are expanded into varying coefficient terms,
varying smoothly over the index of the functional
response. For factor variables, a separate smooth
function with its own smoothing parameter is estimated
for each level of the factor.
Functional random
intercepts $B_{0g(i)}(t)$ for a grouping variable
`g`

can be specified via `~s(g, bs="re")`

),
functional random slopes $u_i B_{1g(i)}(t)$ in a
numeric variable `u`

via ```
~s(g, u,
bs="re")
```

).
The marginal spline basis used for the
index of the the functional response is specified via the
*global*argument

`bs.yindex`

. If necessary,
this can be overriden for any specific term by supplying
a `bs.yindex`

-argument to that term in the formula,
e.g. `~s(x, bs.yindex=list(bs="tp", k=7))`

would
yield a tensor product spline for which the marginal
basis for the index of the response are 7 cubic
thin-plate spline functions overriding the global default
for the basis and penalty on the index of the response
given by the *global*

`bs.yindex`

-argument .
Use `~-1 + c(1) + ...`

to specify a model with only
a constant and no functional intercept.
The functional covariates have to be supplied as a
$n$ by `data`

,
which can contain missing values. If the functional
responses are sparse or irregular (i.e., not evaluated on
the same evaluation points across all observations), the
`ydata`

-argument can be used to specify the
responses: `ydata`

must be a `data.frame`

with
3 columns called `'.obs', '.index', '.value'`

which
specify which curve the point belongs to
(`'.obs'`

=$i$), at which $t$ it was observed
(`'.index'`

=$t$), and the observed value
(`'.value'`

=$Y_i(t)$). For both regular and
irregular functional responses, the model is then fitted
with the data in long format, i.e., for data on a grid
the rows of the matrix of the functional response
evaluations $Y_i(t)$ are stacked into one long vector
and the covariates are expanded/repeated correspondingly.
This means the models get quite big fairly fast, since
the effective number of rows in the design matrix is
number of observations times number of evaluations of
$Y(t)$ per observation.
Note that `pffr`

overrides the default
identifiability constraints ($\sum_{i,t} \hat f(z_i,
x_i, t) = 0$) implemented in `mgcv`

for tensor
product terms whose marginals include the index $t$
of the functional response. Instead, $\sum_i \hat
f(z_i, x_i, t) = 0$ for all $t$ is enforced, so that
effects varying over $t$ can be interpreted as local
deviations from the global functional intercept. We
recommend using centered scalar covariates for terms like
$z \beta(t)$ (`~z`

) and centered functional
covariates with $\sum_i X_i(t) = 0$ for all $t$
in `ff`

-terms so that the global functional
intercept can be interpreted as the global mean function.
For irregular or sparse $Y_i(t)$ supplied via
`ydata`

it is not trivial to specify these custom
constraints. They are enforced on the synthetic grid of
$t$-values returned in the `pffr$yind`

entry of
the return object.##### References

Ivanescu, A., Staicu, A.-M., Scheipl, F. and Greven, S.
(2012). Penalized function-on-function regression.
(under revision)

Scheipl, F., Staicu, A.-M. and Greven, S. (2012).
Functional Additive Mixed Models. (submitted)

##### See Also

`smooth.terms`

for details of
`mgcv`

syntax and available spline bases and
penalties.

##### Examples

```
###############################################################################
# univariate model:
# Y(t) = f(t) + \int X1(s)\beta(s,t)ds + eps
data1 <- pffrSim(scenario="1")
t <- attr(data1, "yindex")
s <- attr(data1, "xindex")
m1 <- pffr(Y ~ ff(X1, xind=s), yind=t, data=data1)
summary(m1)
plot(m1, pers=TRUE)
###############################################################################
# multivariate model:
# Y(t) = f0(t) + \int X1(s)\beta1(s,t)ds + \int X2(s)\beta2(s,t)ds +
# xlin \beta3(t) + f1(xte1, xte2) + f2(xsmoo, t) + beta4 xconst + eps
data2 <- pffrSim(scenario="2", n=200)
t <- attr(data2, "yindex")
s <- attr(data2, "xindex")
m2 <- pffr(Y ~ ff(X1, xind=s) + #linear function-on-function
ff(X2, xind=s) + #linear function-on-function
xlin + #varying coefficient term
c(te(xte1, xte2)) + #bivariate smooth term in xte1 & xte2, const. over Y-index
s(xsmoo) + #smooth effect of xsmoo varying over Y-index
c(xconst), # linear effect of xconst constant over Y-index
yind=t,
data=data2)
summary(m2)
plot(m2, pers=TRUE)
#############################################################################
# sparse data (keep only 20% of function evaluations):
set.seed(121456)
ydata1 <- data.frame(cbind(.obs=as.vector(row(data1$Y)),
.index=rep(attr(data1, "yindex"), e=nrow(data1)),
.value=as.vector(data1$Y)))
ydata1 <- ydata1[sample(1:nrow(ydata), 0.2*nrow(ydata)), ]
s <- attr(data1, "xindex")
t <- attr(data1, "yindex")
m1.sparse <- pffr(Y ~ ff(X1, xind=s), data=data1, ydata=ydata1, yind=t)
summary(m1.sparse)
plot(m1.sparse, pers=TRUE)
```

*Documentation reproduced from package refund, version 0.1-9, License: GPL (>= 2)*