mgcv
's gam
and its
siblings to fit models of the general form
$E(Y_i(t)) = g(\mu(t) + \int X_i(s)\beta(s,t)ds +
f(z_{1i}, t) + f(z_{2i}) + z_{3i} \beta_3(t) + \dots )$
with a functional (but not necessarily continuous) response
$Y(t)$, response function $g$, (optional) smooth
intercept $\mu(t)$, (multiple) functional covariates
$X(t)$ and scalar covariates $z_1$, $z_2$, etc.pffr(formula, yind, data = NULL, ydata = NULL, algorithm = NA,
method = "REML", tensortype = c("ti", "t2"), bs.yindex = list(bs = "ps",
k = 5, m = c(2, 1)), bs.int = list(bs = "ps", k = 20, m = c(2, 1)), ...)
yind
is set to
1:ncol()
.data.frame
or a named
list containing the data. Functional covariates have to
be supplied as n by data.frame
supplying
functional responses that are not observed on a regular
grid. See Details."REML"
-estimation,
including of unknown scale. See gam
for details.list(bs="ps", k=5, m=c(2, 1))
, i.e. 5
cubic B-splines bases with first order difference
penalty.list(bs="ps", k=20, m=c(2, 1))
, i.e.
20 cubic B-splines bases with first order difference
penalty.pffr
-object, which is a
gam
-object with some additional
information in an pffr
-entry. If algorithm
is
"gamm"
or "gamm4"
, only the $gam
part
of the returned list is modified in this way.~z1
),formula
simply as~s(z2)
)~c(s(z3))
, or$\beta_3 z_{3i}$, specified as~c(z3)
),~ff(X, yindex=t,
xindex=s)
, seeff
). Terms given bysff
andffpc
provide
nonlinear and FPC-based effects of functional covariates,
respectively.X
measured on the same grid as the
response are specified as follows:~s(x)
for a
smooth, index-varying effect$f(X(t),t)$,~x
for a linear index-varying effect$X(t)\beta(t)$,~c(s(x))
for a constant nonlinear effect$f(X(t))$,~c(x)
for a constant linear effect$X(t)\beta$.g
with levels$g(i)$can be specified via~s(g, bs="re")
), functional random slopes$u_i
b_{1g(i)}(t)$in a numeric variableu
via~s(g, u, bs="re")
). Scheipl, Staicu, Greven
(2013) contains code examples for modeling correlated
functional random intercepts usingmrf
-terms.c()
-notation to denote model terms that are
constant over the index of the functional response.
Internally, univariate smooth terms without a
c()
-wrapper are expanded into bivariate smooth
terms in the original covariate and the index of the
functional response. Bivariate smooth terms (s(),
te()
or t2()
) without a c()
-wrapper are
expanded into trivariate smooth terms in the original
covariates and the index of the functional response.
Linear terms for scalar covariates or categorical
covariates are expanded into varying coefficient terms,
varying smoothly over the index of the functional
response. For factor variables, a separate smooth
function with its own smoothing parameter is estimated
for each level of the factor.
The marginal spline
basis used for the index of the the functional response
is specified via the global argument
bs.yindex
. If necessary, this can be overriden for
any specific term by supplying a
bs.yindex
-argument to that term in the formula,
e.g. ~s(x, bs.yindex=list(bs="tp", k=7))
would
yield a tensor product spline for which the marginal
basis for the index of the response are 7 cubic
thin-plate spline functions overriding the global default
for the basis and penalty on the index of the response
given by the global bs.yindex
-argument .
Use ~-1 + c(1) + ...
to specify a model with only
a constant and no functional intercept.
The functional covariates have to be supplied as a
$n$ by data
,
which can contain missing values. If the functional
responses are sparse or irregular (i.e., not evaluated on
the same evaluation points across all observations), the
ydata
-argument can be used to specify the
responses: ydata
must be a data.frame
with
3 columns called '.obs', '.index', '.value'
which
specify which curve the point belongs to
('.obs'
=$i$), at which $t$ it was observed
('.index'
=$t$), and the observed value
('.value'
=$Y_i(t)$). Note that the vector of
unique sorted entries in ydata$.obs
must be equal
to rownames(data)
to ensure the correct
association of entries in ydata
to the
corresponding rows of data
. For both regular and
irregular functional responses, the model is then fitted
with the data in long format, i.e., for data on a grid
the rows of the matrix of the functional response
evaluations $Y_i(t)$ are stacked into one long vector
and the covariates are expanded/repeated correspondingly.
This means the models get quite big fairly fast, since
the effective number of rows in the design matrix is
number of observations times number of evaluations of
$Y(t)$ per observation.
Note that pffr
does not use mgcv
's default
identifiability constraints ($\sum_{i,t} \hat f(z_i,
x_i, t) = 0$ or $\sum_{i,t} \hat f(x_i, t) = 0$ for
tensor product terms whose marginals include the index
$t$ of the functional response. Instead, $\sum_i
\hat f(z_i, x_i, t) = 0$ for all $t$ is enforced, so
that effects varying over $t$ can be interpreted as
local deviations from the global functional intercept.
This is achieved by using ti
-terms
with a suitably modified mc
-argument. Note that
this is not possible if algorithm='gamm4'
since
only t2
-type terms can then be used and these
modified constraints are not available for t2
. We
recommend using centered scalar covariates for terms like
$z \beta(t)$ (~z
) and centered functional
covariates with $\sum_i X_i(t) = 0$ for all $t$
in ff
-terms so that the global functional
intercept can be interpreted as the global mean function.Scheipl, F., Staicu, A.-M. and Greven, S. (2013).
Functional Additive Mixed Models. (under revision)
smooth.terms
for details of mgcv
syntax and available spline bases and penalties.###############################################################################
# univariate model:
# Y(t) = f(t) + \int X1(s)\beta(s,t)ds + eps
set.seed(2121)
data1 <- pffrSim(scenario="ff", n=40)
t <- attr(data1, "yindex")
s <- attr(data1, "xindex")
m1 <- pffr(Y ~ ff(X1, xind=s) + ff(X2, xind=s), yind=t, data=data1)
summary(m1)
plot(m1, pers=TRUE, pages=1)
###############################################################################
# multivariate model:
# Y(t) = f0(t) + \int X1(s)\beta1(s,t)ds + \int X2(s)\beta2(s,t)ds +
# xlin \beta3(t) + f1(xte1, xte2) + f2(xsmoo, t) + beta4 xconst + eps
data2 <- pffrSim(scenario="all", n=200)
t <- attr(data2, "yindex")
s <- attr(data2, "xindex")
m2 <- pffr(Y ~ ff(X1, xind=s) + #linear function-on-function
ff(X2, xind=s) + #linear function-on-function
xlin + #varying coefficient term
c(te(xte1, xte2)) + #bivariate smooth term in xte1 & xte2, const. over Y-index
s(xsmoo) + #smooth effect of xsmoo varying over Y-index
c(xconst), # linear effect of xconst constant over Y-index
yind=t,
data=data2)
summary(m2)
plot(m2, pers=TRUE)
str(coef(m2))
# convenience functions:
preddata <- pffrSim(scenario="all", n=20)
str(predict(m2, newdata=preddata))
str(predict(m2, type="terms"))
cm2 <- coef(m2)
cm2$pterms
str(cm2$smterms, 2)
str(cm2$smterms[["s(xsmoo)"]]$coef)
#############################################################################
# sparse data (80% missing on a regular grid):
set.seed(88182004)
data3 <- pffrSim(scenario=c("int", "smoo"), n=100, propmissing=0.8)
t <- attr(data3, "yindex")
m3.sparse <- pffr(Y ~ s(xsmoo), data=data3$data, ydata=data3$ydata, yind=t)
summary(m3.sparse)
plot(m3.sparse, pers=TRUE, pages=1)
Run the code above in your browser using DataLab