ff: Construct a function-on-function regression term

Description

Defines a term \(\int^{s_{hi, i}}_{s_{lo, i}} X_i(s)\beta(t,s)ds\) for inclusion in an mgcv::gam-formula (or bam or gamm or gamm4::gamm4) as constructed by pffr. Defaults to a cubic tensor product B-spline with marginal first order difference penalties for \(\beta(t,s)\) and numerical integration over the entire range \([s_{lo, i}, s_{hi, i}] = [\min(s_i), \max(s_i)]\) using Simpson weights. Missing values in \(X(s)\) are not supported, and all \(X_i(s)\) must currently have the same length. Unequal integration ranges for different \(X_i(s)\) should work. \(X_i(s)\) is assumed to be numeric.

Usage

ff(
  X,
  yind = NULL,
  xind = seq(0, 1, l = ncol(X)),
  basistype = c("te", "t2", "ti", "s", "tes"),
  integration = c("simpson", "trapezoidal", "riemann"),
  L = NULL,
  limits = NULL,
  splinepars = if (basistype != "s") {
    list(bs = "ps", m = list(c(2, 1), c(2, 1)), k = c(5, 5))
  } else {
    list(bs = "tp", m = NA)
  },
  check.ident = TRUE
)

Arguments

X

an n by length(xind) matrix of function evaluations \(X_i(s_{i1}),\dots, X_i(s_{iS})\); \(i=1,\dots,n\).

yind

DEPRECATED. Formerly used to supply a matrix (or vector) of indices of evaluations of \(Y_i(t)\); no longer used.

xind

vector of indices of evaluations of \(X_i(s)\), i.e., \((s_{1},\dots,s_{S})\)

basistype

defaults to "te", i.e. a tensor product spline to represent \(\beta(t,s)\). Alternatively, use "s" for bivariate basis functions (see mgcv's s) or "t2" for an alternative parameterization of tensor product splines (see mgcv's t2).

integration

method used for numerical integration, i.e. for calculating the entries in L. Defaults to "simpson" (Simpson's rule). Alternatively, and for non-equidistant grids, use "trapezoidal" or "riemann". "riemann" integration is always used if limits is specified.
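
The three quadrature rules can be sketched in base R as follows. This is a textbook illustration and not necessarily the exact weights ff computes internally: the helper name quad_weights is hypothetical, the Simpson branch assumes an equidistant grid with an odd number of points, and the "riemann" branch is assumed to be a left Riemann sum.

```r
# Hypothetical helper: quadrature weights for a grid s (not part of refund)
quad_weights <- function(s, method = c("simpson", "trapezoidal", "riemann")) {
  method <- match.arg(method)
  S <- length(s)
  h <- diff(s)  # grid spacings, length S - 1
  switch(method,
    simpson = {
      # composite Simpson rule: equidistant grid, odd number of points
      stopifnot(S >= 3, S %% 2 == 1, isTRUE(all.equal(h, rep(h[1], S - 1))))
      h[1] / 3 * c(1, rep(c(4, 2), length.out = S - 2), 1)
    },
    # trapezoidal rule: also valid on non-equidistant grids
    trapezoidal = c(h[1], h[-1] + h[-(S - 1)], h[S - 1]) / 2,
    # left Riemann sum (assumption): each point is weighted by the spacing to its right
    riemann = c(h, 0)
  )
}

s <- seq(0, 1, length.out = 11)
sum(quad_weights(s, "simpson") * s^2)      # approximates the integral of s^2 over [0, 1]
sum(quad_weights(s, "trapezoidal") * s^2)
sum(quad_weights(s, "riemann") * s^2)
```

Simpson's rule is exact for polynomials up to degree three on such a grid, which is why it is a reasonable default for smooth, equidistantly sampled curves.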

L

optional: an n by length(xind) matrix giving the weights for the numerical integration over \(s\).
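
To illustrate the role of the weight matrix, the sketch below approximates \(\int X_i(s)\beta(t,s)ds\) by a weighted sum over the grid. The toy covariate, the toy coefficient surface and the choice of trapezoidal weights are assumptions made for this example only; in an actual model \(\beta(t,s)\) is unknown and represented by a spline basis.

```r
n <- 3
s <- seq(0, 1, length.out = 101)   # evaluation grid of X_i(s)
t <- c(0.25, 0.5, 1)               # a few index values of the response

# Toy covariate X_i(s) = i * s and toy coefficient surface beta(t, s) = s * t
X <- outer(seq_len(n), s)          # n x S
beta <- outer(s, t)                # S x T: rows index s, columns index t

# Trapezoidal weights, repeated in every row to form an n x S weight matrix
h <- diff(s)
w <- c(h[1], h[-1] + h[-length(h)], h[length(h)]) / 2
L <- matrix(w, nrow = n, ncol = length(s), byrow = TRUE)

# The weighted sum over s approximates the integral for each curve i and each t
approx_int <- (L * X) %*% beta            # n x T
exact_int <- outer(seq_len(n), t) / 3     # closed form: i * t / 3
max(abs(approx_int - exact_int))          # small discretization error
```

Supplying a custom matrix of this shape makes it possible to use row-specific weights, for example when curves are observed over different ranges.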

limits

defaults to NULL for integration across the entire range of \(X(s)\); otherwise specifies the integration limits \(s_{lo}(t), s_{hi}(t)\): either one of "s<t" or "s<=t" for integration over \([0, t)\) or \([0, t]\), respectively, or a function that takes s as the first and t as the second argument and returns TRUE for combinations of values (s,t) if s falls into the integration range for the given t. This is an experimental feature and not well tested yet; use at your own risk.
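
A limits function of the kind described above can be sketched in base R. The function names and the lag width of 0.5 are hypothetical, and using outer to tabulate the integration region is only an illustration of what such a function encodes, not a description of what ff does internally.

```r
s <- seq(0, 1, length.out = 5)
t <- seq(0, 1, length.out = 5)

# Function form of limits = "s<t": s is the first argument, t the second
lim_strict <- function(s, t) s < t

# Hypothetical lagged window: integrate only over s in [t - 0.5, t]
lim_window <- function(s, t) (s <= t) & (s >= t - 0.5)

# Rows index s, columns index t; TRUE marks (s, t) pairs inside the range
mask_strict <- outer(s, t, lim_strict)
mask_window <- outer(s, t, lim_window)
mask_window
```

Because the function is evaluated on vectors of (s, t) pairs, it should be vectorized, i.e. built from elementwise operators such as `<`, `<=` and `&`.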

splinepars

optional arguments supplied to the basistype-term. Defaults to a cubic tensor product B-spline with marginal first difference penalties and 5 basis functions per margin, i.e. list(bs="ps", m=list(c(2, 1), c(2, 1)), k=c(5, 5)). See te or s in mgcv for details.
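
As a small sketch, a modified splinepars list can be derived from the default shown in Usage. The values k = c(8, 8) and the variable names are arbitrary choices for illustration.

```r
# Default for basistype != "s", copied from the Usage section
default_pars <- list(bs = "ps", m = list(c(2, 1), c(2, 1)), k = c(5, 5))

# Hypothetical variation: more basis functions per margin, same basis and penalty
custom_pars <- modifyList(default_pars, list(k = c(8, 8)))
str(custom_pars)

# It would then be passed on as, e.g., ff(X, xind = s, splinepars = custom_pars)
```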

check.ident

check identifiability of the model spec. See Details and References. Defaults to TRUE.

Value

A list containing

call

a "call" to te (or s or t2) using the appropriately constructed covariate and weight matrices

data

a list containing the necessary covariate and weight matrices
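
In practice ff is used inside a pffr formula rather than called on its own. The following minimal sketch assumes the refund package is installed and uses its simulation helper pffrSim to generate toy data; the sample size and variable names are choices made for this example.

```r
if (requireNamespace("refund", quietly = TRUE)) {
  library(refund)
  set.seed(1)
  # Simulated function-on-function data (assumption: the "ff" scenario of pffrSim)
  data1 <- pffrSim(scenario = "ff", n = 40)
  t <- attr(data1, "yindex")   # grid of the functional response Y(t)
  s <- attr(data1, "xindex")   # grid of the functional covariate X1(s)

  # Default tensor product basis and Simpson integration weights
  m1 <- pffr(Y ~ ff(X1, xind = s), yind = t, data = data1)
  summary(m1)
}
```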

Details

If check.ident==TRUE and basistype!="s" (both are the defaults), the routine checks conditions for non-identifiability of the effect. Non-identifiability occurs if a) the marginal basis for the functional covariate is rank-deficient (typically because the functional covariate has lower rank than the spline basis along its index) and simultaneously b) the kernel of Cov\((X(s))\) has a non-trivial intersection with the kernel of the marginal penalty over s. In practice, a) occurs quite frequently, and b) usually occurs because curve-wise mean centering has removed all constant components from the functional covariate. If there is kernel overlap, \(\beta(t,s)\) is constrained to be orthogonal to functions in that overlap space (e.g., if the overlap contains constant functions, constraints "\(\int \beta(t,s) ds = 0\) for all t" are enforced). See References for details.

A warning is always given if the effective rank of Cov\((X(s))\) (defined as the smallest number of leading eigenvalues that together account for at least 0.995 of the total variance in \(X_i(s)\)) is lower than 4. If \(X_i(s)\) is of very low rank, an ffpc term may be preferable.
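
The two ingredients of this check can be illustrated in base R. The sketch below is a grid-level analogue under stated assumptions: the helper effective_rank is hypothetical and need not match the internal computation, the toy covariate is made up, and the difference matrix acts on function values on the grid, whereas the actual penalty acts on spline coefficients.

```r
set.seed(1)
n <- 50
s <- seq(0, 1, length.out = 40)
S <- length(s)

# Low-rank toy covariate: two smooth components plus a random level per curve
basis <- cbind(sin(2 * pi * s), cos(2 * pi * s))
X <- matrix(rnorm(n * 2), n, 2) %*% t(basis) + rnorm(n)

# Hypothetical helper: smallest number of leading eigenvalues of Cov(X)
# that together account for at least `prop` of the total variance
effective_rank <- function(X, prop = 0.995) {
  ev <- pmax(eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values, 0)
  which(cumsum(ev) / sum(ev) >= prop)[1]
}
effective_rank(X)   # at most 3 by construction, i.e. below the threshold of 4

# Curve-wise mean centering removes the constant component of every curve,
# so constant functions end up in the kernel of Cov(X)
Xc <- X - rowMeans(X)
ones <- rep(1, S)
max(abs(cov(Xc) %*% ones))   # numerically zero

# First-order differences annihilate constants as well, so the two kernels overlap
D1 <- diff(diag(S))
max(abs(D1 %*% ones))        # exactly zero
```

In this situation the constant part of \(\beta(t,s)\) along s is neither informed by the data nor shrunk by the penalty, which is the case the sum-to-zero constraint described above is meant to resolve.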

References