af: Construct an FGAM regression term

Description

Defines a term $\int_{T}F(X_i(t),t)dt$ for inclusion in an mgcv::gam-formula (or bam or gamm or gamm4:::gamm) as constructed by pfr, where $F(x,t)$ is an unknown smooth bivariate function and $X_i(t)$ is a functional predictor on the closed interval $T$. See smooth.terms for a list of bivariate basis and penalty options; the default is a tensor product basis with marginal cubic regression splines for estimating $F(x,t)$.

Usage

af(
  X,
  argvals = NULL,
  xind = NULL,
  basistype = c("te", "t2", "s"),
  integration = c("simpson", "trapezoidal", "riemann"),
  L = NULL,
  presmooth = NULL,
  presmooth.opts = NULL,
  Xrange = range(X, na.rm = T),
  Qtransform = FALSE,
  ...
)

Arguments

functional predictors, typically expressed as an N by J matrix, where N is the number of columns and J is the number of evaluation points. May include missing/sparse functions, which are indicated by NA values. Alternatively, can be an object of class "fd"; see fd.

argvals

indices of evaluation of X, i.e. $(t_{i1},.,t_{iJ})$ for subject $i$. May be entered as either a length-J vector, or as an N by J matrix. Indices may be unequally spaced. Entering as a matrix allows for different observations times for each subject. If NULL, defaults to an equally-spaced grid between 0 or 1 (or within X$basis$rangeval if X is a fd object.)

xind

same as argvals. It will not be supported in the next version of refund.

basistype

defaults to "te", i.e. a tensor product spline to represent $F(x,t)$ Alternatively, use "s" for bivariate basis functions (see s) or "t2" for an alternative parameterization of tensor product splines (see t2)

integration

method used for numerical integration. Defaults to "simpson"'s rule for calculating entries in L. Alternatively and for non-equidistant grids, "trapezoidal" or "riemann".

an optional N by ncol(argvals) matrix giving the weights for the numerical integration over t. If present, overrides integration.

presmooth

string indicating the method to be used for preprocessing functional predictor prior to fitting. Options are fpca.sc, fpca.face, fpca.ssvd, fpca.bspline, and fpca.interpolate. Defaults to NULL indicateing no preprocessing. See create.prep.func.

presmooth.opts

list including options passed to preprocessing method create.prep.func.

Xrange

numeric; range to use when specifying the marginal basis for the x-axis. It may be desired to increase this slightly over the default of range(X) if concerned about predicting for future observed curves that take values outside of range(X)

Qtransform

logical; should the functional be transformed using the empirical cdf and applying a quantile transformation on each column of X prior to fitting?

...

optional arguments for basis and penalization to be passed to the function indicated by basistype. These could include, for example, "bs", "k", "m", etc. See te or s for details.

Value

A list with the following entries:

call

a "call" to te (or s, t2) using the appropriately constructed covariate and weight matrices.

argvals

the argvals argument supplied to af

L

the matrix of weights used for the integration

xindname

the name used for the functional predictor variable in the formula used by mgcv

tindname

the name used for argvals variable in the formula used by mgcv

Lname

the name used for the L variable in the formula used by mgcv

presmooth

the presmooth argument supplied to af

Xrange

the Xrange argument supplied to af

prep.func

a function that preprocesses data based on the preprocessing method specified in presmooth. See create.prep.func

References

McLean, M. W., Hooker, G., Staicu, A.-M., Scheipl, F., and Ruppert, D. (2014). Functional generalized additive models. Journal of Computational and Graphical Statistics, 23 (1), pp. 249-269. Available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3982924/.

Examples

Run this code

# NOT RUN {
data(DTI)
## only consider first visit and cases (no PASAT scores for controls)
DTI1 <- DTI[DTI$visit==1 & DTI$case==1,]
DTI2 <- DTI1[complete.cases(DTI1),]

## fit FGAM using FA measurements along corpus callosum
## as functional predictor with PASAT as response
## using 8 cubic B-splines for marginal bases with third
## order marginal difference penalties
## specifying gamma > 1 enforces more smoothing when using
## GCV to choose smoothing parameters
fit1 <- pfr(pasat ~ af(cca, k=c(8,8), m=list(c(2,3), c(2,3)),
                       presmooth="bspline", bs="ps"),
            method="GCV.Cp", gamma=1.2, data=DTI2)
plot(fit1, scheme=2)
vis.pfr(fit1)

## af term for the cca measurements plus an lf term for the rcst measurements
## leave out 10 samples for prediction
test <- sample(nrow(DTI2), 10)
fit2 <- pfr(pasat ~ af(cca, k=c(7,7), m=list(c(2,2), c(2,2)), bs="ps",
                       presmooth="fpca.face") +
                    lf(rcst, k=7, m=c(2,2), bs="ps"),
            method="GCV.Cp", gamma=1.2, data=DTI2[-test,])
par(mfrow=c(1,2))
plot(fit2, scheme=2, rug=FALSE)
vis.pfr(fit2, select=1, xval=.6)
pred <- predict(fit2, newdata = DTI2[test,], type='response', PredOutOfRange = TRUE)
sqrt(mean((DTI2$pasat[test] - pred)^2))

## Try to predict the binary response disease status (case or control)
##   using the quantile transformed measurements from the rcst tract
##   with a smooth component for a scalar covariate that is pure noise
DTI3 <- DTI[DTI$visit==1,]
DTI3 <- DTI3[complete.cases(DTI3$rcst),]
z1 <- rnorm(nrow(DTI3))
fit3 <- pfr(case ~ af(rcst, k=c(7,7), m = list(c(2, 1), c(2, 1)), bs="ps",
                      presmooth="fpca.face", Qtransform=TRUE) +
                    s(z1, k = 10), family="binomial", select=TRUE, data=DTI3)
par(mfrow=c(1,2))
plot(fit3, scheme=2, rug=FALSE)
abline(h=0, col="green")

# 4 versions: fit with/without Qtransform, plotted with/without Qtransform
fit4 <- pfr(case ~ af(rcst, k=c(7,7), m = list(c(2, 1), c(2, 1)), bs="ps",
                      presmooth="fpca.face", Qtransform=FALSE) +
                    s(z1, k = 10), family="binomial", select=TRUE, data=DTI3)
par(mfrow=c(2,2))
zlms <- c(-7.2,4.3)
plot(fit4, select=1, scheme=2, main="QT=FALSE", zlim=zlms, xlab="t", ylab="rcst")
plot(fit4, select=1, scheme=2, Qtransform=TRUE, main="QT=FALSE", rug=FALSE,
     zlim=zlms, xlab="t", ylab="p(rcst)")
plot(fit3, select=1, scheme=2, main="QT=TRUE", zlim=zlms, xlab="t", ylab="rcst")
plot(fit3, select=1, scheme=2, Qtransform=TRUE, main="QT=TRUE", rug=FALSE,
     zlim=zlms, xlab="t", ylab="p(rcst)")

vis.pfr(fit3, select=1, plot.type="contour")
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab