lf.vd: Construct a VDFR regression term

Description

This function defines the a variable-domain functional regression term for inclusion in an gam-formula (or bam or gamm or gamm4::gamm as constructed by pfr. These are functional predictors for which each function is observed over a domain of different width. The default is the term \(1/T_i\int_0^{T_i}X_i(t)\beta(t,T_i)dt\), where \(X_i(t)\) is a functional predictor of length \(T_i\) and \(\beta(t,T_i)\) is an unknown bivariate coefficient function. Various domain transformations are available, such as lagging or domain-standardizing the coordinates, or parameterizing the interactions; these often result in improved model fit. Basis choice is fully customizable using the options of s and te.

Usage

lf.vd(
  X,
  argvals = seq(0, 1, l = ncol(X)),
  vd = NULL,
  integration = c("simpson", "trapezoidal", "riemann"),
  L = NULL,
  basistype = c("s", "te", "t2"),
  transform = NULL,
  mp = TRUE,
  ...
)

Arguments

matrix containing variable-domain functions. Should be \(N x J\), where \(N\) is the number of subjects and \(J\) is the maximum number of time points per subject. Most rows will have NA values in the right-most columns, corresponding to unobserved time points.

argvals

indices of evaluation of X, i.e. \((t_{i1},.,t_{iJ})\) for subject \(i\). May be entered as either a length-J vector, or as an N by J matrix. Indices may be unequally spaced. Entering as a matrix allows for different observations times for each subject.

vector of values of containing the variable-domain width (\(T_i\) above). Defaults to the argvals value corresponding to the last non-NA element of \(X_i(t)\).

integration

method used for numerical integration. Defaults to "simpson"'s rule for calculating entries in L. Alternatively and for non-equidistant grids, "trapezoidal" or "riemann".

an optional N by ncol(argvals) matrix giving the weights for the numerical integration over t. If present, overrides integration.

basistype

character string indicating type of bivariate basis used. Options include "s" (the default), "te", and "t2", which correspond to mgcv::s, mgcv::te, and mgcv::t2.

transform

character string indicating an optional basis transformation; see Details for options.

for transform=="linear" or transform=="quadratic", TRUE to use multiple penalties for the smooth (one for each marginal basis). If FALSE, penalties are concatonated into a single block-diagonal penalty matrix (with one smoothing parameter).

...

optional arguments for basis and penalization to be passed to the function indicated by basistype. These could include, for example, "bs", "k", "m", etc. See te or s for details.

Value

a list with the following entries

call

a call to s or te, using the appropriately constructed weight matrices

data

data used by the call, which includes the matrices indicated by argname, Tindname, and LXname

the matrix of weights used for the integration

argname

the name used for the argvals variable in the formula used by mgcv::gam

Tindname

the name used for the Tind variable in the formula used by mgcv::gam

LXname

the name of the by variable used by s or te in the formula for mgcv::gam

Details

The variable-domain functional regression model uses the term \(\frac1{T_i}\int_0^{T_i}X_i(t)\beta(t,T_i)dt\) to incorporate a functional predictor with subject-specific domain width. This term imposes a smooth (nonparametric) interaction between \(t\) and \(T_i\). The domain of the coefficient function is the triangular (or trapezoidal) surface defined by \({t,T_i: 0\le t\le T_i}\). The default basis uses bivariate thin-plate regression splines.

Different basis transformations can result in different properties; see Gellar, et al. (2014) for a more complete description. We make five basis transformations easily accessible using the transform argument. This argument is a character string that can take one of the following values:

"lagged": transforms argvals to argvals - vd
"standardized": transforms argvals to argvals/vd, and then rescales vd linearly so it ranges from 0 to 1
"linear": first transforms the domain as in "standardized", then parameterizes the interaction with "vd" to be linear
"quadratic": first transforms the domain as in "standardized", then parameterizes the interaction with "vd" to be quadratic
"noInteraction": first transforms the domain as in "standardized", then reduces the bivariate basis to univariate with no effect of vd. This would be equivalent to using lf on the domain-standardized predictor functions.

The practical effect of using the "lagged" basis is to increase smoothness along the right (diagonal) edge of the resultant estimate. The practical effect of using a "standardized" basis is to allow for greater smoothness at high values of \(T_i\) compared to lower values.

These basis transformations rely on the basis constructors available in the mgcvTrans package. For more specific control over the transformations, you can use bs="dt" and/or bs="pi"; see smooth.construct.dt.smooth.spec or smooth.construct.pi.smooth.spec for an explanation of the options (entered through the xt argument of lf.vd/s).

Note that tensor product bases are only recommended when a standardized transformation is used. Without this transformation, just under half of the "knots" used to define the basis will fall outside the range of the data and have no data available to estimate them. The penalty allows the corresponding coefficients to be estimated, but results may be unstable.

References

Gellar, Jonathan E., Elizabeth Colantuoni, Dale M. Needham, and Ciprian M. Crainiceanu. Variable-Domain Functional Regression for Modeling ICU Data. Journal of the American Statistical Association, 109(508):1425-1439, 2014.

Examples

Run this code

# NOT RUN {
  data(sofa)
  fit.vd1 <- pfr(death ~ lf.vd(SOFA) + age + los,
                 family="binomial", data=sofa)
  fit.vd2 <- pfr(death ~ lf.vd(SOFA, transform="lagged") + age + los,
                 family="binomial", data=sofa)
  fit.vd3 <- pfr(death ~ lf.vd(SOFA, transform="standardized") + age + los,
                 family="binomial", data=sofa)
  fit.vd4 <- pfr(death ~ lf.vd(SOFA, transform="standardized",
                               basistype="te") + age + los,
                 family="binomial", data=sofa)
  fit.vd5 <- pfr(death ~ lf.vd(SOFA, transform="linear", bs="ps") + age + los,
                 family="binomial", data=sofa)
  fit.vd6 <- pfr(death ~ lf.vd(SOFA, transform="quadratic", bs="ps") + age + los,
                 family="binomial", data=sofa)
  fit.vd7 <- pfr(death ~ lf.vd(SOFA, transform="noInteraction", bs="ps") + age + los,
                 family="binomial", data=sofa)
  
  ests <- lapply(1:7, function(i) {
    c.i <- coef(get(paste0("fit.vd", i)), n=173, n2=173) 
    c.i[(c.i$SOFA.arg <= c.i$SOFA.vd),]
  })
  
  # Try plotting for each i
  i <- 1
  lims <- c(-2,8)
  if (requireNamespace("ggplot2", quietly = TRUE) &
      requireNamespace("RColorBrewer", quietly = TRUE)) {
        est <- ests[[i]]
        est$value[est$value<lims[1]] <- lims[1]
        est$value[est$value>lims[2]] <- lims[2]
        ggplot2::ggplot(est, ggplot2::aes(SOFA.arg, SOFA.vd)) +
          ggplot2::geom_tile(ggplot2::aes(colour=value, fill=value)) +
          ggplot2::scale_fill_gradientn(  name="", limits=lims,
                    colours=rev(RColorBrewer::brewer.pal(11,"Spectral"))) +
          ggplot2::scale_colour_gradientn(name="", limits=lims,
                    colours=rev(RColorBrewer::brewer.pal(11,"Spectral"))) +
          ggplot2::scale_y_continuous(expand = c(0,0)) +
          ggplot2::scale_x_continuous(expand = c(0,0)) +
          ggplot2::theme_bw()
  }
# }
# NOT RUN {
  
# }

Run the code above in your browser using DataLab