fpcr: Functional principal component regression

Description

Implements functional principal component regression (Reiss and Ogden, 2007, 2010) for generalized linear models with scalar responses and functional predictors.

Usage

fpcr(y, xfuncs = NULL, fdobj = NULL, ncomp = NULL, pve = 0.99, nbasis = NULL, 
     basismat = NULL, penmat = NULL, argvals = NULL, covt = NULL, 
     mean.signal.term = FALSE, spline.order = NULL, family = gaussian,
     method = "REML", sp = NULL, pen.order = 2, cv1 = FALSE, nfold = 5,
     store.cv = FALSE, store.gam = TRUE, ...)

Arguments

scalar outcome vector.

xfuncs

for 1D predictors, an $n \times d$ matrix of signals/functional predictors, where $n$ is the length of y and $d$ is the number of sites at which each signal is defined. For 2D predictors, an $n \times d1 \times d2$ array representing $n$

fdobj

functional data object (class "fd") giving the functional predictors. Allowed only for 1D functional predictors.

ncomp

number of principal components. If NULL, this is chosen by pve.

pve

proportion of variance explained: used to choose the number of principal components.

nbasis

number(s) of B-spline basis functions: either a scalar, or a vector of values to be compared. For 2D predictors, tensor product B-splines are used, with the given basis dimension(s) in each direction; alternatively, nbasis can be given in th

basismat

a $d \times K$ matrix of values of $K$ basis functions at the $d$ sites.

penmat

a $K \times K$ matrix defining a penalty on the basis coefficients.

argvals

points at which the functional predictors and the coefficient function are evaluated. By default, if 1D functional predictors are given by the $n \times d$ matrix xfuncs, argvals is set to $d$ equally spaced points from 0 to

covt

covariates: an $n$-row matrix, or a vector of length $n$.

mean.signal.term

logical: should the mean of each subject's signal be included as a covariate?

spline.order

order of B-splines used, if fdobj is not supplied; defaults to 4, i.e., cubic B-splines.

family

generalized linear model family. Current version supports "gaussian" (the default) and "binomial".

method

smoothing parameter selection method, passed to function gam; see the gam documentation for details.

a fixed smoothing parameter; if NULL, an optimal value is chosen (see method).

pen.order

order of derivative penalty applied when estimating the coefficient function; defaults to 2.

cv1

logical: should cross-validation be performed to select the best model if only one set of tuning parameters provided? By default, FALSE. Note that, if there are multiple sets of tuning parameters provided, cv is always performed.

nfold

the number of validation sets ("folds") into which the data are divided; by default, 5.

store.cv

logical: should a CV result table be in the output? By default, FALSE.

store.gam

logical: should the gam object be included in the output? Defaults to TRUE.

...

other arguments passed to function gam.

Value

A list with components
gamObjectif store.gam = TRUE, an object of class gam (see gamObject in the mgcv package documentation).
fhatcoefficient function estimate.
sepointwise Bayesian standard error.
undecor.coefundecorrelated coefficient for covariates.
argvalsthe supplied value of argvals.
cv.tablea table giving the CV criterion for each combination of nbasis and ncomp, if store.cv = TRUE; otherwise, the CV criterion only for the optimized combination of these parameters. Set to NULL if CV is not performed.
nbasis, ncompwhen CV is performed, the values of nbasis and comp that minimize the CV criterion.

Details

One-dimensional functional predictors can be given either in functional data object form, using argument fdobj (see the fda package of Ramsay, Hooker and Graves, 2009, and Method 1 in the example below), or explicitly, using xfuncs (see Method 2 in the example). In the latter case, arguments basismat and penmat can also be used to specify the basis and/or penalty matrices (see Method 3). For two-dimensional predictors, functional data object form is not supported. Instead of radial B-splines as in Reiss and Ogden (2010), this implementation employs tensor product cubic B-splines, sometimes known as bivariate O-splines (Ormerod, 2008). For purposes of interpreting the fitted coefficients, please note that the functional predictors are decorrelated from the scalar predictors before fitting the model (when there are no scalar predictors other than the intercept, this just means the columns of the functional predictor matrix are de-meaned); see section 3.2 of Reiss (2006) for details.

References

Ormerod, J. T. (2008). On semiparametric regression and data mining. Ph.D. dissertation, School of Mathematics and Statistics, University of New South Wales. Ramsay, J. O., Hooker, G., and Graves, S. (2009). Functional Data Analysis with R and MATLAB. New York: Springer. Reiss, P. T. (2006). Regression with signals and images as predictors. Ph.D. dissertation, Department of Biostatistics, Columbia University. Available at http://works.bepress.com/phil_reiss/11/. Reiss, P. T., and Ogden, R. T. (2007). Functional principal component regression and functional partial least squares. Journal of the American Statistical Association, 102, 984--996. Reiss, P. T., and Ogden, R. T. (2010). Functional generalized linear models with images as predictors. Biometrics, 66, 61--69. Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Boca Raton, FL: Chapman & Hall.

Examples

Run this code

require(fda)
### 1D functional predictor example ###

######### Octane data example #########
data(gasoline)

# Create the requisite functional data objects
bbasis = create.bspline.basis(c(900, 1700), 40)
wavelengths = 2*450:850
nir = matrix(NA, 401, 60)
for (i in 1:60) nir[, i] = gasoline$NIR[i, ]
# Why not just take transpose of gasoline$NIR above?  
# Because for some reason it leads to an error in the following statement
gas.fd = smooth.basisPar(wavelengths, nir, bbasis)$fd

# Method 1: Call fpcr with fdobj argument
gasmod1 = fpcr(gasoline$octane, fdobj = gas.fd, ncomp = 30)
plot(gasmod1, xlab="Wavelength")
# Method 2: Call fpcr with explicit signal matrix
gasmod2 = fpcr(gasoline$octane, xfuncs = gasoline$NIR, ncomp = 30)
# Method 3: Call fpcr with explicit signal, basis, and penalty matrices
gasmod3 = fpcr(gasoline$octane, xfuncs = gasoline$NIR, 
               basismat = eval.basis(wavelengths, bbasis), 
               penmat = getbasispenalty(bbasis), ncomp = 30)

# Check that all 3 calls yield essentially identical estimates
all.equal(gasmod1$fhat, gasmod2$fhat, gasmod3$fhat)
# But note that, in general, you'd have to specify argvals in Method 1  
# to get the same coefficient function values as with Methods 2 & 3.

### 2D functional predictor example ###

n = 200; d = 70

# Create true coefficient function
ftrue = matrix(0,d,d)
ftrue[40:46,34:38] = 1

# Generate random functional predictors, and scalar responses
ii = array(rnorm(n*d^2), dim=c(n,d,d))
iimat = ii; dim(iimat) = c(n,d^2)
yy = iimat %*% as.vector(ftrue) + rnorm(n, sd=.3)

mm = fpcr(yy, ii, ncomp=40)

image(ftrue)
contour(mm$fhat, add=TRUE)

### Cross-validation ###
cv.gas = fpcr(gasoline$octane, xfuncs = gasoline$NIR, 
                 nbasis=seq(20,40,5), ncomp = seq(10,20,5), store.cv = TRUE)
image(seq(20,40,5), seq(10,20,5), cv.gas$cv.table, xlab="Basis size", 
      ylab="Number of PCs", xaxp=c(20,40,4), yaxp=c(10,20,2))

Run the code above in your browser using DataLab