wcr: Principal component regression and partial least squares in the wavelet domain

Description

Performs generalized linear scalar-on-function or scalar-on-image regression in the wavelet domain, by sparse principal component regression (PCR) and sparse partial least squares (PLS).

Usage

wcr(y, xfuncs, min.scale, nfeatures, ncomp, method = c("pcr", "pls"), 
    mean.signal.term = FALSE, covt = NULL, filter.number = 10, 
    wavelet.family = "DaubLeAsymm", family = "gaussian", 
    cv1 = FALSE, nfold = 5, compare.fits = FALSE, store.cv = FALSE,
    store.glm = TRUE)

Arguments

scalar outcome vector.

xfuncs

functional predictors. For 1D predictors, an $n \times d$ matrix of signals, where $n$ is the length of y and $d$ is the number of sites at which each signal is defined. For 2D predictors, an $n \times d \times d$ array comprising $n$ images

min.scale

either a scalar, or a vector of values to be compared. Used to control the coarseness level of wavelet decomposition. The maximum value of min.scale is $log_2(d) - 1$.

nfeatures

number(s) of features, i.e. wavelet coefficients, to retain for prediction of y: either a scalar, or a vector of values to be compared.

ncomp

number(s) of principal components (if method="pcr") or PLS components (if method="pcr"): either a scalar, or a vector of values to be compared.

method

either "pcr" (principal component regression) (the default) or "pls" (partial least squares).

mean.signal.term

logical: should the mean of each subject's signal be included as a covariate? By default, FALSE.

covt

covariates: an $n$-row matrix, or a vector of length $n$; defaults to NULL.

filter.number

argument passed to function wd or imwd in the wavethresh package. Used to select the smoothness of wavelet in the decomposition; defau

wavelet.family

family of wavelets: passed to functions wd or imwd; defaults to "DaubLeAsymm".

family

generalized linear model family. Current version supports "gaussian" (the default) and "binomial".

cv1

logical: should cross-validation be performed (to estimate prediction error) even if a single value is provided for each of min.scale, nfeatures and ncomp? By default, FALSE. Note that if multiple candid

nfold

the number of validation sets ("folds") into which the data are divided.

compare.fits

logical: if TRUE, the pairwise differences among training-set coefficient function estimates are computed.

store.cv

logical: should the output include a CV result table?

store.glm

logical: should the output include the fitted glm? Defaults to TRUE, but may need to be set to FALSE for simulations.

Value

glmif store.glm = TRUE, the fitted glm object.
fhatcoefficient function estimate.
undecor.coefcoefficients for covariates without decorrelation. The model is fitted after decorrelating the functional predictors from any scalar covariates; but for CV, one needs the "undecorrelated" coefficients from the training-set models.
min.scale, nfeatures, ncompthe values of these arguments specified in the call, if a single value was supplied; otherwise, the values chosen by CV.
cv.tablea table giving the CV criterion for each combination of min.scale, nfeatures and ncomp, if store.cv = TRUE; otherwise, the CV criterion only for the optimized combination of these parameters. Set to NULL if CV is not performed.
stabilityif compare.fits = TRUE, a table of stability measures for each combination of min.scale, nfeatures and ncomp. The stability measure is derived by computing the sample variance of the training-set coefficient function estimates at each point, then averaging over all points.

Details

Essentially, the algorithm works by (1) applying the discrete wavelet transform (DWT) to the predictors; (2) retaining only the nfeatures wavelet coefficients having the highest variance (for PCR; cf. Johnstone and Lu, 2009) or highest covariance with y (for PLS); (3) regressing y on the leading ncomp PCs or PLS components; and (4) applying the inverse DWT to the result to obtain the coefficient function estimate fhat.

This function supports only the standard DWT (see argument type in wd) with periodic boundary handling (see argument bc in wd).

See the Details for fpcr for a note regarding decorrelation.

References

Johnstone, I. M., and Lu, Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104, 682--693.

Examples

Run this code

### 1D functional predictor example ###
data(gasoline)

# input a single value of each tuning parameter
gas.wpcr1 <- wcr(gasoline$octane, xfuncs = gasoline$NIR[,1:256], min.scale = 0,
                 nfeatures = 20, ncomp = 15)
gas.wpls1 <- wcr(gasoline$octane, xfuncs = gasoline$NIR[,1:256], min.scale = 0, 
                 nfeatures = 20, ncomp = 15, method = "pls")

# input vectors of tuning parameter values
gas.wpcr2 <- wcr(gasoline$octane, xfuncs = gasoline$NIR[,1:256], min.scale = 0:3,
                 nfeatures = c(16, 18, 20), ncomp = 10:15)
gas.wpls2 <- wcr(gasoline$octane, xfuncs = gasoline$NIR[,1:256], min.scale = 0:3,
                 nfeatures = c(16, 18, 20), ncomp = 10:15, method = "pls")

### 2D functional predictor example ###

n = 200; d = 64

# Create true coefficient function
ftrue = matrix(0,d,d)
ftrue[40:46,34:38] = 1

# Generate random functional predictors, and scalar responses
ii = array(rnorm(n*d^2), dim=c(n,d,d))
iimat = ii; dim(iimat) = c(n,d^2)
yy = iimat %*% as.vector(ftrue) + rnorm(n, sd=.3)

mm.wpls <- wcr(yy, xfuncs = ii, min.scale = 4, nfeatures = 20, ncomp = 6,
                method = "pls")
image(ftrue)
contour(mm.wpls$fhat, add=TRUE)

Run the code above in your browser using DataLab