cv.rgam: Cross-validation for reluctant generalized additive model (rgam)

Description

Performs k-fold cross-validation for rgam.

Usage

cv.rgam(x, y, lambda = NULL, family = c("gaussian", "binomial",
  "poisson", "cox"), offset = NULL, init_nz, gamma, nfolds = 10,
  foldid = NULL, keep = FALSE, parallel = FALSE, verbose = TRUE,
  ...)

Arguments

x

Input matrix, of dimension nobs x nvars; each row is an observation vector.

y

Response y as in rgam.

lambda

A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence; supplying a value of lambda overrides this.

family

Response type. Either "gaussian" (default) for linear regression, "binomial" for logistic regression, "poisson" for Poisson regression or "cox" for Cox regression.

offset

Offset vector as in rgam.

init_nz

A vector specifying which features we must include when computing the non-linear features. Default is to construct non-linear features for all given features.

gamma

Scale factor for non-linear features (vs. original features), to be between 0 and 1. Default is 0.8 if init_nz = c(), 0.6 otherwise.

nfolds

Number of folds for CV (default is 10). Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. Smallest value allowable is nfolds = 3.

foldid

An optional vector of values between 1 and nfolds identifying what fold each observation is in. If supplied, nfolds can be missing.

keep

If keep = TRUE, a prevalidated array is returned containing fitted values for each observation at each value of lambda. This means these fits are computed with this observation and the rest of its fold omitted. Default is FALSE.

parallel

If TRUE, use parallel foreach to fit each fold. A parallel backend, such as doMC, must be registered beforehand. Note that this also passes parallel = TRUE to the rgam() call within each fold. Default is FALSE.

verbose

Print information as model is being fit? Default is TRUE.

...

Other arguments that can be passed to rgam.
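
The foldid argument described above can be built by hand when the same folds need to be reused across several calls. The following is a minimal sketch in base R, assuming a balanced random assignment is wanted; the values of n and nfolds are illustrative, and the cv.rgam call is left commented out because it needs the package and data to be loaded.

```r
# Sketch: build a balanced fold assignment for n observations.
set.seed(1)
n <- 100       # number of observations (illustrative)
nfolds <- 5    # desired number of folds (illustrative)

# Recycle 1..nfolds to length n, then shuffle, so that folds are random
# but differ in size by at most one observation.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)  # each fold holds 20 observations here

# Passing the same vector to repeated calls keeps the folds fixed, e.g.
# cvfit <- cv.rgam(x, y, foldid = foldid)
```

Fixing foldid this way makes CV errors comparable between runs that differ only in other arguments.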

Value

An object of class "cv.rgam".

glmfit

A fitted rgam object for the full data.

lambda

The values of lambda used in the fits.

nzero_feat

The number of non-zero features for the model glmfit.

nzero_lin

The number of non-zero linear components for the model glmfit.

nzero_nonlin

The number of non-zero non-linear components for the model glmfit.

fit.preval

If keep=TRUE, this is the array of prevalidated fits.

cvm

The mean cross-validated error: a vector of length length(lambda).

cvse

Estimate of standard error of cvm.

cvlo

Lower curve = cvm - cvse.

cvup

Upper curve = cvm + cvse.

lambda.min

The value of lambda that gives minimum cvm.

lambda.1se

The largest value of lambda such that the CV error is within one standard error of the minimum.

foldid

If keep=TRUE, the fold assignments used.

name

Name of error measurement used for CV.

call

The call that produced this object.
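
To illustrate how cvm, cvse, cvlo, cvup, lambda.min and lambda.1se relate to one another, here is a small sketch in base R. The numbers are made up, and the one-standard-error threshold is taken as the upper curve at the minimum, which is an assumption based on the descriptions above rather than on the package source.

```r
# Hypothetical CV results for a decreasing lambda sequence (made-up numbers).
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(4.0, 2.5, 2.0, 1.9, 2.3)
cvse   <- c(0.30, 0.25, 0.20, 0.20, 0.25)

# Error band, using cvse as the half-width.
cvlo <- cvm - cvse
cvup <- cvm + cvse

# lambda.min: the lambda with the smallest mean CV error.
idx_min    <- which.min(cvm)
lambda_min <- lambda[idx_min]

# lambda.1se: the largest lambda whose error is within one standard
# error of the minimum (threshold assumed to be cvup at the minimum).
lambda_1se <- max(lambda[cvm <= cvup[idx_min]])

lambda_min
lambda_1se
```

With a fitted object, the corresponding values would be read directly from cvfit$lambda.min and cvfit$lambda.1se.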

Details

The function runs rgam nfolds+1 times: the first run obtains the lambda sequence, and the remaining runs compute the fit with each of the folds omitted in turn. The error is accumulated, and the average error and its standard deviation over the folds are computed.
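
The accumulation step can be sketched in base R as follows. This is a simplified, unweighted illustration using a hypothetical matrix of per-fold errors (fold_err); the actual implementation may weight folds by size or differ in other details.

```r
# Sketch: summarise per-fold errors into a mean and a standard error.
set.seed(1)
nfolds  <- 5   # illustrative
nlambda <- 4   # illustrative

# Hypothetical errors: one row per omitted fold, one column per lambda.
fold_err <- matrix(runif(nfolds * nlambda, min = 1, max = 2),
                   nrow = nfolds, ncol = nlambda)

# Mean error across folds for each lambda.
cvm <- colMeans(fold_err)

# Standard deviation across folds, scaled to a standard error of the mean.
cvse <- apply(fold_err, 2, sd) / sqrt(nfolds)
```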

Note that cv.rgam cross-validates only lambda, not the degrees-of-freedom hyperparameter.

Examples

# NOT RUN {
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1)
y <- x %*% beta + rnorm(n)

cvfit <- cv.rgam(x, y)

# specify number of folds
cvfit <- cv.rgam(x, y, nfolds = 5)

# }
