Does k-fold cross-validation for rgam.
cv.rgam(x, y, lambda = NULL, family = c("gaussian", "binomial",
"poisson", "cox"), offset = NULL, init_nz, gamma, nfolds = 10,
foldid = NULL, keep = FALSE, parallel = FALSE, verbose = TRUE,
...)
Input matrix, of dimension nobs x nvars; each row is an observation vector.
Response y as in rgam.
A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence; supplying a value of lambda overrides this.
Response type. Either "gaussian" (default) for linear regression, "binomial" for logistic regression, "poisson" for Poisson regression, or "cox" for Cox regression.
Offset vector as in rgam.
A vector specifying which features we must include when computing the non-linear features. Default is to construct non-linear features for all given features.
Scale factor for non-linear features (vs. original features), to be between 0 and 1. Default is 0.8 if init_nz = c(), 0.6 otherwise.
Number of folds for CV (default is 10). Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. Smallest value allowable is nfolds = 3.
An optional vector of values between 1 and nfolds identifying what fold each observation is in. If supplied, nfolds can be missing.
If keep = TRUE, a prevalidated array is returned containing fitted values for each observation at each value of lambda. That is, the fit for each observation is computed with that observation and the rest of its fold omitted. Default is FALSE.
If TRUE, use parallel foreach to fit each fold. A parallel backend, such as doMC, must be registered beforehand. Note that this also passes parallel = TRUE to the rgam() call within each fold. Default is FALSE.
Print information as the model is being fit? Default is TRUE.
Other arguments that can be passed to rgam.
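The fold-assignment vector described above can be built by hand when the assignments need to be reproducible or shared across several fits. A minimal sketch in base R, assuming balanced random folds are wanted; the values of n and nfolds are illustrative, and the commented-out call assumes the package is loaded and x and y already exist:

```r
set.seed(1)
n <- 100       # number of observations (illustrative)
nfolds <- 5    # number of folds (illustrative)

# Repeat the labels 1..nfolds until there are n of them, then shuffle,
# so each observation lands in a random fold of near-equal size.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)  # fold sizes

# Hypothetical usage, assuming x and y are defined:
# cvfit <- cv.rgam(x, y, foldid = foldid)
```

Fixing the seed before drawing the folds makes repeated runs comparable, since differences in CV error then come from the model and not from the split.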
An object of class "cv.rgam".
A fitted rgam object for the full data.
The values of lambda used in the fits.
The number of non-zero features for the model glmfit.
The number of non-zero linear components for the model glmfit.
The number of non-zero non-linear components for the model glmfit.
If keep = TRUE, this is the array of prevalidated fits.
The mean cross-validated error: a vector of length length(lambda).
Estimate of the standard error of cvm.
Lower curve = cvm - cvsd.
Upper curve = cvm + cvsd.
The value of lambda that gives the minimum cvm.
The largest value of lambda such that the CV error is within one standard error of the minimum.
If keep = TRUE, the fold assignments used.
Name of error measurement used for CV.
The call that produced this object.
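The minimum and one-standard-error rules described above can be illustrated without fitting any model. A sketch in base R on made-up numbers: the vectors lambda, cvm and cvsd below are toy values, not output from cv.rgam, and the selection logic follows the usual convention and is not copied from the package source.

```r
# Toy values (assumptions for illustration only); lambda is in decreasing order
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(2.0, 1.5, 1.20, 1.000, 1.1000)  # mean CV error per lambda
cvsd   <- c(0.3, 0.3, 0.25, 0.250, 0.2500)  # standard error per lambda

cvlo <- cvm - cvsd  # lower curve
cvup <- cvm + cvsd  # upper curve

# lambda that gives the minimum mean CV error
idx_min    <- which.min(cvm)
lambda_min <- lambda[idx_min]

# largest lambda whose CV error is within one standard error of the minimum
threshold  <- cvm[idx_min] + cvsd[idx_min]
lambda_1se <- max(lambda[cvm <= threshold])
```

The one-standard-error choice trades a small increase in estimated error for a more heavily regularized, and typically sparser, model.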
The function runs rgam nfolds+1 times: the first run gets the lambda sequence, and the remaining runs compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed.
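The fold loop described above can be sketched in base R. This illustrates the general procedure only: lm() stands in for rgam() at a single model (there is no lambda sequence here), squared error is the assumed loss, and the standard error formula is a common convention that may differ from what cv.rgam computes.

```r
set.seed(1)
n <- 100; p <- 5; nfolds <- 10
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% rep(1, p)) + rnorm(n)
dat <- data.frame(y = y, x)

# random, balanced fold assignments
foldid <- sample(rep(seq_len(nfolds), length.out = n))
fold_err <- numeric(nfolds)

for (k in seq_len(nfolds)) {
  held_out <- foldid == k
  # fit with fold k omitted (lm is a stand-in for rgam)
  fit <- lm(y ~ ., data = dat[!held_out, ])
  pred <- predict(fit, newdata = dat[held_out, ])
  # mean squared error on the held-out fold
  fold_err[k] <- mean((dat$y[held_out] - pred)^2)
}

cv_mean <- mean(fold_err)               # average error over the folds
cv_se   <- sd(fold_err) / sqrt(nfolds)  # standard error of that average
```

In the real function the same loop is carried out for every value of lambda, which yields one mean error and one standard error per lambda.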
Note that cv.rgam only cross-validates lambda, not the degrees of freedom hyperparameter.
library(relgam)

set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
# only the first 5 features have non-zero coefficients
beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1)
y <- x %*% beta + rnorm(n)
# default 10-fold cross-validation
cvfit <- cv.rgam(x, y)
# specify number of folds
cvfit <- cv.rgam(x, y, nfolds = 5)