A routine for performing K-fold cross-validation for gamsel.
cv.gamsel(x, y, lambda, family, degrees, dfs, bases,
  type.measure = c("mse", "mae", "deviance", "class"),
  nfolds = 10, foldid, keep = FALSE, parallel = FALSE, ...)
x: input matrix, as in gamsel.
y: response vector, as in gamsel.
lambda: optional user-supplied lambda sequence. If NULL, gamsel automatically selects a lambda sequence.
family: as in gamsel.
degrees: as in gamsel.
dfs: as in gamsel.
bases: as in gamsel.
type.measure: loss function used to compute the cross-validated error. Currently there are four options: "mse" (mean squared error), "mae" (mean absolute error), "deviance" (deviance; same as "mse" for family="gaussian"), and "class" (misclassification error, for use with family="binomial").
nfolds: number of folds (default 10). The maximum value is nobs, the number of observations. Small values of nfolds are recommended for large data sets.
foldid: optional vector of length nobs with values between 1 and nfolds specifying which fold each observation belongs to.
keep: if keep=TRUE, a prevalidated array is returned containing fitted values for each observation and each value of lambda. Each fitted value is computed with that observation and the rest of its fold omitted. The foldid vector is also returned. Default is keep=FALSE.
parallel: if TRUE, use parallel foreach to fit each fold. A foreach parallel backend must be registered beforehand; otherwise foreach runs sequentially.
...: other arguments that can be passed to gamsel.
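The four type.measure options differ only in the per-observation loss that is averaged over held-out data. The base R sketch below makes those losses concrete. The helper cv_loss() is hypothetical and is not part of gamsel; for family="binomial" it assumes fit holds fitted probabilities and y is coded 0/1, and the probability clipping in the deviance branch is an illustrative assumption.

```r
# Hypothetical helper (not gamsel's internal code): per-observation losses
# for the four type.measure options.
cv_loss <- function(y, fit, type.measure = c("mse", "mae", "deviance", "class"),
                    family = c("gaussian", "binomial")) {
  type.measure <- match.arg(type.measure)
  family <- match.arg(family)
  if (family == "gaussian") {
    # For Gaussian responses, deviance reduces to squared error
    return(switch(type.measure,
                  mse = (y - fit)^2,
                  deviance = (y - fit)^2,
                  mae = abs(y - fit),
                  class = stop("'class' is only meaningful for family = \"binomial\"")))
  }
  # Binomial: 'fit' is assumed to be a fitted probability, y coded 0/1
  switch(type.measure,
         mse = (y - fit)^2,
         mae = abs(y - fit),
         deviance = {
           # Clip probabilities away from 0 and 1 so the log stays finite (assumption)
           p <- pmin(pmax(fit, 1e-5), 1 - 1e-5)
           -2 * (y * log(p) + (1 - y) * log(1 - p))
         },
         # Predict class 1 when the fitted probability exceeds 0.5
         class = as.numeric((fit > 0.5) != (y == 1)))
}

# The cross-validated error for one fold is the mean of these losses
mean(cv_loss(c(1, 0, 1), c(0.9, 0.2, 0.4), "class", "binomial"))
```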
An object of class "cv.gamsel" is returned, which is a
list with the ingredients of the cross-validation fit:
lambda: the values of lambda used in the fits.
cvm: the mean cross-validated error, a vector of length length(lambda).
cvsd: estimate of the standard error of cvm.
cvup: upper curve = cvm + cvsd.
cvlo: lower curve = cvm - cvsd.
nzero: number of non-zero coefficients at each lambda.
name: a text string indicating the type of measure (for plotting purposes).
gamsel.fit: a fitted gamsel object for the full data.
lambda.min: value of lambda that gives the minimum cvm.
lambda.1se: largest value of lambda such that the error is within 1 standard error of the minimum.
fit.preval: if keep=TRUE, the array of prevalidated fits. Some entries can be NA if that and subsequent values of lambda are not reached for that fold.
foldid: if keep=TRUE, the fold assignments used.
index.min: the sequence number of the minimum lambda.
index.1se: the sequence number of the 1se lambda value.
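To show how the summary components relate to one another, here is a minimal base R sketch that builds them from a matrix of per-fold errors (rows are folds, columns are lambda values). The helper summarize_cv() is hypothetical. It assumes lambda is in decreasing order and that all folds get equal weight, and it uses sd/sqrt(nfolds) as the standard error; cv.gamsel may weight folds differently.

```r
# Hypothetical helper: summary components from a folds x lambda error matrix.
# Assumes decreasing lambda and equally weighted folds (not gamsel's code).
summarize_cv <- function(cvraw, lambda) {
  nfolds <- nrow(cvraw)
  cvm  <- colMeans(cvraw)                     # mean CV error per lambda
  cvsd <- apply(cvraw, 2, sd) / sqrt(nfolds)  # standard error of cvm
  index.min <- which.min(cvm)
  # 1-SE rule: among lambdas whose error is within one standard error of the
  # minimum, take the largest one (the first index, since lambda decreases)
  index.1se <- min(which(cvm <= cvm[index.min] + cvsd[index.min]))
  list(lambda = lambda, cvm = cvm, cvsd = cvsd,
       cvup = cvm + cvsd, cvlo = cvm - cvsd,
       lambda.min = lambda[index.min], lambda.1se = lambda[index.1se],
       index.min = index.min, index.1se = index.1se)
}

# Three folds, four lambda values (made-up numbers for illustration)
lambda <- c(4, 2, 1, 0.5)
cvraw <- rbind(c(10, 5, 3, 5.5),
               c(12, 6, 5, 6.5),
               c(11, 7, 7, 6))
res <- summarize_cv(cvraw, lambda)
c(lambda.min = res$lambda.min, lambda.1se = res$lambda.1se)
```

Here the minimum mean error (5) occurs at lambda = 1, but lambda = 2 has mean error 6, which lies within one standard error (about 1.15) of that minimum, so the 1-SE rule picks the larger, more heavily penalized lambda = 2.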
This function runs gamsel nfolds+1 times. The initial run uses all the data and determines the lambda sequence. The remaining runs fit the data with each fold omitted in turn.
The error is accumulated, and the average error and standard deviation over the
folds are computed.
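The nfolds+1 fitting scheme can be sketched in base R as follows. To keep the example self-contained, closed-form ridge regression (no intercept) stands in for gamsel. This substitution is an assumption for illustration only, and the helpers ridge_path() and kfold_cv() are hypothetical. The sketch also fills a prevalidated matrix analogous to what keep=TRUE returns.

```r
# Stand-in model (assumption): ridge regression solved in closed form,
# one coefficient column per lambda value.
ridge_path <- function(X, y, lambda) {
  p <- ncol(X)
  matrix(vapply(lambda, function(l)
    as.vector(solve(crossprod(X) + l * diag(p), crossprod(X, y))),
    numeric(p)), nrow = p)
}

# Hypothetical K-fold driver mirroring the nfolds + 1 runs described above
kfold_cv <- function(X, y, lambda, nfolds = 10, foldid = NULL) {
  n <- nrow(X)
  if (is.null(foldid)) foldid <- sample(rep(seq_len(nfolds), length.out = n))
  full_fit <- ridge_path(X, y, lambda)        # run 1: all the data
  fit.preval <- matrix(NA_real_, n, length(lambda))
  cvraw <- matrix(NA_real_, nfolds, length(lambda))
  for (k in seq_len(nfolds)) {                # runs 2..nfolds+1: omit fold k
    out <- foldid == k
    beta <- ridge_path(X[!out, , drop = FALSE], y[!out], lambda)
    pred <- X[out, , drop = FALSE] %*% beta   # held-out fits, one column per lambda
    fit.preval[out, ] <- pred                 # prevalidated fits (as with keep=TRUE)
    cvraw[k, ] <- colMeans((y[out] - pred)^2) # per-fold mean squared error
  }
  list(full_fit = full_fit, cvraw = cvraw, cvm = colMeans(cvraw),
       fit.preval = fit.preval, foldid = foldid)
}

set.seed(1)
n <- 60
X <- matrix(rnorm(n * 3), n, 3)
y <- as.vector(X %*% c(1, -2, 0.5) + rnorm(n))
res <- kfold_cv(X, y, lambda = c(10, 1, 0.1), nfolds = 5)
res$cvm
```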
Note that cv.gamsel does NOT search for
values of alpha. A specific value should be supplied, else
alpha=1 is assumed by default. If users would like to
cross-validate alpha as well, they should call cv.gamsel
with a pre-computed vector foldid, and then use this same fold vector
in separate calls to cv.gamsel with different values of
alpha. Note also that the results of cv.gamsel are
random, since the folds are selected at random. Users can reduce this
randomness by running cv.gamsel many times and averaging the
error curves.
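Both recommendations above (reuse one precomputed foldid when comparing a second tuning parameter, and average error curves over repeated random fold assignments) can be sketched in base R. The helpers make_foldid() and cv_error() are hypothetical. To stay self-contained, cv_error() stands in for a cv.gamsel call, and polynomial degree (fitted with lm) plays the role of alpha. This substitution is purely an assumption for illustration.

```r
# Hypothetical helper: balanced random fold assignment of n observations
make_foldid <- function(n, nfolds = 10) sample(rep(seq_len(nfolds), length.out = n))

# Hypothetical stand-in for a cv.gamsel call: CV mean squared error of a
# polynomial fit of the given degree, using the supplied fold assignment
cv_error <- function(x, y, degree, foldid) {
  errs <- sapply(sort(unique(foldid)), function(k) {
    train <- foldid != k
    fit <- lm(y ~ poly(x, degree, raw = TRUE),
              data = data.frame(x = x[train], y = y[train]))
    pred <- predict(fit, newdata = data.frame(x = x[!train]))
    mean((y[!train] - pred)^2)
  })
  mean(errs)
}

set.seed(2)
x <- runif(100, -2, 2)
y <- sin(x) + rnorm(100, sd = 0.3)

# (a) One fixed foldid shared across every value of the second tuning
#     parameter, so differences are not caused by different splits
foldid <- make_foldid(length(x), nfolds = 5)
errs <- sapply(1:4, function(d) cv_error(x, y, d, foldid))

# (b) Average the error curve over 20 random fold assignments
avg <- rowMeans(replicate(20, {
  f <- make_foldid(length(x), nfolds = 5)
  sapply(1:4, function(d) cv_error(x, y, d, f))
}))
rbind(single_split = errs, averaged = avg)
```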
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection
gamsel, plot function for cv.gamsel object.
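library(gamsel)
# Simulate data: 12 predictors, 3 with linear and 3 with nonlinear effects
data=gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
X=data$X
y=data$y
# Compute the bases once and reuse them in both fits
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
par(mfrow=c(1,2),mar=c(5,4,3,1))
summary(gamsel.out)
# 10-fold cross-validation (the default nfolds)
gamsel.cv=cv.gamsel(X,y,bases=bases)
par(mfrow=c(1,1))
plot(gamsel.cv)
# Plot the fitted functions for all 12 predictors at the 20th lambda value
par(mfrow=c(3,4))
plot(gamsel.out,newx=X,index=20)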