cv.gamsel: Cross-validation Routine for Gamsel

Description

A routine for performing K-fold cross-validation for gamsel.

Usage

cv.gamsel(x, y, lambda, family, degrees, dfs, bases,
           type.measure =c("mse", "mae", "deviance", "class"),
           nfolds = 10, foldid, keep = FALSE, parallel = FALSE, ...)

Arguments

x

x matrix as in gamsel

y

response y as in gamsel

lambda

Optional user-supplied lambda sequence. If NULL, the default behaviour is for the gamsel routine to select a suitable lambda sequence automatically.

family

family as in gamsel

degrees

degrees as in gamsel

dfs

dfs as in gamsel

bases

bases as in gamsel

type.measure

Loss function for cross-validated error calculation. Currently there are four options: mse (mean squared error), mae (mean absolute error), deviance (deviance, same as mse for family="gaussian"), class (misclassification error, for use with family="binomial").
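As a rough illustration of what the four measures compute, the following base R sketch evaluates each one on made-up toy vectors. The variable names and data are hypothetical, and the exact scaling used inside cv.gamsel (for the deviance in particular) is an assumption here, not something taken from the package source.

```r
# Toy numeric response and fitted means (made up for illustration)
y_num  <- c(1.0, 2.0, 3.0, 4.0)
mu_hat <- c(1.5, 2.0, 2.0, 4.5)

mse <- mean((y_num - mu_hat)^2)    # mean squared error: 0.375
mae <- mean(abs(y_num - mu_hat))   # mean absolute error: 0.5

# Toy binary response and fitted probabilities (made up for illustration)
y_bin <- c(0, 1, 1, 0)
p_hat <- c(0.2, 0.7, 0.4, 0.6)

# Misclassification error: classify as 1 when the fitted probability exceeds 0.5
class_err <- mean(y_bin != as.numeric(p_hat > 0.5))

# Binomial deviance, averaged over observations (assumed scaling)
binom_dev <- -2 * mean(y_bin * log(p_hat) + (1 - y_bin) * log(1 - p_hat))
```

For a Gaussian response the deviance reduces to the squared-error loss, which is why "deviance" and "mse" coincide in that case.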

nfolds

Number of folds (default is 10). Maximum value is nobs. Small values of nfolds are recommended for large data sets.

foldid

Optional vector of length nobs with values between 1 and nfolds specifying what fold each observation is in.
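One common way to build such a vector in base R is sketched below. The values of nobs and nfolds are arbitrary, and the commented call at the end assumes data objects X and y that are not defined here.

```r
set.seed(42)   # make the random fold assignment reproducible
nobs   <- 23
nfolds <- 5

# Repeat the labels 1..nfolds until there are nobs of them, then shuffle
foldid <- sample(rep(seq_len(nfolds), length.out = nobs))

table(foldid)  # fold sizes differ by at most one

# The same vector can then be reused across calls, for example:
# cv.gamsel(X, y, foldid = foldid)
```

Fixing foldid in this way makes repeated cross-validation runs comparable, because every run scores the same held-out observations.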

keep

If keep=TRUE, a prevalidated array is returned containing fitted values for each observation and each value of lambda. Each of these fits is computed with that observation and the rest of its fold omitted. The foldid vector is also returned. Default is keep=FALSE.

parallel

If TRUE, use parallel foreach to fit each fold. A parallel backend (for example, one provided by doParallel) should be registered beforehand.

…

Other arguments that can be passed to gamsel.

Value

An object of class "cv.gamsel" is returned, which is a list with the components of the cross-validation fit.

lambda

the values of lambda used in the fits.

cvm

The mean cross-validated error - a vector of length length(lambda).

cvsd

estimate of standard error of cvm.

cvup

upper curve = cvm+cvsd.

cvlo

lower curve = cvm-cvsd.

nzero

number of non-zero coefficients at each lambda.

name

a text string indicating type of measure (for plotting purposes).

gamsel.fit

a fitted gamsel object for the full data.

lambda.min

value of lambda that gives minimum cvm.

lambda.1se

largest value of lambda such that error is within 1 standard error of the minimum.

fit.preval

if keep=TRUE, this is the array of prevalidated fits. Some entries can be NA, if a given value of lambda and all subsequent values are not reached for that fold.

foldid

if keep=TRUE, the fold assignments used

index.min

the sequence number of the minimum lambda.

index.1se

the sequence number of the 1se lambda value.
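The relationship between cvm, cvsd, lambda.min and lambda.1se can be sketched in base R with a made-up error curve. This follows the usual one-standard-error rule as described above. The numbers are hypothetical, and the code is not taken from the gamsel source.

```r
# Hypothetical lambda sequence (decreasing) with made-up CV results
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(2.00, 1.20, 1.00, 0.95, 1.10)   # mean cross-validated error
cvsd   <- c(0.20, 0.15, 0.10, 0.10, 0.12)   # standard error of cvm

cvup <- cvm + cvsd   # upper curve
cvlo <- cvm - cvsd   # lower curve

# lambda giving the minimum mean cross-validated error
index.min  <- which.min(cvm)
lambda.min <- lambda[index.min]

# Largest lambda whose error is within one standard error of the minimum
threshold  <- cvup[index.min]
lambda.1se <- max(lambda[cvm <= threshold])
index.1se  <- match(lambda.1se, lambda)
```

Here lambda.min is 0.125 (index 4) and lambda.1se is 0.25 (index 3): the more heavily penalized fit is preferred because its error is statistically indistinguishable from the minimum.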

Details

This function has the effect of running gamsel nfolds+1 times. The initial run uses all the data and gets the lambda sequence. The remaining runs fit the data with each of the folds omitted in turn. The error is accumulated, and the average error and its standard error over the folds are computed.

Note that cv.gamsel does NOT search for values of gamma. A specific value should be supplied, else gamma=0.4 is assumed by default. Users who would like to cross-validate gamma as well should call cv.gamsel with a pre-computed vector foldid, and then use this same fold vector in separate calls to cv.gamsel with different values of gamma.

Note also that the results of cv.gamsel are random, since the folds are selected at random. Users can reduce this randomness by running cv.gamsel many times and averaging the error curves.
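The fold-by-fold procedure can be sketched in base R without gamsel itself. In the sketch below, polynomial regression with lm stands in for the gamsel fit and the polynomial degree stands in for lambda. Both substitutions, along with all variable names and the simulated data, are assumptions made purely for illustration.

```r
set.seed(1)
n <- 100
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

nfolds  <- 5
foldid  <- sample(rep(seq_len(nfolds), length.out = n))
degrees <- 1:6   # stand-in tuning grid, playing the role of lambda

# fold_err[k, j]: mean squared error on fold k for tuning value j
fold_err <- matrix(NA_real_, nfolds, length(degrees))
# preval[i, j]: prediction for observation i from a fit that omitted its fold
preval <- matrix(NA_real_, n, length(degrees))

for (k in seq_len(nfolds)) {
  test  <- foldid == k
  train <- data.frame(x = x[!test], y = y[!test])
  for (j in seq_along(degrees)) {
    fit <- lm(y ~ poly(x, degrees[j]), data = train)
    preval[test, j] <- predict(fit, newdata = data.frame(x = x[test]))
    fold_err[k, j]  <- mean((y[test] - preval[test, j])^2)
  }
}

cvm  <- colMeans(fold_err)                       # average error over folds
cvsd <- apply(fold_err, 2, sd) / sqrt(nfolds)    # standard error over folds
```

The matrix preval corresponds to what keep=TRUE would return as prevalidated fits: every entry is a prediction from a model that never saw that observation.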

References

Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection

Examples

# NOT RUN {
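library(gamsel)
# Simulate 500 observations on 12 predictors (k.lin = 3, k.nonlin = 3)
data <- gendata(n = 500, p = 12, k.lin = 3, k.nonlin = 3, deg = 8, sigma = 0.5)
X <- data$X  # extract explicitly rather than attach()-ing the list
y <- data$y
bases <- pseudo.bases(X, degree = 10, df = 6)
# Gaussian gam
gamsel.out <- gamsel(X, y, bases = bases)
par(mfrow = c(1, 2), mar = c(5, 4, 3, 1))
summary(gamsel.out)
# Cross-validate over the lambda sequence (10 folds by default)
gamsel.cv <- cv.gamsel(X, y, bases = bases)
par(mfrow = c(1, 1))
plot(gamsel.cv)
# Fitted functions for all 12 predictors at the 20th lambda value
par(mfrow = c(3, 4))
plot(gamsel.out, newx = X, index = 20)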
# }
