GAMBoost (version 1.2-3)

cv.GAMBoost: Cross-validation for GAMBoost fits

Description

Performs K-fold cross-validation for GAMBoost to determine the optimal number of boosting steps.
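The procedure can be illustrated with a small, self-contained sketch in base R. Each fold is held out in turn, a boosting path of up to maxstepno steps is fitted on the remaining data, the held-out criterion is recorded after every step, and the step with the best criterion averaged over the folds is selected. This is a hypothetical illustration, not the GAMBoost implementation. It swaps in componentwise linear least-squares boosting (L2Boost) with a squared-error criterion for GAMBoost's penalized B-spline learner, and the names boost_path_error, cv_boost_steps and nu are made up for this example.

```r
# Hypothetical sketch: choose the number of boosting steps by K-fold
# cross-validation. The learner is componentwise linear L2 boosting,
# a stand-in for GAMBoost's penalized B-spline learner.

# Held-out mean squared error after each of 1..maxstepno boosting steps
boost_path_error <- function(x_tr, y_tr, x_te, y_te, maxstepno, nu = 0.1) {
  f_tr <- rep(mean(y_tr), length(y_tr))  # offset: training mean
  f_te <- rep(mean(y_tr), length(y_te))
  err <- numeric(maxstepno)
  for (m in seq_len(maxstepno)) {
    r <- y_tr - f_tr                     # current residuals
    # Least-squares coefficient and residual sum of squares per covariate
    b <- colSums(x_tr * r) / colSums(x_tr^2)
    rss <- colSums((r - sweep(x_tr, 2, b, `*`))^2)
    j <- which.min(rss)                  # best single covariate this step
    f_tr <- f_tr + nu * b[j] * x_tr[, j]
    f_te <- f_te + nu * b[j] * x_te[, j]
    err[m] <- mean((y_te - f_te)^2)
  }
  err
}

cv_boost_steps <- function(x, y, maxstepno = 100, K = 10, seed = 1) {
  set.seed(seed)
  fold_id <- sample(rep(seq_len(K), length.out = nrow(x)))
  err <- matrix(NA_real_, K, maxstepno)  # rows: folds, columns: steps
  for (k in seq_len(K)) {
    te <- which(fold_id == k)
    err[k, ] <- boost_path_error(x[-te, , drop = FALSE], y[-te],
                                 x[te, , drop = FALSE], y[te], maxstepno)
  }
  criterion <- colMeans(err)             # average over folds
  list(criterion = criterion, selected = which.min(criterion))
}

set.seed(42)
x <- matrix(runif(100 * 8, min = -1, max = 1), 100, 8)
y <- -0.5 + 2 * x[, 1] + rnorm(100)
res <- cv_boost_steps(x, y, maxstepno = 50, K = 5)
res$selected
```

The returned list mirrors, in spirit only, the criterion and selected components that cv.GAMBoost returns when just.criterion=TRUE.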

Usage

cv.GAMBoost(x=NULL, y, x.linear=NULL, subset=NULL, maxstepno=500,
            family=binomial(), weights=rep(1,length(y)),
            calc.hat=TRUE, calc.se=TRUE, trace=FALSE,
            parallel=FALSE, upload.x=TRUE, multicore=FALSE, folds=NULL,
            K=10, type=c("loglik","error","L2"), pred.cutoff=0.5,
            just.criterion=FALSE, ...)

Arguments

x
n * p matrix of covariates with potentially non-linear influence. If this is not given (and argument x.linear is employed), a generalized linear model is fitted.
y
response vector of length n.
x.linear
optional n * q matrix of covariates with linear influence.
subset
an optional vector specifying a subset of observations to be used in the fitting process.
maxstepno
maximum number of boosting steps to evaluate.
family,weights,calc.hat,calc.se
arguments passed to GAMBoost.
trace
logical value indicating whether information on progress should be printed.
parallel
logical value indicating whether computations in the cross-validation folds should be performed in parallel on a compute cluster, using package snowfall. The initialization function of this package, sfInit, must be called before calling cv.GAMBoost.
upload.x
logical value indicating whether x and x.linear should be uploaded to the compute cluster for parallel computation. Uploading these only once beforehand (using sfExport(x,x.linear) from library snowfall) and setting upload.x=FALSE can save considerable time for large data sets.
multicore
logical value or number indicating whether computations in the cross-validation folds should be performed in parallel, using package multicore. If TRUE, package multicore is used with its default number of cores. A value larger than 1 is taken as the number of cores to use.
folds
if not NULL, this has to be a list of length K, each element being a vector of indices of fold elements. Useful for employing the same folds for repeated runs.
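A folds list in the format described above can be built with base R, for example to compare several penalty values on identical splits. The helper make_folds below is hypothetical and not part of GAMBoost; the commented cv.GAMBoost call assumes that package is installed and that x and y are defined as in the Examples section.

```r
# Hypothetical helper: build a list of length K whose elements are the
# indices of the observations in each fold.
make_folds <- function(n, K = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Assign fold labels 1..K in (nearly) equal numbers, in random order
  fold_id <- sample(rep(seq_len(K), length.out = n))
  # factor() with fixed levels keeps the list at length K
  unname(split(seq_len(n), factor(fold_id, levels = seq_len(K))))
}

folds <- make_folds(100, K = 10, seed = 123)

# Reusing the same folds in repeated runs (requires package GAMBoost):
# cv1 <- cv.GAMBoost(x, y, penalty = 400, maxstepno = 100,
#                    family = binomial(), K = 10, folds = folds,
#                    just.criterion = TRUE)
```

Since the folds component of a just.criterion=TRUE result has the same structure, it can also be passed back in directly.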
K
number of folds to be used for cross-validation.
type, pred.cutoff
goodness-of-fit criterion: likelihood ("loglik"), error rate for binary response data ("error"), or squared error for other response types ("L2"). For binary response data and the "error" criterion, pred.cutoff specifies the cutoff on the predicted probability used to classify an observation as 1 vs 0.
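To make the three criteria and the role of pred.cutoff concrete, the sketch below computes them for held-out responses and predictions. These helper functions are hypothetical. Details such as whether an observation exactly at the cutoff counts as class 1, and whether cv.GAMBoost sums or averages over observations, are assumptions and are not taken from the GAMBoost source.

```r
# Hypothetical helpers illustrating the criteria selected by `type`.

# "loglik": binomial log-likelihood of held-out 0/1 responses (higher is better)
crit_loglik_binomial <- function(y, p) sum(dbinom(y, size = 1, prob = p, log = TRUE))

# "error": misclassification rate, predicting class 1 when p > pred.cutoff
crit_error <- function(y, p, pred.cutoff = 0.5) mean(as.numeric(p > pred.cutoff) != y)

# "L2": mean squared error between responses and predicted means
crit_L2 <- function(y, mu) mean((y - mu)^2)

y <- c(1, 0, 1, 1, 0)
p <- c(0.9, 0.2, 0.4, 0.7, 0.6)
crit_error(y, p)                     # 0.4: observations 3 and 5 misclassified
crit_error(y, p, pred.cutoff = 0.3)  # 0.2: only observation 5 misclassified
crit_loglik_binomial(y, p)
```

Lowering the cutoff turns observation 3 (p = 0.4) into a correct class-1 prediction, which shows how pred.cutoff alone can change the error criterion.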
just.criterion
logical value indicating whether a list with the goodness-of-fit information should be returned (TRUE) instead of a GAMBoost fit with the optimal number of steps (FALSE).
...
further arguments passed on to the calls to GAMBoost.

Value

A GAMBoost fit with the optimal number of boosting steps or, if just.criterion=TRUE, a list with the following components:
criterion
vector with the goodness-of-fit criterion for boosting steps 1, ..., maxstepno.
se
vector with standard error estimates for the goodness-of-fit criterion in each boosting step.
selected
index of the optimal boosting step.
folds
list of length K, where the elements are vectors of the indices of observations in the respective folds.

See Also

GAMBoost

Examples

## Not run:
library(GAMBoost)

##  Generate some data

x <- matrix(runif(100*8,min=-1,max=1),100,8)
eta <- -0.5 + 2*x[,1] + 2*x[,3]^2
y <- rbinom(100,1,binomial()$linkinv(eta))

##  Fit the model with smooth components

gb1 <- GAMBoost(x,y,penalty=400,stepno=100,trace=TRUE,family=binomial())

##  10-fold cross-validation with prediction error as a criterion

gb1.crit <- cv.GAMBoost(x,y,penalty=400,maxstepno=100,trace=TRUE,
                        family=binomial(),
                        K=10,type="error",just.criterion=TRUE)

##  Compare AIC and estimated prediction error

which.min(gb1$AIC)
which.min(gb1.crit$criterion)
## End(Not run)
