# cvrisk

##### Cross-Validation

Cross-validated estimation of the empirical risk for hyper-parameter selection.

- Keywords
- models, regression

##### Usage

```
# S3 method for mboost
cvrisk(object, folds = cv(model.weights(object)),
       grid = 0:mstop(object),
       papply = mclapply,
       fun = NULL, mc.preschedule = FALSE, ...)

cv(weights, type = c("bootstrap", "kfold", "subsampling"),
   B = ifelse(type == "kfold", 10, 25), prob = 0.5, strata = NULL)

## Plot cross-validation results
# S3 method for cvrisk
plot(x,
     xlab = "Number of boosting iterations", ylab = attr(x, "risk"),
     ylim = range(x), main = attr(x, "type"), ...)
```

##### Arguments

- object
an object of class `mboost`.

- folds
a weight matrix with number of rows equal to the number of observations. The number of columns corresponds to the number of cross-validation runs. Can be computed using function `cv` and defaults to 25 bootstrap samples.

- grid
a vector of stopping parameters the empirical risk is to be evaluated for.

- papply
(parallel) apply function, defaults to `mclapply`. Alternatively, `parLapply` can be used. In the latter case, usually more setup is needed (see the examples for some details). To run `cvrisk` sequentially (i.e., not in parallel), one can use `lapply`.

- fun
if `fun` is NULL, the out-of-sample risk is returned. Otherwise `fun`, as a function of `object`, may extract any other characteristic of the cross-validated models. These are returned as is.

- mc.preschedule
whether tasks are prescheduled when they are parallelized using `mclapply` (default: `FALSE`). For details see `mclapply`.

- weights
a numeric vector of weights for the model to be cross-validated.

- type
character argument for specifying the cross-validation method. Currently (stratified) bootstrap, k-fold cross-validation and subsampling are implemented.

- B
number of folds, per default 25 for `bootstrap` and `subsampling` and 10 for `kfold`.

- prob
fraction of observations (a value between 0 and 1, default 0.5) to be included in the learning samples for subsampling.

- strata
a factor of the same length as `weights` for stratification.

- x
an object of class `cvrisk`.

- xlab, ylab
axis labels.

- ylim
limits of y-axis.

- main
main title of graphic.

- ...
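
As an illustration of the `fun` argument, the following is a minimal sketch rather than an example taken from the package documentation. It assumes that mboost is installed, uses the built-in `cars` data purely to keep the fit small, and collects the coefficients of each refitted model instead of the out-of-sample risk. The name `coefs` is chosen here for illustration.

```r
library("mboost")

# Small built-in data set, chosen only to keep the sketch fast.
model <- glmboost(dist ~ speed, data = cars)

set.seed(1)
folds <- cv(model.weights(model), type = "kfold", B = 5)

# With `fun` supplied, each refitted model is passed to `fun`;
# here the coefficients are collected instead of the risk.
coefs <- cvrisk(model, folds = folds, papply = lapply,
                fun = function(object) coef(object))
length(coefs)   # one entry per column of `folds`
```

Because the results are returned as is, any post-processing (for example, comparing coefficients across folds) is left to the caller.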

##### Details

The number of boosting iterations is a hyper-parameter of the
boosting algorithms implemented in this package. This function
computes honest, i.e., cross-validated, estimates of the empirical risk
for different stopping parameters `mstop`, which can be used to choose
an appropriate number of boosting iterations.

Different forms of cross-validation can be applied, for example
10-fold cross-validation or bootstrapping. The weights (zero weights
correspond to test cases) are defined via the `folds` matrix.
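
To make the structure of such a weight matrix concrete, here is a minimal base R sketch that builds a k-fold matrix by hand. The helper `kfold_weights` is hypothetical and not part of mboost; in practice `cv` would normally be used instead.

```r
# Hypothetical helper (not part of mboost): build a k-fold weight matrix by hand.
kfold_weights <- function(n, k) {
  # Assign each observation to exactly one of k folds, in random order.
  fold_id <- sample(rep(seq_len(k), length.out = n))
  # Column j has weight 0 for observations held out in fold j, 1 otherwise.
  sapply(seq_len(k), function(j) as.numeric(fold_id != j))
}

set.seed(1)
folds <- kfold_weights(n = 20, k = 5)
dim(folds)            # one row per observation, one column per run
rowSums(folds == 0)   # each observation is a test case in exactly one fold
```

A matrix of this shape has as many rows as there are observations, which is what the `folds` argument expects.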

`cvrisk` runs in parallel on operating systems where forking is possible
(i.e., not on Windows) and multiple cores/processors are available.
The scheduling can be changed by the corresponding arguments of
`mclapply` (via the dot arguments).
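
One way to handle the platform restriction in a script is to choose the apply function at run time. The sketch below is an assumption about how this could be done, not something prescribed by mboost. It uses only the `parallel` package that ships with R, and the names `use_fork` and `papply` are illustrative.

```r
library(parallel)  # ships with R; provides mclapply() and detectCores()

# Fall back to sequential lapply() where forking is unavailable or only
# one core is detected; otherwise use the forking-based mclapply().
cores <- detectCores()
use_fork <- .Platform$OS.type != "windows" && !is.na(cores) && cores > 1
papply <- if (use_fork) mclapply else lapply

# Both functions share the (X, FUN, ...) interface used here.
res <- papply(1:4, function(i) i^2)
unlist(res)
```

The selected function could then be handed to `cvrisk` through its `papply` argument.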

The function `cv` can be used to build an appropriate
weight matrix to be used with `cvrisk`. If `strata` is defined,
sampling is performed in each stratum separately, thus preserving
the distribution of the `strata` variable in each fold.
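
The effect of stratification can be sketched in base R as follows. This is an illustration of the idea only and not the code used by `cv`. The helper `stratified_subsample` is hypothetical, and the 0/1 weights are a simplification.

```r
# Hypothetical helper (not mboost code): one stratified subsampling run.
# Returns 0/1 weights; 1 = learning sample, 0 = test case.
stratified_subsample <- function(strata, prob = 0.5) {
  w <- numeric(length(strata))
  for (idx in split(seq_along(strata), strata)) {
    n_in <- round(prob * length(idx))
    # Indexing via sample.int() avoids sample()'s behaviour for length-1 input.
    w[idx[sample.int(length(idx), n_in)]] <- 1
  }
  w
}

set.seed(42)
strata <- factor(rep(c("a", "b"), times = c(30, 10)))
folds <- replicate(25, stratified_subsample(strata))
table(strata, learning = folds[, 1])
```

Because sampling happens within each stratum, every column keeps the 3:1 ratio of `a` to `b` in its learning sample.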

There exist various functions to display and work with
cross-validation results. One can `print` and `plot` (see above)
results and extract the optimal iteration via `mstop`.

##### Value

An object of class `cvrisk` (when `fun` wasn't specified), basically a matrix
containing estimates of the empirical risk for a varying number
of boosting iterations. `plot` and `print` methods
are available as well as an `mstop` method.
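
To illustrate how a stopping value can be read off such a matrix, the following base R sketch works on a made-up risk matrix. It assumes one row per resampling run and one column per value of `grid`, and it picks the column with the lowest average risk. It is not a description of how the `mstop` method is implemented.

```r
# Toy risk matrix: 3 resampling runs (rows) x stopping values 0..4 (columns).
grid <- 0:4
risk <- rbind(c(10, 6, 4, 5, 7),
              c(11, 7, 5, 4, 6),
              c( 9, 5, 3, 4, 8))
colnames(risk) <- grid

avg_risk <- colMeans(risk)          # average out-of-sample risk per stopping value
best <- grid[which.min(avg_risk)]   # stopping value with the lowest average
best
```

Note that `grid` starts at 0, so the position of the minimum and the stopping value differ by one.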

##### References

Torsten Hothorn, Friedrich Leisch, Achim Zeileis and Kurt Hornik (2006),
The design and analysis of benchmark experiments.
*Journal of Computational and Graphical Statistics*, **14**(3),
675–699.

Andreas Mayr, Benjamin Hofner, and Matthias Schmid (2012). The
importance of knowing when to stop - a sequential stopping rule for
component-wise gradient boosting. *Methods of Information in
Medicine*, **51**, 178–186.
DOI: http://dx.doi.org/10.3414/ME11-02-0030

##### See Also

`AIC.mboost` for `AIC`-based selection of the stopping iteration.
Use `mstop` to extract the optimal stopping iteration from a `cvrisk` object.

##### Examples

```
library("mboost")

data("bodyfat", package = "TH.data")

### fit linear model to data
model <- glmboost(DEXfat ~ ., data = bodyfat, center = TRUE)

### AIC-based selection of number of boosting iterations
maic <- AIC(model)
maic

### inspect coefficient path and AIC-based stopping criterion
par(mai = par("mai") * c(1, 1, 1, 1.8))
plot(model)
abline(v = mstop(maic), col = "lightgray")

### 10-fold cross-validation
cv10f <- cv(model.weights(model), type = "kfold")
cvm <- cvrisk(model, folds = cv10f, papply = lapply)
print(cvm)
mstop(cvm)
plot(cvm)

### 25 bootstrap iterations (manually)
set.seed(290875)
n <- nrow(bodyfat)
bs25 <- rmultinom(25, n, rep(1, n)/n)
cvm <- cvrisk(model, folds = bs25, papply = lapply)
print(cvm)
mstop(cvm)
plot(cvm)

### same by default
set.seed(290875)
cvrisk(model, papply = lapply)

### 25 bootstrap iterations (using cv)
set.seed(290875)
bs25_2 <- cv(model.weights(model), type = "bootstrap")
all(bs25 == bs25_2)

############################################################
## Do not run this example automatically as it takes
## some time (~ 5 seconds depending on the system)

### trees
blackbox <- blackboost(DEXfat ~ ., data = bodyfat)
cvtree <- cvrisk(blackbox, papply = lapply)
plot(cvtree)

### cvrisk in parallel modes (not run automatically)
library("parallel")

## parallel::mclapply(), which is used here for parallelization, only runs
## on unix systems (here we use 2 cores)
cvrisk(model, mc.cores = 2)

## infrastructure needs to be set up in advance
cl <- makeCluster(25) # e.g. to run cvrisk on 25 nodes via PVM
myApply <- function(X, FUN, ...) {
    myFun <- function(...) {
        library("mboost") # load mboost on nodes
        FUN(...)
    }
    ## further set up steps as required
    parLapply(cl = cl, X, myFun, ...)
}
cvrisk(model, papply = myApply)
stopCluster(cl)
```

*Documentation reproduced from package mboost, version 2.9-1, License: GPL-2*