# cv.glm

##### Cross-validation for Generalized Linear Models

This function calculates the estimated K-fold cross-validation prediction error for generalized linear models.

- Keywords
- regression

##### Usage

`cv.glm(data, glmfit, cost, K)`

##### Arguments

- data
A matrix or data frame containing the data. The rows should be cases and the columns correspond to variables, one of which is the response.

- glmfit
An object of class `"glm"` containing the results of a generalized linear model fitted to `data`.

- cost
A function of two vector arguments specifying the cost function for the cross-validation. The first argument to `cost` should correspond to the observed responses and the second argument should correspond to the predicted or fitted responses from the generalized linear model. `cost` must return a non-negative scalar value. The default is the average squared error function.

- K
The number of groups into which the data should be split to estimate the cross-validation prediction error. The value of `K` must be such that all groups are of approximately equal size. If the supplied value of `K` does not satisfy this criterion then it will be set to the closest integer which does and a warning is generated specifying the value of `K` used. The default is to set `K` equal to the number of observations in `data`, which gives the usual leave-one-out cross-validation.
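As an illustration of the `cost` and `K` arguments, the following sketch swaps the default squared-error cost for a mean absolute error. The name `mae_cost` and the use of the built-in `mtcars` data are assumptions made for this example only; they are not part of the package.

```
library(boot)  # provides cv.glm

# Hypothetical cost function: mean absolute error.
# First argument: observed responses; second: predicted responses.
mae_cost <- function(y, yhat) mean(abs(y - yhat))

# Gaussian GLM (the default family) on a built-in data set
fit <- glm(mpg ~ wt, data = mtcars)

# mtcars has 32 rows, so K = 8 splits it into groups of equal size
set.seed(1)  # the split is random when K is less than the number of rows
cv_mae <- cv.glm(mtcars, fit, cost = mae_cost, K = 8)
cv_mae$delta  # raw and adjusted estimates of prediction error
```

Any function that takes observed and predicted responses and returns a single non-negative number can be supplied in the same way.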

##### Details

The data is divided randomly into `K` groups. For each group the generalized linear model is fit to `data` omitting that group, then the function `cost` is applied to the observed responses in the group that was omitted from the fit and the prediction made by the fitted models for those observations.

When `K` is the number of observations, leave-one-out cross-validation is used and all the possible splits of the data are used. When `K` is less than the number of observations, the `K` splits to be used are found by randomly partitioning the data into `K` groups of approximately equal size. In this latter case a certain amount of bias is introduced. This can be reduced by using a simple adjustment (see equation 6.48 in Davison and Hinkley, 1997). The second value returned in `delta` is the estimate adjusted by this method.
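The procedure described above can be sketched in base R roughly as follows. This is not the package's implementation: `kfold_raw` is a hypothetical helper, it computes only the raw (unadjusted) estimate, it assumes the response appears untransformed on the left-hand side of the formula, and it assumes each group's cost is weighted by the group's share of the observations.

```
# Hypothetical helper: raw K-fold estimate of prediction error for a GLM.
kfold_raw <- function(data, formula, K,
                      cost = function(y, yhat) mean((y - yhat)^2)) {
  n <- nrow(data)
  # Randomly assign each row to one of K groups of roughly equal size
  fold <- sample(rep(seq_len(K), length.out = n))
  # Assumption: the first variable in the formula is the untransformed response
  response <- all.vars(formula)[1]
  total <- 0
  for (k in seq_len(K)) {
    held_out <- fold == k
    # Fit the model with group k omitted
    fit <- glm(formula, data = data[!held_out, , drop = FALSE])
    # Predict the omitted observations on the response scale
    pred <- predict(fit, newdata = data[held_out, , drop = FALSE],
                    type = "response")
    # Weight the group's cost by its share of the observations
    total <- total + cost(data[[response]][held_out], pred) * sum(held_out) / n
  }
  total
}

set.seed(1)
cv_raw <- kfold_raw(mtcars, mpg ~ wt, K = 4)
cv_raw
```

With `K` equal to the number of rows, every group holds a single observation, so the random assignment no longer affects the result.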

##### Value

The returned value is a list with the following components.

- call
The original call to `cv.glm`.

- K
The value of `K` used for the K-fold cross-validation.

- delta
A vector of length two. The first component is the raw cross-validation estimate of prediction error. The second component is the adjusted cross-validation estimate. The adjustment is designed to compensate for the bias introduced by not using leave-one-out cross-validation.

- seed
The value of `.Random.seed` when `cv.glm` was called.
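A short sketch of inspecting the returned list, assuming the built-in `mtcars` data and an arbitrary choice of `K = 4`:

```
library(boot)

fit <- glm(mpg ~ wt, data = mtcars)
set.seed(123)
res <- cv.glm(mtcars, fit, K = 4)

names(res)    # component names of the returned list
res$K         # the value of K actually used
res$delta[1]  # raw cross-validation estimate
res$delta[2]  # adjusted cross-validation estimate
```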

##### Side Effects

The value of `.Random.seed` is updated.
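Because the random number generator is used to form the groups, results for `K` smaller than the number of observations vary from call to call. A minimal sketch of making a run repeatable by fixing the seed first (the seed value and data set are arbitrary choices for illustration):

```
library(boot)

fit <- glm(mpg ~ wt, data = mtcars)

# Same seed before each call, so the same random partition is drawn
set.seed(2024)
first <- cv.glm(mtcars, fit, K = 4)$delta
set.seed(2024)
second <- cv.glm(mtcars, fit, K = 4)$delta

identical(first, second)  # compares the two pairs of estimates
```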

##### References

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984)
*Classification and Regression Trees*. Wadsworth.

Burman, P. (1989) A comparative study of ordinary cross-validation,
*v*-fold cross-validation and repeated learning-testing methods.
*Biometrika*, **76**, 503--514.

Davison, A.C. and Hinkley, D.V. (1997)
*Bootstrap Methods and Their Application*. Cambridge University Press.

Efron, B. (1986) How biased is the apparent error rate of a prediction rule?
*Journal of the American Statistical Association*, **81**, 461--470.

Stone, M. (1974) Cross-validation choice and assessment of statistical
predictions (with Discussion).
*Journal of the Royal Statistical Society, B*, **36**, 111--147.

##### Examples

```
library(boot)  # provides cv.glm, glm.diag and the nodal data set

# leave-one-out and 6-fold cross-validation prediction error for
# the mammals data set.
data(mammals, package="MASS")
mammals.glm <- glm(log(brain) ~ log(body), data = mammals)
(cv.err <- cv.glm(mammals, mammals.glm)$delta)
(cv.err.6 <- cv.glm(mammals, mammals.glm, K = 6)$delta)

# As this is a linear model we could calculate the leave-one-out
# cross-validation estimate without any extra model-fitting.
muhat <- fitted(mammals.glm)
mammals.diag <- glm.diag(mammals.glm)
(cv.err <- mean((mammals.glm$y - muhat)^2/(1 - mammals.diag$h)^2))

# leave-one-out and 11-fold cross-validation prediction error for
# the nodal data set. Since the response is a binary variable an
# appropriate cost function is
cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
nodal.glm <- glm(r ~ stage+xray+acid, binomial, data = nodal)
(cv.err <- cv.glm(nodal, nodal.glm, cost, K = nrow(nodal))$delta)
(cv.11.err <- cv.glm(nodal, nodal.glm, cost, K = 11)$delta)
```

*Documentation reproduced from package boot, version 1.3-25, License: Unlimited*