bestglm (version 0.13)

CVd: Cross-validation using the delete-d method.

Description

The delete-d method for cross-validation uses a random sample of d observations as the validation sample. This is repeated many times.

Usage

CVd(X, y, d = ceiling(n * (1 - 1/(log(n) - 1))), REP = 100, family = gaussian, ...)

Arguments

X
training inputs
y
training output
d
size of validation sample
REP
number of replications
family
glm family
...
optional arguments passed to glm or lm

Value

  • A vector of two components: the cross-validation MSE and its standard deviation, computed from the MSEs across the validation samples.

Details

Shao (1993, 1997) suggested the delete-d algorithm implemented in this function. In this algorithm, a random sample of d observations is taken as the validation sample, and this random sampling is repeated REP times. Shao (1997, p. 234, eqn. 4.5 and p. 236) suggests $d = n(1 - 1/(\log n - 1))$, which is obtained by taking $\lambda_n = \log n$ on page 236 (Shao, 1997).

As shown in the table below, Shao's recommended choice of d corresponds to validation samples that are typically much larger than those used in 10-fold or 5-fold cross-validation (whose validation samples have sizes n/10 and n/5, respectively). LOOCV corresponds to d = 1 only.

    n      d   K=10    K=5
   50     33      5     10
  100     73     10     20
  200    154     20     40
  500    405     50    100
 1000    831    100    200
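As an illustration (not part of the package's documented interface), the d values in the table above can be reproduced directly from Shao's formula; the short R sketch below also lists the validation-sample sizes used by 10-fold and 5-fold CV for comparison.

# Shao's recommended validation-sample size, d = n * (1 - 1/(log(n) - 1)),
# rounded up, compared with the validation sizes of 10-fold and 5-fold CV.
n <- c(50, 100, 200, 500, 1000)
d <- ceiling(n * (1 - 1/(log(n) - 1)))
data.frame(n = n, d = d, K10 = n/10, K5 = n/5)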

References

Shao, Jun (1993). Linear Model Selection by Cross-Validation. Journal of the American Statistical Association 88, 486-494.

Shao, Jun (1997). An Asymptotic Theory for Linear Model Selection. Statistica Sinica 7, 221-264.

See Also

bestglm, CVHTF, CVDH, LOOCV

Examples

# Example 1. delete-d method
# For the training set, n = 67, so 10-fold CV is roughly equivalent to
# delete-d with d = 7.
data(zprostate)
# keep the training rows (column 10 is the train indicator), then drop it
train <- (zprostate[zprostate[, 10], ])[, -10]
X <- train[, 1:2]
y <- train[, 9]
set.seed(123321123)
CVd(X, y, d = 7, REP = 1000)
# about 61.0 and takes about 5 sec on 2.7 GHz PC
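A possible follow-up sketch (reusing X and y from Example 1; this continuation is an assumption, not part of the original example) is to rely on the default d from the Usage section instead of d = 7. For n = 67 the default gives d = ceiling(67 * (1 - 1/(log(67) - 1))) = 47, i.e. a much larger validation sample.

# Sketch: same training data as Example 1, but with Shao's default d
# (d = 47 when n = 67) rather than d = 7.
set.seed(123321123)
CVd(X, y, REP = 1000)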
