bestglm (version 0.13)

CVd: Cross-validation using the delete-d method.

Description

The delete-d method for cross-validation uses a random sample of d observations as the validation sample. This is repeated many times.

Usage

CVd(X, y, d = ceiling(n * (1 - 1/(log(n) - 1))), REP = 100, family = gaussian, ...)

Arguments

X
training inputs
y
training output
d
size of validation sample
REP
number of replications
family
glm family
...
optional arguments passed to glm or lm

Value

  • A vector of two components: the cross-validation MSE and its standard deviation, computed from the MSEs across the validation samples.

Details

Shao (1993, 1997) suggested the delete-d algorithm implemented in this function. In this algorithm, a random sample of d observations is taken as the validation sample, and this random sampling is repeated REP times. Shao (1997, p. 234, eqn. 4.5 and p. 236) suggests $d = n(1 - 1/(\log n - 1))$, which is obtained by taking $\lambda_n = \log n$ on page 236 (Shao, 1997).

As shown in the table below, Shao's recommended choice of d corresponds to validation samples that are typically much larger than those used in 10-fold or 5-fold cross-validation (whose validation samples have sizes n/10 and n/5, respectively). LOOCV corresponds to d = 1 only.

    n      d   K=10    K=5
   50     33      5     10
  100     73     10     20
  200    154     20     40
  500    405     50    100
 1000    831    100    200
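As an illustration (not part of the package's documented interface), the d values in the table above can be reproduced directly from Shao's formula; the short R sketch below also lists the validation-sample sizes used by 10-fold and 5-fold CV for comparison.

# Shao's recommended validation-sample size, d = n * (1 - 1/(log(n) - 1)),
# rounded up, compared with the validation sizes of 10-fold and 5-fold CV.
n <- c(50, 100, 200, 500, 1000)
d <- ceiling(n * (1 - 1/(log(n) - 1)))
data.frame(n = n, d = d, K10 = n/10, K5 = n/5)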

References

Shao, Jun (1993). Linear Model Selection by Cross-Validation. Journal of the American Statistical Association 88, 486-494.

Shao, Jun (1997). An Asymptotic Theory for Linear Model Selection. Statistica Sinica 7, 221-264.

See Also

bestglm, CVHTF, CVDH, LOOCV

Examples

# Example 1. delete-d method
# For the training set, n = 67, so 10-fold CV is roughly equivalent to
# delete-d with d = 7.
data(zprostate)
# keep the training rows (column 10 is the train indicator), then drop it
train <- (zprostate[zprostate[, 10], ])[, -10]
X <- train[, 1:2]
y <- train[, 9]
set.seed(123321123)
CVd(X, y, d = 7, REP = 1000)
# about 61.0 and takes about 5 sec on 2.7 GHz PC
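A possible follow-up sketch (reusing X and y from Example 1; this continuation is an assumption, not part of the original example) is to rely on the default d from the Usage section instead of d = 7. For n = 67 the default gives d = ceiling(67 * (1 - 1/(log(67) - 1))) = 47, i.e. a much larger validation sample.

# Sketch: same training data as Example 1, but with Shao's default d
# (d = 47 when n = 67) rather than d = 7.
set.seed(123321123)
CVd(X, y, REP = 1000)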
