cv.classo: Cross-validation for classo

Description

Performs k-fold cross-validation for classo, produces a plot, and returns a value for lambda.

Usage

cv.classo(
  x,
  y,
  weights = NULL,
  lambda = NULL,
  nfolds = 10,
  foldid = NULL,
  alignment = c("lambda", "fraction"),
  keep = FALSE,
  parallel = FALSE,
  trace.it = 0,
  ...
)

Value

An object of class "cv.classo" is returned, which is a list with the ingredients of the cross-validation fit.

lambda: the values of lambda used in the fits.
cvm: The mean cross-validated error - a vector of length length(lambda).
cvsd: estimate of standard error of cvm.
cvup: upper curve = cvm+cvsd.
cvlo: lower curve = cvm-cvsd.
nzero: number of non-zero coefficients at each lambda.
name: a text string indicating the type of measure (for plotting purposes).
classo.fit: a fitted classo object for the full data.
lambda.min: value of lambda that gives minimum cvm.
lambda.1se: largest value of lambda such that error is within 1 standard error of the minimum.
fit.preval: if keep=TRUE, this is the array of pre-validated fits. Some entries can be NA, if that and subsequent values of lambda are not reached for that fold.
foldid: if keep=TRUE, the fold assignments used.
index: a one-column matrix with the indices of lambda.min and lambda.1se in the sequence of coefficients, fits, etc.
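
The relationship between cvm, cvsd, lambda.min and lambda.1se described above can be illustrated with a minimal base-R sketch. The numbers are made up for illustration, and the selection rule simply follows the definitions in this list; it is not taken from the package source.

```r
# Hypothetical CV results for five lambda values (illustrative numbers only)
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(2.0, 1.4, 1.1, 1.0, 1.05)      # mean cross-validated error
cvsd   <- c(0.20, 0.15, 0.12, 0.12, 0.13)  # standard error of cvm

# lambda.min: the lambda with the smallest mean CV error
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# lambda.1se: the largest lambda whose error is within one
# standard error of the minimum
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])
i_1se      <- which(lambda == lambda_1se)

# Positions of the two choices in the lambda sequence
c(i_min, i_1se)
```

Here lambda_1se is larger than lambda_min, so it corresponds to a more heavily regularized fit whose error is still close to the minimum.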

Arguments

x: x matrix as in classo.
y: response y as in classo.
weights: Observation weights; defaults to 1 per observation.
lambda: Optional user-supplied lambda sequence; default is NULL, and classo chooses its own sequence. Note that this is done for the full model (master sequence), and separately for each fold. The fits are then aligned using the master sequence (see the alignment argument for additional details). Adapting lambda for each fold leads to better convergence. When lambda is supplied, the same sequence is used everywhere.
nfolds: number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds=3.
foldid: an optional vector of values between 1 and nfolds identifying what fold each observation is in. If supplied, nfolds can be missing.
alignment: This is an experimental argument, designed to fix the problems users were having with CV, with possible values "lambda" (the default) or "fraction". With "lambda", the lambda values from the master fit (on all the data) are used to line up the predictions from each of the folds. In some cases this can give strange values, since the effective lambda values in each fold could be quite different. With "fraction", the predictions in each fold are lined up according to the fraction of progress along the regularization path. If a lambda argument is also provided in the call, alignment="fraction" is ignored (with a warning).
keep: If keep=TRUE, a prevalidated array is returned containing fitted values for each observation and each value of lambda. This means these fits are computed with this observation and the rest of its fold omitted. The foldid vector is also returned. Default is keep=FALSE.
parallel: If TRUE, use parallel foreach to fit each fold. A parallel backend, such as doMC, must be registered beforehand. This option is currently unavailable.
trace.it: If trace.it=1, then progress bars are displayed; useful for big models that take a long time to fit. Tracing is limited if parallel=TRUE.
...: Other arguments that can be passed to classo.
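
As a small illustration of the foldid argument, the sketch below builds a balanced, randomly shuffled fold assignment in base R. The variable names and sizes are chosen for illustration, and the commented call assumes x and y have already been prepared as in the Examples section.

```r
set.seed(1)
n      <- 20   # number of observations (illustrative)
nfolds <- 5    # number of folds

# Repeat the labels 1..nfolds until there are n of them, then shuffle,
# so that every fold receives (almost) the same number of observations
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)

# The vector can then be supplied in place of nfolds, for example:
# cv.fit <- cv.classo(x, y, foldid = foldid)
```

Supplying the same foldid vector across calls makes the fold assignment, and therefore the error curve, reproducible.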

Author

Navonil Deb, Younghoon Kim, Sumanta Basu
Maintainer: Younghoon Kim yk748@cornell.edu

Details

The function runs classo nfolds+1 times: the first run gets the lambda sequence, and the remaining runs compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed.
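
The aggregation step can be sketched in base R as follows. The per-fold error matrix is made up, and the standard-error formula (fold standard deviation divided by the square root of the number of folds, with equal fold weights) is an assumption for illustration; the package may weight folds differently.

```r
# Hypothetical per-fold errors: one row per fold, one column per lambda
fold_err <- rbind(
  c(2.1, 1.5, 1.0),
  c(1.9, 1.3, 1.2),
  c(2.0, 1.4, 1.1)
)
nfolds <- nrow(fold_err)

# Mean error over folds at each lambda
cvm <- colMeans(fold_err)

# Standard error of the mean, assuming equally weighted folds
cvsd <- apply(fold_err, 2, sd) / sqrt(nfolds)

# Upper and lower curves, as reported in cvup and cvlo
cvup <- cvm + cvsd
cvlo <- cvm - cvsd
```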

Note that the results of cv.classo are random, since the folds are selected at random. Users can reduce this randomness by running cv.classo many times, and averaging the error curves.
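
One way to average error curves over repeated runs is sketched below. The helper average_cv_curves is hypothetical (it is not part of the package), and the input vectors are made up. In practice each vector would be the cvm component of a separate cv.classo call made with the same user-supplied lambda sequence, so that the curves share a common grid.

```r
# Hypothetical helper: element-wise average of several cvm vectors
average_cv_curves <- function(cvm_list) {
  # All curves must be evaluated on the same lambda grid
  stopifnot(length(unique(lengths(cvm_list))) == 1)
  Reduce(`+`, cvm_list) / length(cvm_list)
}

# Illustrative curves from three repeated runs, for example obtained as
# lapply(1:3, function(r) cv.classo(x, y, lambda = lam)$cvm)
runs <- list(
  c(2.0, 1.4, 1.1),
  c(2.2, 1.2, 1.3),
  c(1.8, 1.6, 0.9)
)

avg_cvm <- average_cv_curves(runs)
best    <- which.min(avg_cvm)  # index of the lambda with the lowest averaged error
```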

Examples

# \donttest{
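set.seed(1010)
n = 1000  # number of observations
p = 200   # number of predictors
# Complex-valued design matrix; each column is scaled to unit mean squared modulus
x = array(rnorm(n*p), c(n,p)) + (1+1i) * array(rnorm(n*p), c(n,p))
for (j in 1:p) x[,j] = x[,j] / sqrt(mean(Mod(x[,j])^2))
# Complex noise and a sparse coefficient vector (only the first two entries are non-zero)
e = rnorm(n) + (1+1i) * rnorm(n)
b = c(1, -1, rep(0, p-2)) + (1+1i) * c(-0.5, 2, rep(0, p-2))
y = x %*% b + e
# 10-fold cross-validation with the default lambda sequence
cv.test = cv.classo(x,y)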
# }