Performs k-fold cross-validation for classo, produces a plot, and returns a
value for lambda.
cv.classo(
x,
y,
weights = NULL,
lambda = NULL,
nfolds = 10,
foldid = NULL,
alignment = c("lambda", "fraction"),
keep = FALSE,
parallel = FALSE,
trace.it = 0,
...
)
An object of class "cv.classo" is returned, which is a list
with the ingredients of the cross-validation fit.
the values of lambda used in the fits.
The mean cross-validated error - a vector of length length(lambda).
estimate of standard error of cvm.
upper curve = cvm+cvsd.
lower curve = cvm-cvsd.
number of non-zero coefficients at each lambda.
a text string indicating the type of measure (for plotting purposes).
a fitted classo object for the full data.
value of lambda that gives minimum cvm.
largest value of lambda such that error is within 1 standard error of the minimum.
if keep=TRUE, this is the array of pre-validated fits. Some entries can be NA
if that value of lambda and the subsequent ones are not reached for that fold.
if keep=TRUE, the fold assignments used.
a one column matrix with the indices of lambda.min and lambda.1se in the sequence of coefficients, fits etc.
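The two lambda choices described above can be illustrated in base R. This is a minimal sketch on made-up numbers; the vectors below are toy stand-ins for the lambda, cvm, and cvsd components, not output from cv.classo, and the variable names are chosen only for illustration.

```r
# Toy values standing in for the lambda, cvm, and cvsd components (not real output).
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)
cvm    <- c(2.40, 1.60, 1.20, 1.10, 1.15)
cvsd   <- c(0.20, 0.15, 0.12, 0.12, 0.13)

# lambda.min: the lambda with the smallest mean cross-validated error.
i_min <- which.min(cvm)
lambda_min <- lambda[i_min]

# lambda.1se: the largest lambda whose error is within one standard error
# of the minimum.
threshold <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

# Positions of both choices in the lambda sequence.
idx <- c(min = i_min, `1se` = which(lambda == lambda_1se))
```

Choosing the 1-standard-error value trades a small increase in estimated error for a more heavily regularized, and typically sparser, model.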
x: Input matrix, as in classo.
y: Response y, as in classo.
weights: Observation weights; defaults to 1 per observation.
lambda: Optional user-supplied lambda sequence. The default is NULL,
in which case classo chooses its own sequence. Note that the sequence is chosen once for the full model (the master sequence) and separately for each fold.
The fits are then aligned using the master sequence (see the alignment
argument for additional details). Adapting lambda for each fold
leads to better convergence. When lambda is supplied, the same sequence
is used everywhere.
nfolds: Number of folds; the default is 10. Although nfolds can be
as large as the sample size (leave-one-out CV), this is not recommended for
large datasets. The smallest allowable value is nfolds=3.
foldid: An optional vector of values between 1 and nfolds
identifying which fold each observation is in. If supplied, nfolds can
be missing.
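A fold assignment of this kind can be built in base R. The following is a minimal sketch; the values of n and nfolds are arbitrary, and this is one common way to get balanced random folds, not necessarily how cv.classo assigns them internally.

```r
set.seed(1)
n <- 23       # number of observations (arbitrary for this sketch)
nfolds <- 5   # number of folds

# Balanced random assignment: repeat 1..nfolds out to length n, then shuffle.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Fold sizes differ by at most one observation.
table(foldid)
```

Passing the same vector as the foldid argument in repeated calls keeps the fold assignment fixed, which makes results comparable across runs.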
alignment: This is an experimental argument, designed to fix the
problems users were having with CV, with possible values "lambda"
(the default) or "fraction". With "lambda", the lambda
values from the master fit (on all the data) are used to line up the
predictions from each of the folds. In some cases this can give strange
values, since the effective lambda values in each fold could be quite
different. With "fraction", the predictions in each fold are lined up
according to the fraction of progress along the regularization path. If a
lambda argument is also provided in the call, alignment="fraction"
is ignored (with a warning).
keep: If keep=TRUE, a prevalidated array is returned
containing fitted values for each observation and each value of lambda.
Each of these fits is computed with the observation and the rest of its fold omitted.
The foldid vector is also returned. The default is keep=FALSE.
parallel: If TRUE, use parallel foreach to fit each
fold. A parallel backend, such as doMC, must be registered beforehand.
This option is currently unavailable.
trace.it: If trace.it=1, progress bars are displayed;
useful for big models that take a long time to fit. Tracing is limited if
parallel=TRUE.
...: Other arguments that can be passed to classo.
Navonil Deb, Younghoon Kim, Sumanta Basu
Maintainer: Younghoon Kim
yk748@cornell.edu
The function runs classo nfolds+1 times: the first run gets the
lambda sequence, and the remaining runs compute the fit with each
of the folds omitted. The error is accumulated, and the average error and
standard deviation over the folds are computed.
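The accumulation step described above can be sketched in base R. This is a sketch under stated assumptions, not the cv.classo implementation: ridge_fit is a hypothetical stand-in estimator (closed-form ridge regression on real-valued data) used in place of the classo solver, the lambda sequence is fixed in advance so the initial full-data fit is omitted, and the standard error formula is one common choice rather than the one the package is confirmed to use.

```r
set.seed(42)
n <- 60; p <- 5; nfolds <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(2, -1, 0, 0, 0)) + rnorm(n)
lambda <- c(10, 1, 0.1)   # fixed sequence, shared by every fold

# Hypothetical stand-in estimator (closed-form ridge); NOT the classo solver.
ridge_fit <- function(x, y, lam) {
  solve(crossprod(x) + lam * diag(ncol(x)), crossprod(x, y))
}

# Balanced random fold assignment.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# err[k, j]: mean squared error on held-out fold k at lambda[j].
err <- matrix(NA_real_, nfolds, length(lambda))
for (k in seq_len(nfolds)) {
  held_out <- foldid == k
  for (j in seq_along(lambda)) {
    # Fit with fold k omitted, then predict the omitted observations.
    beta <- ridge_fit(x[!held_out, , drop = FALSE], y[!held_out], lambda[j])
    pred <- x[held_out, , drop = FALSE] %*% beta
    err[k, j] <- mean((y[held_out] - pred)^2)
  }
}

cvm  <- colMeans(err)                      # mean error per lambda
cvsd <- apply(err, 2, sd) / sqrt(nfolds)   # standard error of that mean
```

Each row of err comes from one fit with a single fold omitted, so cvm and cvsd summarize nfolds fits per value of lambda.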
Note that the results of cv.classo are random, since the folds
are selected at random. Users can reduce this randomness by running
cv.classo many times, and averaging the error curves.
classo, and the plot and coef methods for "cv.classo".
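library(classo)
set.seed(1010)
n = 1000  # number of observations
p = 200   # number of predictors
# Complex-valued design matrix
x = array(rnorm(n*p), c(n,p)) + (1+1i) * array(rnorm(n*p), c(n,p))
# Scale each column to unit mean squared modulus
for (j in 1:p) x[,j] = x[,j] / sqrt(mean(Mod(x[,j])^2))
# Complex-valued noise
e = rnorm(n) + (1+1i) * rnorm(n)
# Sparse coefficient vector: only the first two entries are non-zero
b = c(1, -1, rep(0, p-2)) + (1+1i) * c(-0.5, 2, rep(0, p-2))
y = x %*% b + e
# 10-fold cross-validation with the default settings
cv.test = cv.classo(x,y)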