Do elastic net cross-validation for alpha and lambda simultaneously
cva.glmnet(x, ...)
# S3 method for default
cva.glmnet(x, y, alpha = seq(0, 1, len = 11)^3,
nfolds = 10, ..., outerParallel = NULL, checkInnerParallel = TRUE)
# S3 method for formula
cva.glmnet(formula, data, ..., weights = NULL,
offset = NULL, subset = NULL, na.action = getOption("na.action"),
drop.unused.levels = FALSE, xlev = NULL, sparse = FALSE,
use.model.frame = FALSE)
# S3 method for cva.glmnet
predict(object, newx, alpha, which = match(TRUE,
abs(object$alpha - alpha) < 1e-08), ...)
# S3 method for cva.glmnet.formula
predict(object, newdata, alpha,
which = match(TRUE, abs(object$alpha - alpha) < 1e-08),
na.action = na.pass, ...)
# S3 method for cva.glmnet
coef(object, alpha,
which = match(TRUE, abs(object$alpha - alpha) < 1e-08), ...)
# S3 method for cva.glmnet.formula
print(x, ...)
# S3 method for cva.glmnet
plot(x, ...)
minlossplot(x, ...)
# S3 method for cva.glmnet
minlossplot(x, ..., cv.type = c("1se", "min"))
x: A matrix of predictor variables; or, for the print and plotting methods, an object returned by cva.glmnet.
...: Further arguments to be passed to lower-level functions. In the case of cva.glmnet, these arguments are passed to cv.glmnet; for predict and coef, they are passed to predict.cv.glmnet and coef.cv.glmnet respectively; and for plot and minlossplot, to plot.
y: A response vector or matrix (for a multinomial response).
alpha: A vector of alpha values for which to do cross-validation. The default, seq(0, 1, len = 11)^3, is a sequence of 11 values between 0 and 1 that are more closely spaced near alpha = 0. For the predict and coef methods, the specific value of alpha for which to return predictions/regression coefficients.
nfolds: The number of cross-validation folds to use. Defaults to 10.
outerParallel: Method of parallelising the outer loop over alpha. See 'Details' below. If NULL, the loop is run sequentially.
checkInnerParallel: If the outer loop is run in parallel, whether to check that the inner loop over lambda will not be in contention for cores.
formula: A model formula; interaction terms are allowed and will be expanded per the usual rules for linear models.
data: A data frame or matrix containing the variables in the formula.
weights: An optional vector of case weights to be used in the fitting process. If missing, defaults to an unweighted fit.
offset: An optional vector of offsets, an a priori known component to be included in the linear predictor.
subset: An optional vector specifying the subset of observations to be used to fit the model.
na.action: A function which indicates what should happen when the data contain missing values. For the predict method, na.action = na.pass will return NA as the prediction for rows with missing values; na.omit or na.exclude will drop them.
drop.unused.levels: Should factors have unused levels dropped? Defaults to FALSE.
xlev: A named list of character vectors giving the full set of levels to be assumed for each factor.
sparse: Should the model matrix be in sparse format? This can save memory when dealing with many factor variables, each with many levels (but see the warning below).
use.model.frame: Should the base model.frame function be used when constructing the model matrix? This is the standard method that most R modelling functions use, but it has some disadvantages. The default is to avoid model.frame and construct the model matrix term-by-term; see discussion.
object: For the predict and coef methods, an object returned by cva.glmnet.
newx: For the predict method, a matrix of predictor variables.
which: An alternative way of specifying alpha; the index number of the desired value within the alpha vector. If both which and alpha are supplied, the former takes precedence.
newdata: For the predict method on a cva.glmnet.formula object, a data frame containing the observations for which to calculate predictions.
cv.type: For minlossplot, which cross-validated loss value to plot for each value of alpha. This can be either "min", the minimum loss, or "1se", the loss at lambda.1se, which is the largest lambda whose loss is within 1 standard error of the minimum. The default is "1se".
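The default value of alpha in the usage above can be evaluated directly in base R to see how the grid is spaced. This is only an illustration of the default argument; it does not call cva.glmnet, and the name alpha_grid is just a local variable chosen for the example.

```r
# Default alpha grid for cva.glmnet: 11 evenly spaced points on [0, 1], then cubed.
# Cubing packs the points towards 0 (ridge-like fits) and spreads them out towards 1 (lasso).
alpha_grid <- seq(0, 1, len = 11)^3
print(round(alpha_grid, 3))
# 0.000 0.001 0.008 0.027 0.064 0.125 0.216 0.343 0.512 0.729 1.000

# The gaps between neighbouring values grow as alpha approaches 1
print(round(diff(alpha_grid), 3))
```

A denser grid near 0 is one way of reflecting that small changes in alpha matter most when the penalty is close to pure ridge; a different grid can always be supplied through the alpha argument.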
For cva.glmnet.default, an object of class cva.glmnet. This is a list containing the following:
alpha: The vector of alpha values
nfolds: The number of folds
modlist: A list of cv.glmnet objects, containing the cross-validation results for each value of alpha
The function cva.glmnet.formula adds a few more components to the above, to facilitate working with formulas.
For the predict method, a vector or matrix of predicted values.
For the coef method, the regularised regression coefficients for the selected alpha, as returned by coef.cv.glmnet (typically a sparse one-column matrix).
The cva.glmnet function does simultaneous cross-validation for both the alpha and lambda parameters in an elastic net model. It follows the procedure outlined in the documentation for glmnet::cv.glmnet: it creates a vector foldid allocating the observations into folds, and then calls cv.glmnet in a loop over different values of alpha, but with the same values of foldid each time, so that every alpha is evaluated on identical cross-validation splits.
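The fold-sharing procedure just described can be sketched in a few lines. This is not the package's own code: make_foldid and cva_sketch are hypothetical names, and the balanced random allocation in make_foldid is an assumption modelled on the style of cv.glmnet's default fold assignment, not necessarily what cva.glmnet does internally. The glmnet package is only needed when cva_sketch is actually called.

```r
# Hypothetical helper: assign n observations to nfolds folds of (almost) equal size,
# in random order.
make_foldid <- function(n, nfolds = 10) {
  sample(rep(seq_len(nfolds), length.out = n))
}

# Hypothetical sketch of the outer loop: one cv.glmnet fit per alpha, all sharing the
# same foldid, so their cross-validated losses are directly comparable.
cva_sketch <- function(x, y, alpha = seq(0, 1, len = 11)^3, nfolds = 10, ...) {
  foldid <- make_foldid(nrow(x), nfolds)
  modlist <- lapply(alpha, function(a) {
    glmnet::cv.glmnet(x, y, alpha = a, foldid = foldid, ...)
  })
  list(alpha = alpha, nfolds = nfolds, modlist = modlist)
}

# Fold allocation for 32 observations (e.g. the rows of mtcars) into 10 folds
set.seed(1)
foldid <- make_foldid(32, nfolds = 10)
table(foldid)
```

With 32 observations and 10 folds, two folds receive 4 observations and the rest receive 3. Fixing foldid across the loop removes fold-to-fold sampling noise from the comparison between alpha values.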
Optionally this loop over alpha can be parallelised; currently, cva.glmnet knows about two methods of doing so:
Via parLapply in the parallel package. To use this, set outerParallel to a valid cluster object created by makeCluster.
Via rxExec as supplied by Microsoft R Server's RevoScaleR package. To use this, set outerParallel to a valid compute context created by RxComputeContext, or a character string specifying such a context.
If the outer loop is run in parallel, cva.glmnet can check whether the inner loop (over lambda) is also set to run in parallel, and disable this if it would lead to contention for cores. This is done if it is likely that the parallelisation is local on a multicore machine, i.e. if outerParallel is a SOCKcluster object running on "localhost", or if the supplied compute context is local parallel.
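The dispatch between a sequential and a parLapply-based outer loop, together with the "is everything on localhost?" test, can be sketched as follows. Both run_outer_loop and inner_should_be_sequential are hypothetical helpers written for illustration; they only mirror the behaviour described above and are not the package's internals. The example run uses a stand-in fit function so that no cluster or glmnet installation is needed.

```r
# Hypothetical sketch: run fit_one() once per alpha value, either sequentially
# (cl = NULL, like outerParallel = NULL) or on a cluster via parallel::parLapply().
run_outer_loop <- function(alpha, fit_one, cl = NULL) {
  if (is.null(cl)) {
    lapply(alpha, fit_one)
  } else {
    parallel::parLapply(cl, alpha, fit_one)
  }
}

# Hypothetical check in the spirit of checkInnerParallel: if every outer worker runs
# on "localhost", a parallel inner loop would compete for the same cores.
inner_should_be_sequential <- function(hosts) {
  all(hosts == "localhost")
}

# Sequential run with a stand-in for the per-alpha cv.glmnet fit
res <- run_outer_loop(c(0, 0.5, 1), function(a) a^2)
str(res)
```

To exercise the parallel branch, one could create cl <- parallel::makeCluster(2), call run_outer_loop(alpha, fit_one, cl), and release the workers with parallel::stopCluster(cl). The fit function must then be self-contained, because it is serialised to the worker processes.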
The formula method works in a similar manner to lm, glm and other modelling functions. The arguments are used to generate a model frame, which is a data frame augmented with information about the roles the columns play in fitting the model. This is then turned into a model matrix and a response vector, which are passed to glmnet::glmnet along with any arguments in .... If sparse is TRUE, then Matrix::sparse.model.matrix is used instead of stats::model.matrix to create the model matrix.
The predict method computes predictions for a specific alpha value given a cva.glmnet object. It looks up the supplied alpha (possibly supplied indirectly via the which argument) in the object's stored alpha vector, and calls glmnet:::predict.cv.glmnet on the corresponding cv.glmnet fit. All the arguments to that function are (or should be) supported.
The coef method is similar, returning the coefficients for the selected alpha value via glmnet:::coef.cv.glmnet.
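The lookup performed by the predict and coef methods is the default value of which in the usage section. Wrapping that expression in a small function makes its behaviour easy to check against the default alpha grid; lookup_alpha is a hypothetical name used only for this illustration.

```r
# Default 'which' for predict/coef: position of the first stored alpha lying within
# 1e-08 of the requested alpha, or NA if none does.
lookup_alpha <- function(alpha_vec, alpha) {
  match(TRUE, abs(alpha_vec - alpha) < 1e-08)
}

alpha_grid <- seq(0, 1, len = 11)^3
lookup_alpha(alpha_grid, 1)      # 11
lookup_alpha(alpha_grid, 0.125)  # 6
lookup_alpha(alpha_grid, 0.027)  # 4; the tolerance absorbs rounding error in the cubed grid
lookup_alpha(alpha_grid, 0.5)    # NA: 0.5 is not on the default grid
```

Under the default grid, which = 6 therefore selects alpha = 0.125. Requesting an alpha that was not cross-validated gives NA rather than an interpolated fit.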
The plot method for cva.glmnet objects plots the average cross-validated loss by lambda, for each value of alpha. Each line represents one cv.glmnet fit, corresponding to one value of alpha. Note that the specific lambda values can vary substantially by alpha.
The minlossplot function plots the cross-validated loss for each value of alpha, taken at the lambda selected by cv.type: by default lambda.1se, or with cv.type = "min", the best (lowest) loss.
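The quantity that minlossplot displays can be sketched from the modlist component of a cva.glmnet object. This is a sketch, not the package's implementation: loss_by_alpha is a hypothetical name, and it assumes each element of modlist has the usual cv.glmnet components lambda, cvm (mean cross-validated loss per lambda), lambda.min and lambda.1se.

```r
# Hypothetical sketch: for each cv.glmnet fit, return the mean CV loss at the lambda
# chosen by cv.type ("1se" -> lambda.1se, "min" -> lambda.min).
loss_by_alpha <- function(modlist, cv.type = c("1se", "min")) {
  cv.type <- match.arg(cv.type)
  lambda_name <- paste0("lambda.", cv.type)
  vapply(modlist, function(mod) {
    # lambda.min and lambda.1se are taken from mod$lambda, so exact matching is safe
    mod$cvm[mod$lambda == mod[[lambda_name]]]
  }, numeric(1))
}
```

Given a fitted object cva, plot(cva$alpha, loss_by_alpha(cva$modlist), type = "b") would give a plot along the same lines. The vapply call fails loudly if a selected lambda is not found exactly once, rather than silently returning a malformed result.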
# NOT RUN {
library(glmnetUtils)
cva <- cva.glmnet(mpg ~ ., data=mtcars)
predict(cva, mtcars, alpha=1)
# }
# NOT RUN {
# Leukemia example dataset from Trevor Hastie's website
# mode="wb" keeps the binary .RData file intact on Windows
download.file("http://web.stanford.edu/~hastie/glmnet/glmnetData/Leukemia.RData",
              "Leukemia.RData", mode="wb")
load("Leukemia.RData")
leuk <- do.call(data.frame, Leukemia)
leuk.cva <- cva.glmnet(y ~ ., leuk, family="binomial")
leuk.pred <- predict(leuk.cva, leuk, which=6)
# }