Performs k-fold cross-validation for l0, l1, or TLP-penalized regression models
over a grid of values for the regularization parameter lambda
(if penalty is "l1" or "tlp") or kappa (if penalty = "l0").
cv.glmtlp(X, y, ..., seed = NULL, nfolds = 10, obs.fold = NULL, ncores = 1)
an object of class "cv.glmtlp"
is returned, which is a list
with the ingredients of the cross-validation fit.
the function call
The mean cross-validated error: a vector of length length(kappa)
if penalty = "l0", and length(lambda) otherwise.
estimate of standard error of cv.mean
.
a fitted glmtlp object for the full data.
the index of the lambda or kappa sequence that corresponds to
the smallest mean cross-validated error.
the values of kappa
used in the fits, available when
penalty = 'l0'
.
the value of kappa
that gives the minimum
cv.mean
, available when penalty = 'l0'
.
the values of lambda
used in the fits.
the value of lambda that gives the minimum cv.mean,
available when penalty is 'l1' or 'tlp'.
null deviance of the model.
the fold id for each observation used in the CV.
input matrix, of dimension nobs
x nvars
, as in
glmtlp
.
response, of length nobs, as in glmtlp
.
Other arguments that can be passed to glmtlp
.
the seed for reproduction purposes
number of folds; default is 10. The smallest allowable value
is nfolds = 3.
an optional vector of values between 1 and nfolds
identifying what fold each observation is in. If supplied, nfolds
can
be missing.
number of cores utilized; default is 1. If greater than 1,
then foreach with a doParallel backend is used to fit the folds in parallel;
if equal to 1, a for loop is used to fit each fold. Users do not need to
register a parallel cluster themselves.
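The obs.fold argument described above expects one fold id per observation, with values between 1 and nfolds. A minimal base R sketch of building such a vector follows; the sizes (100 observations, 5 folds) are illustrative assumptions, and this is not necessarily how the package assigns folds internally.

```r
# Sketch: build a custom fold-assignment vector suitable for `obs.fold`.
# nobs and nfolds are illustrative values, not package defaults.
set.seed(2021)
nobs <- 100
nfolds <- 5

# Repeat the ids 1..nfolds until there are nobs of them, then shuffle,
# so folds are random but (nearly) equal in size.
obs.fold <- sample(rep(seq_len(nfolds), length.out = nobs))

# Number of observations per fold.
table(obs.fold)

# The vector could then be supplied as
#   cv.glmtlp(X, y, ..., obs.fold = obs.fold)
# in which case nfolds can be left out.
```

Supplying the same obs.fold to several fits is one way to compare penalties on identical data splits.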
Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang yang6367@umn.edu
The function calls glmtlp nfolds + 1 times: the first call obtains the
lambda or kappa sequence, and the remaining nfolds calls compute
the fit with each of the folds omitted in turn. The cross-validation error is
based on deviance. The error is accumulated over the folds, and the average
error and standard deviation are computed.
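The procedure just described can be sketched in base R. This is an illustration only: closed-form ridge regression stands in for the penalized fit (it is not the l0, l1, or TLP solver in glmtlp), the helper ridge_coef and the lambda grid are hypothetical, and the package may weight folds differently when accumulating the error.

```r
# Sketch of k-fold CV over a lambda grid for a Gaussian response.
# Ridge regression is a stand-in for the penalized fit in glmtlp.
set.seed(2021)
nobs <- 100; nvars <- 20; nfolds <- 10
X <- matrix(rnorm(nobs * nvars), nobs, nvars)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(nobs)
lambda <- 10^seq(2, -2, length.out = 25)   # hypothetical grid

# Hypothetical helper: ridge coefficients (no intercept) for one lambda.
ridge_coef <- function(X, y, lam) {
  solve(crossprod(X) + lam * diag(ncol(X)), crossprod(X, y))
}

# Random, nearly equal-sized folds.
obs.fold <- sample(rep(seq_len(nfolds), length.out = nobs))

# cv.err[k, j]: mean squared prediction error on fold k at lambda[j],
# i.e. the Gaussian deviance per held-out observation.
cv.err <- matrix(NA_real_, nfolds, length(lambda))
for (k in seq_len(nfolds)) {
  test <- obs.fold == k
  for (j in seq_along(lambda)) {
    beta <- ridge_coef(X[!test, , drop = FALSE], y[!test], lambda[j])
    cv.err[k, j] <- mean((y[test] - X[test, , drop = FALSE] %*% beta)^2)
  }
}

cv.mean <- colMeans(cv.err)                    # average error over folds
cv.se <- apply(cv.err, 2, sd) / sqrt(nfolds)   # standard error of cv.mean
idx.min <- which.min(cv.mean)                  # index of the best lambda
lambda[idx.min]
```

The local names cv.se and idx.min are chosen for this sketch and are not taken from the text above.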
When family = "binomial", the fold assignment (if not provided by
the user) is generated in a stratified manner, so that the ratio of 0/1
outcomes is the same in each fold.
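One way such a stratified assignment could be produced is shown below in base R. It is a sketch under the assumption of a 0/1 response with both classes present, and it may differ from the package's internal implementation.

```r
# Sketch: stratified fold assignment for a 0/1 response.
set.seed(2021)
nfolds <- 5
y <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.7, 0.3))

obs.fold <- integer(length(y))
for (cls in c(0, 1)) {
  idx <- which(y == cls)
  # Within each class, hand out fold ids 1..nfolds as evenly as possible,
  # in random order.
  obs.fold[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
}

# Rows: outcome; columns: fold. Counts within a row differ by at most one,
# so each fold has roughly the same 0/1 ratio as the full data.
table(y, obs.fold)
```

Assigning folds separately within each class is what keeps the class proportions balanced; a single shuffle over all observations would not guarantee this.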
Shen, X., Pan, W., & Zhu, Y. (2012).
Likelihood-based selection and sharp parameter estimation.
Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013).
On constrained and regularized high-dimensional regression.
Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021).
Inference for a Large Directed Graphical Model with Interventions.
arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014).
A coordinate majorization descent algorithm for l1 penalized learning.
Journal of Statistical Computation and Simulation, 84(1), 84-95.
Two R packages on GitHub: ncvreg and glmnet.
glmtlp
and plot
, predict
, and coef
methods for "cv.glmtlp"
objects.
library(glmtlp)

# Gaussian
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "l1", seed = 2021)

# Binomial
X <- matrix(rnorm(100 * 20), 100, 20)
y <- sample(c(0, 1), 100, replace = TRUE)
cv.fit <- cv.glmtlp(X, y, family = "binomial", penalty = "l1", seed = 2021)