cvdglars(formula, family = c("binomial", "poisson"), data,
subset, contrast = NULL, control = list())
cvdglars.fit(X, y, family = c("binomial", "poisson"),
control = list())
cvdglars returns an object with S3 class "cvdglars", i.e. a list containing the following components:
…: a vector of length ng used to store the mean cross-validation deviance;
…: a vector of length ng used to store the variance of the mean cross-validation deviance;
The function cvdglars runs dglars nfold + 1 times: once on the complete data set and once for each fold. For each value of the tuning parameter, the deviance on the held-out fold is stored, and its average and standard deviation over the folds are computed.
cvdglars.fit is the workhorse function: it is more efficient when the design matrix has already been computed. For this reason, we suggest using this function when the dgLARS method is applied in a high-dimensional setting, i.e. when $p > n$.
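The cross-validation scheme described above can be illustrated with a small base-R sketch. It is not the dglars implementation: as a hypothetical stand-in for the dgLARS solution curve, it fits nested logistic models on the first k predictors with glm.fit(), so the model size k plays the role of the grid of $\gamma$ values. The names foldid, dev_m, dev_v and k_hat only mirror the quantities on this page, and computing dev_v as the across-fold variance divided by nfold is an assumption.

```r
# Sketch of K-fold cross-validated deviance in base R.
# ASSUMPTION: the dgLARS path is replaced by a hypothetical stand-in,
# i.e. nested logistic models that use the first k predictors.
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(1 + 2 * X[, 1]))

nfold <- 10
# random fold assignment, analogous to the default foldid
foldid <- sample(rep(seq_len(nfold), length.out = n))

# binomial deviance of held-out 0/1 responses given fitted probabilities
binom_dev <- function(y, mu) {
  mu <- pmin(pmax(mu, 1e-12), 1 - 1e-12)  # guard against log(0)
  -2 * sum(y * log(mu) + (1 - y) * log(1 - mu))
}

ks <- 0:p  # model sizes, playing the role of the gamma grid
dev <- matrix(NA_real_, length(ks), nfold)
for (f in seq_len(nfold)) {
  train <- foldid != f
  for (j in seq_along(ks)) {
    k <- ks[j]
    # intercept plus the first k predictors (k = 0 gives intercept only)
    Xtr <- cbind(1, X[train, seq_len(k), drop = FALSE])
    Xte <- cbind(1, X[!train, seq_len(k), drop = FALSE])
    fit <- glm.fit(Xtr, y[train], family = binomial())
    mu <- binomial()$linkinv(drop(Xte %*% fit$coefficients))
    dev[j, f] <- binom_dev(y[!train], mu)
  }
}

dev_m <- rowMeans(dev)               # mean CV deviance for each model
dev_v <- apply(dev, 1, var) / nfold  # assumed: variance of that mean
k_hat <- ks[which.min(dev_m)]        # selected model size
```

The same pattern applies to any path of models: the point minimising dev_m is selected, while dev_v gives a rough idea of how sharply that minimum is determined.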
The control argument is a list that can supply any of the following components:
algorithm: if algorithm = "pc" (default), the predictor-corrector method is used, while the cyclic coordinate descent method is used if algorithm = "ccd";
method: if method = "dgLASSO" (default), the algorithm computes the solution curve defined by the differential geometric generalization of the LASSO estimator; otherwise, if method = "dgLAR", the differential geometric generalization of the least angle regression method is computed;
nfold: the number of folds used in the cross-validation. Although nfold can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. Default is nfold = 10;
foldid: the assignment of the observations to the folds; by default, foldid is randomly generated;
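ng: the number of values of the tuning parameter $\gamma$ used to compute the cross-validation deviance. Default is ng = 100;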
nv: pc algorithm. An integer value in the interval $[1, \min(n, p)]$ (default is nv = min(n-1, p)) specifying the maximum number of variables included in the final model;
np: pc/ccd algorithm. A non-negative integer defining the maximum number of points of the solution curve. For the predictor-corrector algorithm, np is set to $50 \cdot \min(n-1, p)$ (default), while for the cyclic coordinate descent method it is set to 100 (default), i.e. the number of values of the tuning parameter $\gamma$;
g0: pc/ccd algorithm. Sets the smallest value for the tuning parameter $\gamma$. Default is g0 = ifelse(p …;
dg_max: pc algorithm. A non-negative value specifying the maximum length of the step size. If dg_max = 0 (default), the predictor-corrector algorithm uses the optimal step size (see Augugliaro et al. (2013) for more details) to approximate the value of the tuning parameter corresponding to the inclusion/exclusion of a variable from the model;
nNR: pc algorithm. A non-negative integer specifying the maximum number of iterations of the Newton-Raphson algorithm used in the corrector step. Default is nNR = 50;
NReps: pc algorithm. A non-negative value defining the convergence criterion of the Newton-Raphson algorithm. Default is NReps = 1.0e-06;
ncrct: pc algorithm. When one of the following conditions is satisfied:
i. …
ii. … eps …
then the step size ($d\gamma$) is reduced by $d\gamma = cf \cdot d\gamma$ and the corrector step is repeated. ncrct is a non-negative integer specifying the maximum number of trials of the corrector step. Default is ncrct = 50;
cf: pc algorithm. The contraction factor is a real value in the interval $[0,1]$ used to reduce the step size as described above. Default is cf = 0.5;
nccd: ccd algorithm. A non-negative integer specifying the maximum number of steps of the cyclic coordinate descent algorithm. Default is nccd = 1.0e+05;
eps: pc/ccd algorithm. The meaning of this parameter depends on the algorithm used to estimate the dgLARS solution curve, namely:
i. if algorithm = "pc", eps is used
a. …
b. …
c. …
ii. if algorithm = "ccd", eps is used to define the convergence of a single solution point, i.e. each inner coordinate-descent loop continues until the maximum change in Rao's score test statistic, after any coefficient update, is less than eps.
Default is eps = 1.0e-05.
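For reference, the documented defaults can be collected into a plain R list. The helper below, make_control, is hypothetical and not part of dglars: it fills in the defaults stated above, leaves out g0 and foldid so that the package applies its own defaults for them, lets the caller override any entry, and checks the documented ranges (for example nv in $[1, \min(n, p)]$ and cf in $[0,1]$). The returned list is meant to be passed as the control argument, e.g. control = make_control(nrow(X), ncol(X), algorithm = "ccd").

```r
# Hypothetical helper, NOT part of the dglars package: collects the
# documented control defaults and checks the documented ranges.
make_control <- function(n, p, algorithm = c("pc", "ccd"),
                         method = c("dgLASSO", "dgLAR"), ...) {
  algorithm <- match.arg(algorithm)
  method <- match.arg(method)
  ctrl <- list(
    algorithm = algorithm,
    method    = method,
    nfold     = 10,
    ng        = 100,
    nv        = min(n - 1, p),                     # pc only
    np        = if (algorithm == "pc") 50 * min(n - 1, p) else 100,
    dg_max    = 0,                                 # 0 = optimal step size
    nNR       = 50,
    NReps     = 1.0e-06,
    ncrct     = 50,
    cf        = 0.5,
    nccd      = 1.0e+05,                           # ccd only
    eps       = 1.0e-05
  )
  # g0 and foldid are left out so that the package defaults apply;
  # any extra named arguments override the defaults above
  extra <- list(...)
  if (length(extra) > 0) ctrl[names(extra)] <- extra
  # ranges stated in the documentation
  stopifnot(
    ctrl$nv >= 1, ctrl$nv <= min(n, p),
    ctrl$np >= 0,
    ctrl$dg_max >= 0, ctrl$nNR >= 0, ctrl$NReps >= 0, ctrl$ncrct >= 0,
    ctrl$cf >= 0, ctrl$cf <= 1,
    ctrl$nfold <= n
  )
  ctrl
}

# defaults for n = 100 observations and p = 10 predictors
ctrl_pc <- make_control(100, 10)
ctrl_pc$np   # 50 * min(99, 10) = 500
# cyclic coordinate descent with 5 folds
ctrl_ccd <- make_control(100, 10, algorithm = "ccd", nfold = 5)
ctrl_ccd$np  # 100
```

Spelling the defaults out in one place makes it easy to change a single entry (here nfold) while keeping the others at their documented values.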
Augugliaro L., Mineo A.M. and Wit E.C. (2013) dgLARS: a differential geometric approach to sparse generalized linear models, Journal of the Royal Statistical Society, Series B, Vol. 75(3), pp. 471-498.
Augugliaro L., Mineo A.M. and Wit E.C. (2012) Differential geometric LARS via cyclic coordinate descent method, in Proceedings of COMPSTAT 2012, pp. 67-79, Limassol, Cyprus.
See also the coef.cvdglars, print.cvdglars and plot.cvdglars methods.
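###########################
# Logistic regression model
library(dglars)
set.seed(123)
n <- 100
p <- 10
X <- matrix(rnorm(n*p), n, p)
# true model: intercept 1 and slope 2 on the first predictor only
b <- 1:2
eta <- b[1] + X[,1] * b[2]
mu <- binomial()$linkinv(eta)
y <- rbinom(n, 1, mu)
# select the tuning parameter gamma by cross-validation
fit_cv <- cvdglars.fit(X, y, family = "binomial")
# recompute the solution curve down to the selected value g_hat
fit <- dglars.fit(X, y, family = "binomial", control = list(g0 = fit_cv$g_hat))
fit_cv
# coefficients at the last point of the curve, i.e. at gamma = g_hat
fit$beta[,fit$np]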