Does k-fold cross-validation for glmTLP, produces a plot, and returns a value for lambda with a pre-specified tau.
cv.glmTLP(x, y, family=c("gaussian","binomial","poisson","multinomial","cox","mgaussian"),
          nfolds = 10, weights, offset=NULL, lambda, tau = 0.3,
          nlambda=100, penalty.factor = rep(1, nvars),
          lambda.min.ratio=ifelse(nobs < nvars, 1e-2, 1e-4), standardize=TRUE,
          intercept=TRUE, dfmax = nvars + 1, pmax = min(dfmax * 2 + 20, nvars),
          lower.limits=-Inf, upper.limits=Inf, standardize.response=FALSE,
          maxIter=100, Tol=1e-4)
x: input matrix, as in glmnet.
y: response variable. Quantitative for family="gaussian" or family="poisson" (non-negative counts). For family="binomial", should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). For family="multinomial", can be a factor with nc >= 2 levels, or a matrix with nc columns of counts or proportions. For either "binomial" or "multinomial", if y is presented as a vector, it will be coerced into a factor. For family="cox", y should be a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating death and '0' indicating right censoring. The function Surv() in package survival produces such a matrix. For family="mgaussian", y is a matrix of quantitative responses. (Several of these formats are illustrated in the sketch following the argument descriptions.)
family: response type (see above).

nfolds: number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds=3.

weights: observation weights; defaults to 1 per observation.

offset: offset vector (matrix), as in glmnet.
lambda: optional user-supplied lambda sequence; the default is NULL, in which case glmTLP chooses its own sequence.

tau: tuning parameter of the truncated lasso penalty (TLP).

nlambda: the number of lambda values; default is 100.
penalty.factor: separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables (and implicitly infinity for variables listed in exclude). Note: the penalty factors are internally rescaled to sum to nvars, and the lambda sequence will reflect this change. (See the sketch after the argument descriptions for an example.)
lambda.min.ratio: smallest value for lambda, as a fraction of lambda.max, the (data-derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the nobs < nvars case. This is undefined for "binomial" and "multinomial" models, and glmnet will exit gracefully when the percentage deviance explained is almost 1.
standardize: logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are already in the same units, you might not wish to standardize. See details below for y standardization with family="gaussian".

intercept: should intercept(s) be fitted (default=TRUE) or set to zero (FALSE).
dfmax: limit the maximum number of variables in the model. Useful for very large nvars, if a partial path is desired.

pmax: limit the maximum number of variables ever to be nonzero.

lower.limits: vector of lower limits for each coefficient; default -Inf. Each of these must be non-positive. Can be presented as a single value (which will then be replicated), else a vector of length nvars.

upper.limits: vector of upper limits for each coefficient; default Inf. See lower.limits.
standardize.response: applies to family="mgaussian" only; allows the user to standardize the response variables.

maxIter: maximum number of iterations for the TLP algorithm.

Tol: convergence tolerance.
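The sketch below illustrates the input formats described above on simulated data. It is only an illustration: the object names (n, p, y_gauss, y_bin, y_cox, pf) are hypothetical and not part of the package.

## A minimal sketch of the input formats, using simulated data.
library(glmTLP)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)           # predictor matrix, as in glmnet

## family = "gaussian": a quantitative response
y_gauss <- x[, 1] - 0.5 * x[, 2] + rnorm(n)

## family = "binomial": a two-level factor (or a two-column matrix of counts)
y_bin <- factor(ifelse(y_gauss > 0, "yes", "no"))

## family = "cox": a two-column matrix with columns 'time' and 'status'
## (survival::Surv(time, status) produces an equivalent object)
surv_time   <- rexp(n)
surv_status <- rbinom(n, 1, 0.7)          # 1 = death, 0 = right censored
y_cox <- cbind(time = surv_time, status = surv_status)

## penalty.factor: no shrinkage on the first two variables, full penalty elsewhere
pf <- rep(1, p)
pf[1:2] <- 0

fit_gauss <- cv.glmTLP(x, y_gauss, family = "gaussian", tau = 0.3,
                       penalty.factor = pf)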
An object of class "cv.glmnet" is returned, which is a list with the ingredients of the cross-validation fit. Although the implementation is different, the returned object is designed to mimic a "cv.glmnet" fit from the popular glmnet package, so that users can work with the truncated lasso in the same way they would with the elastic net.
lambda: the values of lambda used in the fits.

cvm: the mean cross-validated error; a vector of length length(lambda).

cvsd: estimate of the standard error of cvm.

cvup: upper curve = cvm + cvsd.

cvlo: lower curve = cvm - cvsd.

nzero: number of non-zero coefficients at each lambda.

name: a text string indicating the type of measure (for plotting purposes).

glmnet.fit: a fitted glmnet object for the full data.

lambda.min: value of lambda that gives minimum cvm.

lambda.1se: largest value of lambda such that the error is within 1 standard error of the minimum.

fit.preval: if keep=TRUE, this is the array of prevalidated fits. Some entries can be NA, if that and subsequent values of lambda are not reached for that fold.

foldid: if keep=TRUE, the fold assignments used.
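Since the returned object is meant to mimic a cv.glmnet fit, the usual downstream steps should carry over; the following is a sketch under that assumption, not a documented interface.

## Sketch: inspecting a fit, assuming the object behaves like a "cv.glmnet" fit.
library(glmTLP)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(100)

cvfit <- cv.glmTLP(x, y, family = "gaussian", tau = 0.3)

cvfit$lambda.min                               # lambda giving the minimum cvm
cvfit$lambda.1se                               # largest lambda within 1 SE of the minimum
cbind(lambda = cvfit$lambda, cvm = cvfit$cvm,
      cvsd = cvfit$cvsd, nzero = cvfit$nzero)  # cross-validation curve as a table
coef(cvfit$glmnet.fit, s = cvfit$lambda.min)   # full-data coefficients at lambda.min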
The function runs glmTLP nfolds+1 times; the first run obtains the lambda sequence, and the remaining runs compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed.
Note that cv.glmTLP does NOT search over values of tau. A specific value should be supplied; otherwise tau = 0.3 is assumed by default.
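Because tau is not tuned internally, one possible workflow is to cross-validate over a small grid of tau values and keep the one with the smallest mean cross-validated error. The loop below is only a sketch of that idea, not a feature of the package; note that the folds are drawn independently in each call.

## Sketch: a hand-rolled search over tau (not provided by the package itself).
library(glmTLP)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(100)

taus    <- c(0.1, 0.3, 0.5, 1)
cv_fits <- lapply(taus, function(t) cv.glmTLP(x, y, family = "gaussian", tau = t))
cv_err  <- sapply(cv_fits, function(f) min(f$cvm))   # best mean CV error for each tau

best <- which.min(cv_err)
taus[best]                     # selected tau
cv_fits[[best]]$lambda.min     # selected lambda at that tau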
Xiaotong Shen, Wei Pan and Yunzhang Zhu (2012). Likelihood-Based Selection and Sharp Parameter Estimation. Journal of the American Statistical Association, 107(497), 223-232.
# NOT RUN {
data("QuickStartExample")
fit = cv.glmTLP(x,y,tau = 1,nfolds = 2, lambda = c(0.1,05))
# We set nfolds and lambda just to speed it up
# and pass the CRAN check. You should either use
# the default setting or search a larger space.
# }