Fit a generalized linear model via penalized maximum likelihood. The regularization path is computed for the truncated lasso penalty (TLP) at a grid of values for the regularization parameter lambda. Can deal with all shapes of data, including very large sparse data matrices. Fits linear, logistic, multinomial, Poisson, Cox, and multi-response Gaussian regression models.
glmTLP(x, y, family=c("gaussian","binomial","poisson","multinomial","cox","mgaussian"),
weights, offset=NULL, lambda, tau = 0.3, nlambda=100,
penalty.factor = rep(1, nvars), lambda.min.ratio=ifelse(nobs < nvars, 0.01, 0.0001), ...)
input matrix, of dimension nobs x nvars; each row is an observation vector. Can be in sparse matrix format (inheriting from class "sparseMatrix" as in package Matrix; not yet available for family="cox").
response variable. Quantitative for family="gaussian" or family="poisson" (non-negative counts). For family="binomial", should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). For family="multinomial", can be a factor with nc >= 2 levels, or a matrix with nc columns of counts or proportions. For either "binomial" or "multinomial", if y is presented as a vector, it will be coerced into a factor. For family="cox", y should be a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating death and '0' indicating right censoring. The function Surv() in package survival produces such a matrix. For family="mgaussian", y is a matrix of quantitative responses.
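For illustration, a minimal sketch (with simulated data, not taken from the package) of the response shapes described above:

set.seed(1)
n <- 100
y_gauss <- rnorm(n)                                # "gaussian": quantitative vector
y_binom <- factor(sample(c("a", "b"), n, TRUE))    # "binomial": two-level factor
y_count <- rpois(n, 2)                             # "poisson": non-negative counts
y_cox   <- cbind(time   = rexp(n),                 # "cox": columns named 'time' and
                 status = rbinom(n, 1, 0.7))       #   'status' (1 = death, 0 = censored)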
Response type (see above)
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation
A vector of length nobs that is included in the linear predictor (a nobs x nc matrix for the "multinomial" family). Useful for the "poisson" family (e.g. log of exposure time), or for refining a model by starting at a current fit. Default is NULL. If supplied, then values must also be supplied to the predict function.
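A hedged sketch (simulated data; only arguments named in this help page are used) of supplying the log of exposure time as an offset with the "poisson" family:

set.seed(2)
x <- matrix(rnorm(100 * 5), 100, 5)
exposure <- runif(100, 1, 10)
y <- rpois(100, lambda = exposure)                 # counts that scale with exposure
fit <- glmTLP(x, y, family = "poisson", offset = log(exposure), nlambda = 5)
# the same offset must then be supplied when calling predict() on the fit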
Tuning parameter of the truncated lasso penalty (TLP) of Shen, Pan and Zhu (2012): the penalty on a coefficient stops increasing once its magnitude reaches tau, so large coefficients are not shrunk further. Default is 0.3.
The number of lambda values; default is 100.
Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables (and implicitly infinity for variables listed in exclude). Note: the penalty factors are internally rescaled to sum to nvars, and the lambda sequence will reflect this change.
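A minimal sketch (using x and y from data("QuickStartExample"), as in the example below): leave the first variable unpenalized so it is always kept in the model.

data("QuickStartExample")
pf  <- c(0, rep(1, ncol(x) - 1))      # penalty factor 0 = no shrinkage for variable 1
fit <- glmTLP(x, y, penalty.factor = pf, nlambda = 3)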
Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the nobs < nvars case. This is undefined for "binomial" and "multinomial" models, and glmnet will exit gracefully when the percentage deviance explained is almost 1.
A user supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda (for predictions after CV use predict() instead); supply instead a decreasing sequence of lambda values. glmnet relies on its warm starts for speed, and it is often faster to fit a whole path than to compute a single fit.
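A hedged sketch of supplying your own decreasing lambda sequence (data from the example below):

data("QuickStartExample")
lam <- exp(seq(log(1), log(0.01), length.out = 20))   # decreasing on the log scale
fit <- glmTLP(x, y, lambda = lam)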
Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian".
Should intercept(s) be fitted (default=TRUE) or set to zero (FALSE)
Limit the maximum number of variables in the model. Useful for very large nvars, if a partial path is desired.
Limit the maximum number of variables ever to be nonzero
Vector of lower limits for each coefficient; default -Inf. Each of these must be non-positive. Can be presented as a single value (which will then be replicated), else a vector of length nvars.
Vector of upper limits for each coefficient; default Inf. See lower.limits.
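For example, a sketch of constraining all coefficients to be non-negative (again with the QuickStartExample data):

data("QuickStartExample")
fit <- glmTLP(x, y, lower.limits = 0, nlambda = 3)    # every coefficient forced >= 0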
This is for the family="mgaussian" family, and allows the user to standardize the response variables.
Maximum number of iterations for the TLP algorithm.
Convergence tolerance for the TLP iterations.
An object that inherits from glmnet.
the call that produced this object
Intercept sequence of length length(lambda)
For "elnet", "lognet", "fishnet" and "coxnet" models, a nvars x length(lambda) matrix of coefficients, stored in sparse column format ("CsparseMatrix"). For "multnet" and "mgaussian", a list of nc such matrices, one for each class.
The actual sequence of lambda values used.
The fraction of (null) deviance explained (for "elnet", this is the R-squared). The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio = 1 - dev/nulldev.
Null deviance (per observation). This is defined to be 2*(loglike_sat - loglike(Null)); the NULL model refers to the intercept model, except for the Cox model, where it is the 0 model.
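As an illustration, assuming the fitted object stores these quantities under the glmnet-style names dev.ratio and nulldev (an assumption; the component names are not listed here), the deviance of each fit can be recovered from the identity above:

data("QuickStartExample")
fit <- glmTLP(x, y, nlambda = 3)
dev <- (1 - fit$dev.ratio) * fit$nulldev    # deviance along the lambda path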
The number of nonzero coefficients for each value of lambda.
dimension of the coefficient matrix (or matrices)
number of observations
total passes over the data summed over all lambda values
a logical variable indicating whether an offset was included in the model
error flag, for warnings and errors (largely for internal debugging).
The truncated lasso penalty (TLP) of Shen, Pan and Zhu (2012) applies lasso-type shrinkage only up to the threshold tau, so that coefficients larger than tau in magnitude are not shrunk further; this reduces the bias of the lasso for large effects while still producing sparse solutions. glmTLP computes the fit by penalized maximum likelihood over the whole sequence of lambda values, for any of the model families listed above.
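One common way to fit the TLP (not necessarily the package's internal code) is difference-of-convex programming, which reduces to a sequence of weighted lasso fits: only coefficients still smaller than tau in absolute value keep being penalized. A rough sketch of that idea for a single lambda value:

library(glmnet)
data("QuickStartExample")
tau <- 0.3
lam <- 0.1
pf  <- rep(1, ncol(x))                        # start from the ordinary lasso
for (it in 1:20) {
  f      <- glmnet(x, y, lambda = lam, penalty.factor = pf)
  b      <- as.matrix(coef(f))[-1, 1]         # current coefficients (drop the intercept)
  pf_new <- as.numeric(abs(b) < tau)          # keep penalizing only the "small" ones
  if (all(pf_new == pf) || !any(pf_new)) break
  pf <- pf_new                                # refit with the updated weights
}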
Xiaotong Shen, Wei Pan and Yunzhang Zhu (2012) Likelihood-Based Selection and Sharp Parameter Estimation. Journal of the American Statistical Association, 107(497), 223-232.
# NOT RUN {
data("QuickStartExample")
fit = glmTLP(x, y, nlambda = 3)
# We set nlambda just to speed it up
# and pass the CRAN check. You should either use
# the default setting or search a larger space.
# }
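A typical follow-up, assuming the returned object supports the usual glmnet methods (it inherits from class glmnet) and stores the lambda sequence as fit$lambda (a component name assumed here):

coef(fit, s = fit$lambda[2])                        # coefficients at one lambda value
predict(fit, newx = x[1:5, ], s = fit$lambda[2])    # predictions for a few rows of x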