- x
Input matrix, of dimension \(n \times p\); each row is an observation
vector and each column is a predictor/feature/variable.
Can be in sparse matrix format (inheriting from class "dgCMatrix" in package Matrix).
- y
The response variable, of n observations.
For family = "binomial", y should have two levels.
For family = "poisson", y should be a vector of non-negative integers.
For family = "cox", y should be a Surv object returned
by the survival package (recommended) or
a two-column matrix with columns named "time" and "status".
For family = "mgaussian", y should be a matrix of quantitative responses.
For family = "multinomial" or "ordinal", y should be a factor of at least three levels.
Note that for "binomial", "ordinal", or "multinomial",
if y is given as a numeric vector, it will be coerced into a factor.
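The base-R lines below sketch what each response format looks like. The toy values and object names (y_bin, y_pois, y_cox, y_mgauss, y_multi) are hypothetical, and the status coding (1 = event, 0 = censored) is the usual survival-package convention, which we assume here.

```r
set.seed(1)
n <- 6
y_bin <- factor(c(0, 1, 1, 0, 1, 0))                  # "binomial": two levels
y_pois <- c(0L, 2L, 1L, 5L, 0L, 3L)                   # "poisson": non-negative counts
y_cox <- cbind(time = c(5, 8, 3, 10, 2, 7),           # "cox": two-column matrix
               status = c(1, 0, 1, 1, 0, 1))          # 1 = event, 0 = censored (assumed)
y_mgauss <- matrix(rnorm(n * 2), nrow = n, ncol = 2)  # "mgaussian": n x 2 responses
y_multi <- factor(c("a", "b", "c", "a", "c", "b"))    # "multinomial"/"ordinal": >= 3 levels
```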
- family
One of the following models:
"gaussian" (continuous response),
"binomial" (binary response),
"poisson" (non-negative count),
"cox" (right-censored survival response),
"mgaussian" (multivariate continuous response),
"multinomial" (multi-class response),
"ordinal" (multi-class ordinal response),
"gamma" (positive continuous response).
Choose the family according to the type of response. Any unambiguous substring can be given.
- tune.path
The method used to select the optimal support size.
For tune.path = "sequence", we solve the best subset selection problem for each size in support.size.
For tune.path = "gsection", we solve the best subset selection problem with support sizes in the range gs.range,
where the specific support sizes to be evaluated are determined by golden-section search.
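To make the golden-section idea concrete, here is a minimal base-R sketch of a golden-section-style search over integer support sizes. It assumes the criterion is unimodal in the support size; the function name golden_section_int and the toy criterion are hypothetical. For simplicity it re-evaluates both interior points at every step instead of reusing one as a textbook golden-section search would, and it is not the abess implementation.

```r
golden_section_int <- function(crit, lo, hi) {
  phi <- (sqrt(5) - 1) / 2                 # inverse golden ratio, about 0.618
  a <- lo
  b <- hi
  while (b - a > 2) {
    c1 <- floor(a + (1 - phi) * (b - a))   # lower interior point
    c2 <- ceiling(a + phi * (b - a))       # upper interior point
    if (crit(c1) <= crit(c2)) {
      b <- c2                              # minimum lies in [a, c2]
    } else {
      a <- c1                              # minimum lies in [c1, b]
    }
  }
  cand <- a:b                              # at most three sizes remain
  cand[which.min(vapply(cand, crit, numeric(1)))]
}

golden_section_int(function(s) (s - 7)^2, 1, 20)  # returns 7
```

Each step shrinks the bracket by roughly the golden ratio, so only \(O(\log(\text{range}))\) support sizes are fitted, compared with every size in the "sequence" path.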
- tune.type
The type of criterion for choosing the support size.
Available options are "gic", "ebic", "bic", "aic" and "cv".
Default is "gic".
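For intuition, the sketch below computes Gaussian-likelihood versions of the criteria for a model with support size s. The formulas (EBIC with \(\gamma = 1\), GIC with penalty \(s \log p \log\log n\)) are common literature forms that we assume here; the exact definitions inside abess may differ. The ic.scale argument plays the role of ic.scale, which multiplies the penalty term. The helper info_criteria is hypothetical.

```r
info_criteria <- function(rss, n, p, s, ic.scale = 1) {
  fit <- n * log(rss / n)   # -2 log-likelihood, up to an additive constant
  c(
    aic  = fit + ic.scale * 2 * s,
    bic  = fit + ic.scale * log(n) * s,
    ebic = fit + ic.scale * (log(n) * s + 2 * lchoose(p, s)),  # gamma = 1 (assumed)
    gic  = fit + ic.scale * log(p) * log(log(n)) * s           # assumed GIC form
  )
}

ic <- info_criteria(rss = 50, n = 100, p = 20, s = 3)
```

The support size minimizing the chosen criterion is selected; "cv" replaces the criterion with held-out loss.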
- weight
Observation weights. When weight = NULL (the default),
each observation is given weight 1.
- normalize
Options for normalization.
normalize = 0 for no normalization.
normalize = 1 for subtracting the means of the columns of x and y, and also
normalizing the columns of x to have \(\sqrt n\) norm.
normalize = 2 for subtracting the means of the columns of x and
scaling the columns of x to have \(\sqrt n\) norm.
normalize = 3 for scaling the columns of x to have \(\sqrt n\) norm.
If normalize = NULL, normalize is set to 1 for "gaussian" and "mgaussian"
and to 3 for "cox". Default is normalize = NULL.
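A minimal base-R sketch of normalize = 1 for a single response vector (the helper normalize_1 is hypothetical, not part of abess):

```r
normalize_1 <- function(x, y) {
  n <- nrow(x)
  x_c <- sweep(x, 2, colMeans(x))              # subtract column means of x
  norms <- sqrt(colSums(x_c^2))                # current Euclidean column norms
  x_s <- sweep(x_c, 2, norms / sqrt(n), "/")   # rescale each column to norm sqrt(n)
  list(x = x_s, y = y - mean(y))               # center y as well
}
```

Dropping the `y - mean(y)` step gives normalize = 2, and dropping the centering of x as well gives normalize = 3 (with norms then computed on the uncentered columns).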
- fit.intercept
A boolean value indicating whether to fit an intercept.
We assume the data has been centered if fit.intercept = FALSE.
Default: fit.intercept = TRUE.
- beta.low
A single value specifying the lower bound of \(\beta\). Default is -.Machine$double.xmax.
- beta.high
A single value specifying the upper bound of \(\beta\). Default is .Machine$double.xmax.
- c.max
An integer giving the maximum splicing size. Default: c.max = 2.
- support.size
An integer vector of candidate support sizes.
Only used for tune.path = "sequence". Default is 0:min(n, round(n/(log(log(n)) * log(p)))).
- gs.range
An integer vector with two elements.
The first element is the minimum model size considered by golden-section search,
the second is the maximum. Default is gs.range = c(1, min(n, round(n/(log(log(n)) * log(p))))).
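The default support.size and gs.range can be evaluated directly. Here they are for hypothetical n = 100 and p = 20; default_max_size is a hypothetical helper that writes the default formula with an explicit multiplication.

```r
default_max_size <- function(n, p) min(n, round(n / (log(log(n)) * log(p))))

n <- 100
p <- 20
support.size <- 0:default_max_size(n, p)    # candidate sizes for "sequence"
gs.range <- c(1, default_max_size(n, p))    # search range for "gsection"
```

For these values the largest default support size is 22.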
- lambda
A single lambda value for regularized best subset selection. Default is 0.
- always.include
An integer vector containing the indices of variables that should always be included in the model.
- group.index
A vector of integers indicating which group each variable belongs to.
Variables in the same group should be located in adjacent columns of x,
and their corresponding entries in group.index should be identical.
Denote the first group as 1, the second 2, etc.
If you do not fit a model with a group structure,
please set group.index = NULL (the default).
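For example, with three hypothetical groups of sizes 2, 3 and 1 stored in adjacent columns of x, group.index can be built with rep():

```r
group.sizes <- c(2, 3, 1)   # hypothetical number of columns in each group
group.index <- rep(seq_along(group.sizes), times = group.sizes)
group.index
#> [1] 1 1 2 2 2 3
```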
- init.active.set
A vector of integers indicating the initial active set.
Default: init.active.set = NULL.
- splicing.type
The type of splicing.
If splicing.type = 1, the number of variables to be spliced is
c.max, ..., 1; if splicing.type = 2,
the number of variables to be spliced is c.max, c.max/2, ..., 1.
(Default: splicing.type = 2.)
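The two schedules can be listed with a small hypothetical helper, splicing_sizes; treating c.max/2 as integer halving (%/%) is our assumption.

```r
splicing_sizes <- function(c.max, splicing.type = 2) {
  if (splicing.type == 1) return(seq(c.max, 1))   # c.max, c.max - 1, ..., 1
  sizes <- c.max
  while (tail(sizes, 1) > 1) {
    sizes <- c(sizes, tail(sizes, 1) %/% 2)       # c.max, c.max/2, ..., 1
  }
  sizes
}

splicing_sizes(8, 1)   # 8 7 6 5 4 3 2 1
splicing_sizes(8, 2)   # 8 4 2 1
```

Halving reaches 1 in about \(\log_2\) c.max steps, so splicing.type = 2 tries fewer sizes when c.max is large.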
- max.splicing.iter
The maximum number of splicing iterations.
In most cases, only a few splicing iterations are needed to reach convergence.
Default is max.splicing.iter = 20.
- screening.num
An integer. Before running the algorithm, retain only the screening.num predictors
with the largest marginal maximum likelihood estimates.
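A minimal sketch of marginal screening, assuming family = "gaussian" so that the marginal maximum likelihood estimate is the univariate least-squares slope on a standardized column. The helper marginal_screen is hypothetical, and the screening inside abess may differ in details.

```r
marginal_screen <- function(x, y, screening.num) {
  xs <- scale(x)                                     # standardize columns of x
  yc <- y - mean(y)
  slopes <- drop(crossprod(xs, yc)) / colSums(xs^2)  # univariate LS estimates
  keep <- order(abs(slopes), decreasing = TRUE)[seq_len(screening.num)]
  sort(keep)                                         # column indices retained
}
```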
- important.search
An integer indicating the number of
important variables considered for splicing.
When important.search \(\ll p\),
this can greatly reduce runtime. Default: important.search = 128.
- warm.start
Whether to use the last solution as a warm start. Default is warm.start = TRUE.
- nfolds
The number of folds in cross-validation. Default is nfolds = 5.
- foldid
An optional integer vector with values in 1, ..., nfolds identifying the fold each observation belongs to.
The default foldid = NULL generates a random fold assignment.
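To construct foldid by hand, a balanced random assignment is one option. make_foldid is a hypothetical helper, and set.seed() inside it resets the global random number generator.

```r
make_foldid <- function(n, nfolds = 5, seed = 1) {
  set.seed(seed)                                 # reproducible fold split
  sample(rep(seq_len(nfolds), length.out = n))   # fold sizes differ by at most 1
}

foldid <- make_foldid(n = 23, nfolds = 5)
table(foldid)
```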
- cov.update
A logical value only used for family = "gaussian". If cov.update = TRUE,
use a covariance-based implementation; otherwise, a naive implementation.
The naive method is more computationally efficient than the covariance-based method when \(p \gg n\) and important.search is much larger than its default value.
Default: cov.update = FALSE.
- newton
A character string specifying the Newton method for fitting generalized linear models;
it should be either newton = "exact" or newton = "approx".
If newton = "exact", the exact Hessian is used,
whereas newton = "approx" uses only the diagonal entries of the Hessian
and can be faster (especially when family = "cox").
- newton.thresh
A positive numeric value controlling the convergence tolerance.
The Newton's iterations converge when \(|dev - dev_{old}|/(|dev| + 0.1)<\) newton.thresh.
- max.newton.iter
An integer giving the maximum number of Newton iterations.
Default is max.newton.iter = 10 if newton = "exact", and max.newton.iter = 60 if newton = "approx".
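The sketch below illustrates newton, newton.thresh and max.newton.iter on logistic regression with an intercept, a setting we choose for illustration; it is not the abess solver, and newton_logistic is a hypothetical name. "exact" solves with the full Hessian \(X^\top W X\), "approx" divides the gradient by its diagonal, and both stop with the deviance rule above.

```r
newton_logistic <- function(x, y, newton = c("exact", "approx"),
                            newton.thresh = 1e-6, max.newton.iter = NULL) {
  newton <- match.arg(newton)
  if (is.null(max.newton.iter)) {
    max.newton.iter <- if (newton == "exact") 10 else 60   # documented defaults
  }
  X <- cbind(1, x)                       # prepend an intercept column
  beta <- rep(0, ncol(X))
  dev_old <- Inf
  for (iter in seq_len(max.newton.iter)) {
    mu <- plogis(drop(X %*% beta))       # fitted probabilities
    w <- mu * (1 - mu)                   # Hessian weights
    grad <- drop(crossprod(X, y - mu))   # gradient of the log-likelihood
    if (newton == "exact") {
      H <- crossprod(X, X * w)           # full Hessian X' W X
      beta <- beta + solve(H, grad)
    } else {
      h <- colSums(X^2 * w)              # diagonal of X' W X only
      beta <- beta + grad / h
    }
    eta <- drop(X %*% beta)
    dev <- -2 * sum(y * eta - log1p(exp(eta)))   # binomial deviance, 0/1 y
    if (abs(dev - dev_old) / (abs(dev) + 0.1) < newton.thresh) break
    dev_old <- dev
  }
  list(beta = beta, dev = dev, iter = iter)
}
```

The exact method typically converges in a handful of steps, which fits its lower iteration cap; the diagonal approximation takes more, but cheaper, steps.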
- early.stop
A boolean value indicating whether to stop early.
If early.stop = TRUE, the algorithm stops once the tuning criterion at the latest support size fails to improve on the existing one.
Default: early.stop = FALSE.
- ic.scale
A non-negative value used for multiplying the penalty term
in information criterion. Default: ic.scale = 1.
- num.threads
An integer specifying the number of threads
used concurrently for cross-validation (i.e., when tune.type = "cv").
If num.threads = 0, all available cores are used.
Default: num.threads = 0.
- seed
Seed to be used to divide the sample into cross-validation folds.
Default is seed = 1.
- ...
further arguments to be passed to or from methods.
- formula
an object of class "formula":
a symbolic description of the model to be fitted.
The details of model specification are given in the "Details" section of "formula".
- data
a data frame containing the variables in the formula.
- subset
an optional vector specifying a subset of observations to be used.
- na.action
a function which indicates
what should happen when the data contain NAs.
Defaults to getOption("na.action").
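As a rough illustration of how formula, data, subset and na.action combine, base R's model.frame() and model.matrix() can turn them into a response vector and a predictor matrix. This mirrors the usual convention for formula methods and is an assumption about, not a copy of, the abess formula method; df, y_vec and x_mat are hypothetical.

```r
df <- data.frame(y = c(1.2, 0.4, NA, 2.1, 1.7),
                 a = c(1, 2, 3, 4, 5),
                 b = c(0.5, 0.1, 0.3, 0.9, 0.2))
# subset drops row 1, na.omit drops row 3 (missing response)
mf <- model.frame(y ~ a + b, data = df, subset = a > 1, na.action = na.omit)
y_vec <- model.response(mf)
x_mat <- model.matrix(y ~ a + b, mf)[, -1, drop = FALSE]  # drop intercept column
dim(x_mat)
```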