- x
Input matrix, of dimension \(n \times p\); each row is an observation
vector and each column is a predictor/feature/variable.
Can be in sparse matrix format (inheriting from class "dgCMatrix" in package Matrix).
- y
The response variable, of n observations.
For family = "binomial", y should have two levels.
For family = "poisson", y should be a vector of non-negative integers.
For family = "cox", y should be a Surv object returned by the survival package (recommended) or
a two-column matrix with columns named "time" and "status".
For family = "mgaussian", y should be a matrix of quantitative responses.
For family = "multinomial" or "ordinal", y should be a factor with at least three levels.
Note that for "binomial", "ordinal" or "multinomial",
if y is given as a numeric vector, it will be coerced into a factor.
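As an illustration of the response formats described above, here is a minimal base R sketch that builds y for a few families. The object names (y_binomial, y_cox, y_mgaussian) are hypothetical, and the factor coercion is done by hand only to show what the note about numeric vectors amounts to.

```r
# Binary response given as numbers; coercing it yields a two-level factor.
y_numeric <- c(0, 1, 1, 0, 1)
y_binomial <- as.factor(y_numeric)

# Survival response as a two-column matrix with the required column names.
y_cox <- cbind(time = c(5.2, 8.1, 12.0), status = c(1, 0, 1))

# Multivariate continuous response: one column per response.
y_mgaussian <- matrix(c(1.2, 0.7, 3.1, 2.2, 0.4, 1.9), ncol = 2)
```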
- family
One of the following models, chosen according to the response:
"gaussian" (continuous response),
"binomial" (binary response),
"poisson" (non-negative count),
"cox" (right-censored response),
"mgaussian" (multivariate continuous response),
"multinomial" (multi-class response),
"ordinal" (multi-class ordinal response),
"gamma" (positive continuous response).
Any unambiguous substring can be given.
- tune.path
The method used to select the optimal support size.
For tune.path = "sequence", we solve the best subset selection problem for each size in support.size.
For tune.path = "gsection", we solve the best subset selection problem with the support size ranging over gs.range,
where the specific support sizes to be considered are determined by golden-section search.
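To make the idea behind tune.path = "gsection" concrete, here is a generic golden-section search over integer support sizes, written in base R. It is a sketch of the general technique, not the package's implementation: the function name golden_section_size and the criterion argument are hypothetical, and the criterion is assumed to be unimodal in the support size.

```r
# Generic golden-section search over integers in [lower, upper].
# `criterion` maps a support size to a value to be minimized (assumed unimodal).
golden_section_size <- function(criterion, lower, upper) {
  ratio <- (sqrt(5) - 1) / 2  # golden ratio conjugate, about 0.618
  while (upper - lower > 2) {
    left <- round(upper - ratio * (upper - lower))
    right <- round(lower + ratio * (upper - lower))
    if (left == right) right <- left + 1  # keep two distinct probe points
    if (criterion(left) < criterion(right)) {
      upper <- right  # the minimum lies in [lower, right]
    } else {
      lower <- left   # the minimum lies in [left, upper]
    }
  }
  # At most three candidates remain; evaluate them all.
  sizes <- lower:upper
  sizes[which.min(vapply(sizes, criterion, numeric(1)))]
}

# Toy criterion with its minimum at support size 7.
best_size <- golden_section_size(function(s) (s - 7)^2, lower = 1, upper = 20)
```

Compared with evaluating every size in a sequence, this probes only a handful of support sizes, which is the appeal when each fit is expensive.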
- tune.type
The type of criterion for choosing the support size.
Available options are "gic", "ebic", "bic", "aic" and "cv".
Default is "gic".
- weight
Observation weights. When weight = NULL,
we set weight = 1 for each observation by default.
- normalize
Options for normalization.
normalize = 0 for no normalization.
normalize = 1 for subtracting the means of the columns of x and y, and also
normalizing the columns of x to have \(\sqrt n\) norm.
normalize = 2 for subtracting the means of the columns of x and
scaling the columns of x to have \(\sqrt n\) norm.
normalize = 3 for scaling the columns of x to have \(\sqrt n\) norm.
If normalize = NULL, normalize will be set to 1 for "gaussian" and "mgaussian",
and to 3 for "cox". Default is normalize = NULL.
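The following base R sketch shows what centering the columns of x and scaling them to \(\sqrt n\) norm looks like numerically. It only illustrates the transformation described for x under normalize = 1 or 2; the variable names are hypothetical and this is not the package's internal code.

```r
set.seed(1)
n <- 50
p <- 3
x <- matrix(rnorm(n * p, mean = 2, sd = 4), n, p)

# Subtract the column means.
x_centered <- sweep(x, 2, colMeans(x))

# Rescale each column so that its Euclidean norm equals sqrt(n).
col_norms <- sqrt(colSums(x_centered^2))
x_scaled <- sweep(x_centered, 2, col_norms / sqrt(n), "/")

# Each column of x_scaled now has mean 0 and norm sqrt(n).
scaled_norms <- sqrt(colSums(x_scaled^2))
```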
- fit.intercept
A boolean value indicating whether to fit an intercept.
We assume the data has been centered if fit.intercept = FALSE.
Default: fit.intercept = FALSE.
- beta.low
A single value specifying the lower bound of \(\beta\). Default is -.Machine$double.xmax.
- beta.high
A single value specifying the upper bound of \(\beta\). Default is .Machine$double.xmax.
- c.max
An integer giving the maximum splicing size. Default is c.max = 2.
- support.size
An integer vector representing the alternative support sizes.
Only used for tune.path = "sequence". Default is 0:min(n, round(n / (log(log(n)) * log(p)))).
- gs.range
An integer vector with two elements.
The first element is the minimum model size considered by golden-section search,
the latter is the maximum. Default is gs.range = c(1, min(n, round(n / (log(log(n)) * log(p))))).
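For a sense of scale, the defaults of support.size and gs.range can be evaluated in base R. This sketch reads the default expression with an explicit product between log(log(n)) and log(p), which is an assumption about the intended formula; the helper name default_max_size is hypothetical.

```r
# Upper limit shared by the two defaults (assumed reading of the formula).
default_max_size <- function(n, p) {
  min(n, round(n / (log(log(n)) * log(p))))
}

n <- 100
p <- 20
support.size <- 0:default_max_size(n, p)     # default for tune.path = "sequence"
gs.range <- c(1, default_max_size(n, p))     # default for tune.path = "gsection"
```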
- lambda
A single lambda value for regularized best subset selection. Default is 0.
- always.include
An integer vector containing the indexes of variables that should always be included in the model.
- group.index
A vector of integers indicating which group each variable is in.
Variables in the same group should be located in adjacent columns of x,
and their corresponding indices in group.index should be the same.
Denote the first group as 1, the second as 2, etc.
If you do not fit a model with a group structure,
please set group.index = NULL (the default).
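A small base R sketch of a group.index that follows the layout rules above (adjacent columns, groups labeled 1, 2, ...). The checker is_valid_group_index is a hypothetical helper written for this illustration, not a function of the package.

```r
# Six predictors in three groups of two adjacent columns each.
group.index <- rep(1:3, each = 2)

# Hypothetical check: labels start at 1 and never decrease or skip a group,
# which implies that members of a group sit in adjacent columns.
is_valid_group_index <- function(g) {
  g[1] == 1 && all(diff(g) %in% c(0, 1))
}

ok_adjacent <- is_valid_group_index(group.index)        # grouped layout
ok_interleaved <- is_valid_group_index(c(1, 2, 1, 2))   # interleaved layout
```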
- init.active.set
A vector of integers indicating the initial active set.
Default: init.active.set = NULL.
- splicing.type
Optional type for splicing.
If splicing.type = 1, the number of variables to be spliced is c.max, ..., 1;
if splicing.type = 2, the number of variables to be spliced is c.max, c.max/2, ..., 1.
(Default: splicing.type = 2.)
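The two schedules can be written out explicitly. In this base R sketch the helper splicing_sizes is hypothetical, and the use of integer division when c.max is not a power of two is an assumption; the description above does not say how such values are rounded.

```r
# Sequence of splicing sizes tried for a given c.max and splicing.type.
splicing_sizes <- function(c.max, splicing.type) {
  if (splicing.type == 1) {
    return(seq(c.max, 1))  # c.max, c.max - 1, ..., 1
  }
  sizes <- c.max
  while (tail(sizes, 1) > 1) {
    sizes <- c(sizes, tail(sizes, 1) %/% 2)  # halve (assumed integer division)
  }
  sizes
}

linear_schedule <- splicing_sizes(4, splicing.type = 1)
halving_schedule <- splicing_sizes(8, splicing.type = 2)
```

The halving schedule visits far fewer sizes when c.max is large, at the cost of skipping intermediate values.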
- max.splicing.iter
The maximum number of splicing iterations.
In most cases, only a few splicing iterations are needed for convergence.
Default is max.splicing.iter = 20.
- screening.num
An integer. Retain the screening.num predictors with the largest
marginal maximum likelihood estimators before running the algorithm.
- important.search
An integer indicating the number of
important variables to be spliced.
When important.search \(\ll\) p,
the runtime is greatly reduced. Default: important.search = 128.
- warm.start
Whether to use the last solution as a warm start. Default is warm.start = TRUE.
- nfolds
The number of folds in cross-validation. Default is nfolds = 5.
- foldid
An optional integer vector of values in 1, ..., nfolds identifying which fold each observation is in.
The default foldid = NULL generates a random foldid.
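If you want to pass your own foldid, one common way to build a balanced random assignment in base R is sketched below. This is an illustration of the expected format (one integer in 1, ..., nfolds per observation), not necessarily how the package generates its default folds.

```r
set.seed(1)
n <- 23
nfolds <- 5

# Repeat the labels 1..nfolds to length n, then shuffle them.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Fold sizes differ by at most one observation.
fold_sizes <- table(foldid)
```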
- cov.update
A logical value only used for family = "gaussian". If cov.update = TRUE,
use a covariance-based implementation; otherwise, a naive implementation.
The naive method is more computationally efficient than the covariance-based method when \(p \gg n\) and important.search
is much larger than its default value.
Default: cov.update = FALSE.
- newton
A character string specifying the Newton's method used for fitting generalized linear models;
it should be either newton = "exact" or newton = "approx".
If newton = "exact", then the exact Hessian is used,
while newton = "approx" uses only the diagonal entries of the Hessian
and can be faster (especially when family = "cox").
- newton.thresh
A positive numeric value controlling the convergence tolerance.
The Newton iterations converge when \(|dev - dev_{old}|/(|dev| + 0.1) <\) newton.thresh.
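The stopping rule above translates directly into a one-line check. In this base R sketch the function name newton_converged is hypothetical, and dev and dev_old stand for the deviance at the current and previous iteration; the threshold value used in the example is arbitrary.

```r
# TRUE when the relative change in deviance falls below the threshold.
newton_converged <- function(dev, dev_old, newton.thresh) {
  abs(dev - dev_old) / (abs(dev) + 0.1) < newton.thresh
}

# A tiny change in deviance counts as converged; a large one does not.
small_change <- newton_converged(dev = 100, dev_old = 100.00001, newton.thresh = 1e-6)
large_change <- newton_converged(dev = 100, dev_old = 101, newton.thresh = 1e-6)
```

The 0.1 in the denominator keeps the ratio well defined when the deviance itself is close to zero.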
- max.newton.iter
An integer giving the maximum number of Newton iterations.
Default is max.newton.iter = 10 if newton = "exact", and max.newton.iter = 60 if newton = "approx".
- early.stop
A boolean value deciding whether to stop early.
If early.stop = TRUE, the algorithm will stop if the last tuning value is less than the existing one.
Default: early.stop = FALSE.
- ic.scale
A non-negative value used for multiplying the penalty term
in the information criterion. Default: ic.scale = 1.
- num.threads
An integer deciding the number of threads to be
used concurrently for cross-validation (i.e., tune.type = "cv").
If num.threads = 0, then all available cores will be used.
Default: num.threads = 0.
- seed
Seed to be used to divide the sample into cross-validation folds.
Default is seed = 1.
- ...
further arguments to be passed to or from methods.
- formula
an object of class "formula":
a symbolic description of the model to be fitted.
The details of model specification are given in the "Details" section of "formula".
- data
a data frame containing the variables in the formula.
- subset
an optional vector specifying a subset of observations to be used.
- na.action
a function which indicates
what should happen when the data contain NAs.
Defaults to getOption("na.action").
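To tie the arguments together, here is a hedged sketch of calling the function through both the matrix and the formula interface on simulated data. The simulated design, the coefficient values and the chosen argument settings are assumptions made for illustration; the calls are guarded so the snippet still runs when the abess package is not installed.

```r
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
# Only the first three predictors carry signal (illustrative choice).
y <- drop(x[, 1:3] %*% c(3, -2, 1.5)) + rnorm(n)
dat <- data.frame(y = y, x)

if (requireNamespace("abess", quietly = TRUE)) {
  # Matrix interface: explicit support sizes, tuned by 5-fold cross-validation.
  fit_matrix <- abess::abess(x, y, family = "gaussian",
                             tune.path = "sequence", support.size = 0:10,
                             tune.type = "cv", nfolds = 5)
  # Formula interface: all remaining columns of dat are used as predictors.
  fit_formula <- abess::abess(y ~ ., data = dat)
}
```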