Low-level interface to best-variable-subset selection in ordinary linear regression.
lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL,
             exclude = NULL, penalty = "BIC", tolerance = 0,
             nbest = 1, ..., pradius = NULL)
Arguments:

x: double[,] ---the model matrix
y: double[] ---the model response
weights: double[] ---the model weights
offset: double[] ---the model offset
include: logical[], integer[], or character[] ---the regressors to force in
exclude: logical[], integer[], or character[] ---the regressors to force out
penalty: double, character, or "function" ---the penalty per model parameter
tolerance: double ---the approximation tolerance
nbest: integer ---the number of best subsets
...: ignored
pradius: integer ---the preordering radius
A list with the following components:

NOBS: integer ---number of observations in model (before weights processing)
nobs: integer ---number of observations in model (after weights processing)
nvar: integer ---number of regressors in model
weights: double[] ---model weights
intercept: logical ---is TRUE if model contains an intercept term, FALSE otherwise
include: logical[] ---regressors forced into the regression
exclude: logical[] ---regressors forced out of the regression
size: integer[] ---subset sizes
ic: information criterion
tolerance: double ---approximation tolerance
nbest: integer ---number of best subsets
submodel: "data.frame" ---submodel information
subset: "data.frame" ---selected subsets
The best variable-subset model is determined, where the "best" model is the one with the lowest value of the information criterion. The information criterion belongs to the AIC family.
The regression data is specified with the x, y, weights, and offset parameters. See lm.fit() for further details.
To force regressors into or out of the regression, a list of regressors can be passed as an argument to the include or exclude parameter, respectively.
The information criterion is specified with the penalty parameter. Accepted values are "AIC", "BIC", or a "numeric" value representing the penalty per model parameter. A custom selection criterion may be specified by passing an R function as an argument. The expected signature is function (size, rss), where size is the number of predictors (including the intercept, if any), and rss is the residual sum of squares. The function must be non-decreasing in both parameters.
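As an illustrative sketch (not taken from the package documentation), a BIC-like criterion can be written as a custom penalty function; its values may differ from the built-in "BIC" penalty by an additive constant, but it selects the same subsets when it does:

```r
library("lmSubsets")

data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <- AirPollution[, names(AirPollution) == "mortality"]

## hand-rolled BIC-style criterion: 'size' counts all predictors
## (including the intercept) and 'rss' is the residual sum of
## squares; log(n) is the penalty per model parameter
n <- nrow(x)
bic <- function (size, rss)  n * log(rss / n) + log(n) * size

f <- lmSelect_fit(x, y, penalty = bic)
```

Note that the custom function is non-decreasing in both size and rss, as required.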
An approximation tolerance can be specified to speed up the search.

The number of returned submodels is determined by the nbest parameter.

The preordering radius is given with the pradius parameter.
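For example (a sketch assuming the AirPollution data used in the example below), several competing subsets can be requested in a single call:

```r
library("lmSubsets")

data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <- AirPollution[, names(AirPollution) == "mortality"]

## the five best subsets by BIC, ranked by the criterion value
f <- lmSelect_fit(x, y, penalty = "BIC", nbest = 5)
```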
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact variable-subset selection in linear regression for R. Journal of Statistical Software, 93(3), 1--21. doi:10.18637/jss.v093.i03.
See also lmSelect() for the high-level interface and lmSubsets_fit() for all-subsets regression.
data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <- AirPollution[, names(AirPollution) == "mortality"]
f <- lmSelect_fit(x, y)
f