Low-level interface to best-variable-subset selection in ordinary linear regression.
lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL,
             exclude = NULL, penalty = "BIC", tolerance = 0,
             nbest = 1, ..., pradius = NULL)
Arguments:

x: double[,] ---the model matrix
y: double[] ---the model response
weights: double[] ---the model weights
offset: double[] ---the model offset
include: logical[], integer[], or character[] ---the regressors to force in
exclude: logical[], integer[], or character[] ---the regressors to force out
penalty: double, character, or "function" ---the penalty per model parameter
tolerance: double ---the approximation tolerance
nbest: integer ---the number of best subsets
...: ignored
pradius: integer ---the preordering radius
A list with the following components:

NOBS: integer ---number of observations in model (before weights processing)
nobs: integer ---number of observations in model (after weights processing)
nvar: integer ---number of regressors in model
weights: double[] ---model weights
intercept: logical ---is TRUE if model contains an intercept term, FALSE otherwise
include: logical[] ---regressors forced into the regression
exclude: logical[] ---regressors forced out of the regression
size: integer[] ---subset sizes
ic: information criterion
tolerance: double ---approximation tolerance
nbest: integer ---number of best subsets
submodel: "data.frame" ---submodel information
subset: "data.frame" ---selected subsets
The best variable-subset model is determined, where the "best" model is the one with the lowest value of the information criterion. The information criterion belongs to the AIC family.
The regression data is specified with the x, y, weights, and offset parameters. See lm.fit() for further details.
To force regressors into or out of the regression, a list of regressors can be passed as an argument to the include or exclude parameter, respectively.
The information criterion is specified with the penalty parameter. Accepted values are "AIC", "BIC", or a "numeric" value representing the penalty per model parameter. A custom selection criterion may be specified by passing an R function as an argument. The expected signature is function (size, rss), where size is the number of predictors (including the intercept, if any), and rss is the residual sum of squares. The function must be non-decreasing in both parameters.
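As an illustrative sketch (not taken from the package documentation), a BIC-like criterion can be written as a custom penalty function; its values may differ from the built-in "BIC" penalty by an additive constant, but it selects the same subsets when it does:

```r
library("lmSubsets")

data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <- AirPollution[, names(AirPollution) == "mortality"]

## hand-rolled BIC-style criterion: 'size' counts all predictors
## (including the intercept) and 'rss' is the residual sum of
## squares; log(n) is the penalty per model parameter
n <- nrow(x)
bic <- function (size, rss)  n * log(rss / n) + log(n) * size

f <- lmSelect_fit(x, y, penalty = bic)
```

Note that the custom function is non-decreasing in both size and rss, as required.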
An approximation tolerance can be specified to speed up the search.

The number of returned submodels is determined by the nbest parameter.

The preordering radius is given with the pradius parameter.
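For example (a sketch assuming the AirPollution data used in the example below), several competing subsets can be requested in a single call:

```r
library("lmSubsets")

data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <- AirPollution[, names(AirPollution) == "mortality"]

## the five best subsets by BIC, ranked by the criterion value
f <- lmSelect_fit(x, y, penalty = "BIC", nbest = 5)
```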
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact variable-subset selection in linear regression for R. Journal of Statistical Software, 93(3), 1--21. doi:10.18637/jss.v093.i03.
See also lmSelect() for the high-level interface and lmSubsets_fit() for all-subsets regression.
data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <- AirPollution[, names(AirPollution) == "mortality"]
f <- lmSelect_fit(x, y)
f