# lmSelect

##### Best-Subset Regression

Best-subset regression for ordinary linear models.

- Keywords
- regression

##### Usage

`lmSelect(formula, ...)`

# S3 method for default
lmSelect(formula, data, subset, weights, na.action, model = TRUE,
         x = FALSE, y = FALSE, contrasts = NULL, offset, ...)

# S3 method for matrix
lmSelect(formula, y, intercept = TRUE, ...)

# S3 method for lmSubsets
lmSelect(formula, penalty = "BIC", ...)

lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL,
             exclude = NULL, penalty = "BIC", tolerance = 0,
             nbest = 1, ..., pradius = NULL)

##### Arguments

- formula, data, subset, weights, na.action, model, x, y, contrasts, offset
Standard formula interface.

- intercept
Include intercept.

- include, exclude
Force regressors in or out.

- penalty
Penalty per parameter.

- tolerance
Approximation tolerance.

- nbest
Number of best subsets.

- ...
Forwarded to `lmSelect_fit`.

- pradius
Preordering radius.

##### Details

The `lmSelect` generic provides a convenient interface for best variable-subset selection in linear regression: the `nbest` best subset models, ranked by an information criterion of the AIC family, are returned.

The information criterion is specified with the `penalty` parameter. Accepted values are `"AIC"`, `"BIC"`, or a `numeric` value representing the penalty per model parameter (see `AIC`).
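As a brief illustration of the kinds of `penalty` values described above, the following sketch fits the same model under each. It assumes the lmSubsets package is installed and reuses the `AirPollution` data from the Examples section. Reading `penalty = 2` as the counterpart of `"AIC"` and `penalty = log(n)` as the counterpart of `"BIC"` is an inference from the "penalty per model parameter" wording, not something this page states outright.

```r
library("lmSubsets")
data("AirPollution", package = "lmSubsets")

n <- nrow(AirPollution)  # number of observations

## named criteria
fit_aic <- lmSelect(mortality ~ ., data = AirPollution, penalty = "AIC")
fit_bic <- lmSelect(mortality ~ ., data = AirPollution, penalty = "BIC")

## numeric penalty per model parameter
## (assumed here to play the role of 'k' in stats::AIC)
fit_k2   <- lmSelect(mortality ~ ., data = AirPollution, penalty = 2)
fit_klog <- lmSelect(mortality ~ ., data = AirPollution, penalty = log(n))

## print two of the fits to compare the selected subsets
fit_aic
fit_k2
```
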

A custom selection criterion may be specified by passing an R function as the `penalty` argument. The expected signature is `function(size, rss)`, where `size` is the number of predictors (including the intercept, if any) and `rss` is the residual sum of squares. The function must be non-decreasing in both parameters.
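To make the `function(size, rss)` contract concrete, here is a sketch in base R that builds a BIC-style criterion and checks it against `stats::BIC` on an ordinary `lm` fit. The `mtcars` data, the model formula, and the helper names `make_bic` and `crit` are illustrative assumptions and not part of lmSubsets.

```r
## Build a BIC-style criterion for a data set with n observations.
## (make_bic is a hypothetical helper, used here for illustration only.)
make_bic <- function(n) {
  function(size, rss) {
    ## Gaussian log-likelihood expressed through the residual sum of squares
    ll <- -n / 2 * (log(2 * pi) - log(n) + log(rss) + 1)
    ## 'size + 1' counts the error variance as an extra parameter
    -2 * ll + log(n) * (size + 1)
  }
}

fit  <- lm(mpg ~ wt + hp, data = mtcars)
crit <- make_bic(nrow(mtcars))

size <- length(coef(fit))       # number of predictors, including the intercept
rss  <- sum(residuals(fit)^2)   # residual sum of squares

## agrees with the value reported by stats::BIC
all.equal(crit(size, rss), BIC(fit))

## non-decreasing in both arguments, as the contract requires
crit(size + 1, rss) >= crit(size, rss)
crit(size, rss * 1.1) >= crit(size, rss)
```

A function built this way could then be supplied as `penalty = make_bic(nrow(data))`, in the same manner as the `bic` function in the Examples section.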

A low-level matrix interface is provided by `lmSelect_fit`.
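A minimal sketch of calling the low-level interface, using only the argument names listed in the Usage section. It assumes the lmSubsets package is installed and that `x` may be a design matrix as produced by `model.matrix`, intercept column included. This page does not spell out how `lmSelect_fit` expects the intercept to be represented, so treat that part as an assumption.

```r
library("lmSubsets")
data("AirPollution", package = "lmSubsets")

## build the design matrix and response from a model frame, so that
## both refer to the same set of rows
mf <- model.frame(mortality ~ ., data = AirPollution)
x  <- model.matrix(attr(mf, "terms"), mf)  # assumed: intercept column is kept
y  <- model.response(mf)

## five best subsets under the default BIC penalty
fit <- lmSelect_fit(x, y, penalty = "BIC", nbest = 5)

## inspect the top-level structure of the result
str(fit, max.level = 1)
```
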

See `lmSubsets` for further information.

##### Value

An object of class `"lmSelect"`, i.e., a list with the following components:

Number of observations, of variables.

`TRUE` if model has intercept term; `FALSE` otherwise.

Included, excluded variables.

Subset sizes.

Approximation tolerance.

Number of best subsets.

Submodel information.

Selected variables.

Further components include `call`, `na.action`, `weights`, `offset`, `contrasts`, `xlevels`, `terms`, `mf`, `x`, and `y`. See `lm` for more information.

##### References

Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020).
lmSubsets: Exact Variable-Subset Selection in Linear Regression for
R. *Journal of Statistical Software*. **93**, 1--21.
doi:10.18637/jss.v093.i03.

##### Examples

```
library("lmSubsets")

## load data (with logs for relative potentials)
data("AirPollution", package = "lmSubsets")

###################
##  basic usage  ##
###################

## fit 20 best subsets (BIC)
lm_best <- lmSelect(mortality ~ ., data = AirPollution, nbest = 20)
lm_best

## equivalent to:
lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20)
lm_best <- lmSelect(lm_all)

## summary statistics
summary(lm_best)

## visualize
plot(lm_best)

########################
##  custom criterion  ##
########################

## the same as above, but with a custom criterion:
M <- nrow(AirPollution)  # number of observations

## Gaussian log-likelihood as a function of the residual sum of squares
ll <- function(rss) {
  -M/2 * (log(2 * pi) - log(M) + log(rss) + 1)
}

## AIC-type criterion; 'size + 1' counts the error variance as a parameter
aic <- function(size, rss, k = 2) {
  -2 * ll(rss) + k * (size + 1)
}

## BIC is the AIC-type criterion with penalty k = log(M)
bic <- function(size, rss) {
  aic(size, rss, k = log(M))
}

lm_cust <- lmSelect(mortality ~ ., data = AirPollution,
                    penalty = bic, nbest = 20)
lm_cust
```

*Documentation reproduced from package lmSubsets, version 0.5-1, License: GPL (>= 3)*