lmSelect
Best-Subset Regression
Best-subset regression for ordinary linear models.
- Keywords
- regression
Usage
lmSelect(formula, …)
# S3 method for default
lmSelect(formula, data, subset, weights, na.action, model = TRUE,
         x = FALSE, y = FALSE, contrasts = NULL, offset, …)
# S3 method for matrix
lmSelect(formula, y, intercept = TRUE, …)
# S3 method for lmSubsets
lmSelect(formula, penalty = "BIC", …)
lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL,
             exclude = NULL, penalty = "BIC", tolerance = 0,
             nbest = 1, …, pradius = NULL)
Arguments
- formula, data, subset, weights, na.action, model, x, y, contrasts, offset
Standard formula interface.
- intercept
Include intercept.
- include, exclude
Force regressors in or out.
- penalty
Penalty per parameter.
- tolerance
Approximation tolerance.
- nbest
Number of best subsets.
- …
Forwarded to lmSelect_fit.
- pradius
Preordering radius.
Details
The lmSelect generic provides a convenient interface for best variable-subset selection in linear regression: the nbest best subset models, ranked by an information criterion of the AIC family, are returned.
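To make "the nbest best subset models" concrete, here is a minimal brute-force sketch in base R. It is not the algorithm used by lmSubsets; it simply enumerates every non-empty subset of a few regressors and ranks the fits by BIC. The built-in mtcars data and the names dat, xvars, subsets, and brute_best are assumptions made for illustration only.

```r
## Brute-force sketch (NOT the lmSubsets algorithm): enumerate every
## non-empty subset of a few regressors and rank the fits by BIC.
dat   <- mtcars[, c("mpg", "wt", "hp", "qsec", "drat")]
xvars <- setdiff(names(dat), "mpg")
nbest <- 3  # number of best subsets to keep

## all non-empty subsets of the candidate regressors
subsets <- unlist(
  lapply(seq_along(xvars), function(k) combn(xvars, k, simplify = FALSE)),
  recursive = FALSE
)

## fit each submodel (intercept always included) and record its BIC
bic_values <- vapply(subsets, function(vars) {
  BIC(lm(reformulate(vars, response = "mpg"), data = dat))
}, numeric(1))

## keep the nbest subsets with the smallest BIC
best <- order(bic_values)[seq_len(nbest)]
brute_best <- data.frame(
  subset = vapply(subsets[best], paste, character(1), collapse = " + "),
  BIC    = bic_values[best]
)
brute_best
```

Exhaustive enumeration grows as 2^p in the number of regressors, so this sketch is only practical for a handful of variables.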
The information criterion is specified with the penalty parameter. Accepted values are "AIC", "BIC", or a numeric value representing the penalty per model parameter (see AIC).
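As a small illustration of what a numeric penalty per parameter means, the sketch below uses stats::AIC on a single lm fit rather than lmSelect itself. In the AIC family the criterion is -2 * logLik + k * npar, so k = 2 gives the classical AIC and k = log(n) gives the BIC. The mtcars data and the variable names are assumptions for illustration.

```r
## Illustration with stats::AIC on one lm fit (base R only).
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nobs(fit)

aic_k2   <- AIC(fit, k = 2)       # classical AIC
aic_logn <- AIC(fit, k = log(n))  # same quantity as BIC(fit)

## the log(n) penalty reproduces the BIC
all.equal(aic_logn, BIC(fit))
```

Passing a numeric penalty of 2 or log(n) should therefore correspond to the "AIC" and "BIC" settings, respectively; other values give intermediate or stronger penalties.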
A custom selection criterion may be specified by passing an R function as the penalty argument. The expected signature is function(size, rss), where size is the number of predictors (including the intercept, if any) and rss is the residual sum of squares. The function must be non-decreasing in both parameters.
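The monotonicity requirement can be checked numerically before a function is passed as penalty. The helper below is a hypothetical sketch and not part of lmSubsets: it evaluates a candidate criterion on a grid of sizes and residual sums of squares and verifies that the values never decrease along either axis. The names is_nondecreasing, bic_like, and bad_crit, and the sample size n = 50, are assumptions for illustration.

```r
## Hypothetical helper (not part of lmSubsets): check on a grid that a
## criterion function(size, rss) is non-decreasing in both arguments.
is_nondecreasing <- function(crit, sizes, rss_grid) {
  ## rows index size, columns index rss
  vals <- outer(sizes, rss_grid, Vectorize(crit))
  ## diff() on a matrix differences successive rows
  all(diff(vals) >= 0) && all(diff(t(vals)) >= 0)
}

n <- 50  # assumed number of observations

## BIC-style criterion: grows with both size and rss
bic_like <- function(size, rss) n * log(rss / n) + log(n) * size

## deliberately invalid criterion: decreases as size grows
bad_crit <- function(size, rss) rss - size

sizes    <- 1:10
rss_grid <- seq(1, 100, length.out = 25)

ok_bic <- is_nondecreasing(bic_like, sizes, rss_grid)
ok_bad <- is_nondecreasing(bad_crit, sizes, rss_grid)
```

A grid check like this can only reveal violations on the points tested; it does not prove monotonicity in general.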
A low-level matrix interface is provided by lmSelect_fit.
See lmSubsets for further information.
Value
An object of class "lmSelect", i.e., a list with the following components:
Number of observations, of variables.
TRUE if model has intercept term; FALSE otherwise.
Included, excluded variables.
Subset sizes.
Approximation tolerance.
Number of best subsets.
Submodel information.
Selected variables.
Further components include call, na.action, weights, offset, contrasts, xlevels, terms, mf, x, and y. See lm for more information.
References
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact Variable-Subset Selection in Linear Regression for R. Journal of Statistical Software, 93(3), 1--21. doi:10.18637/jss.v093.i03.
See Also
Examples
# NOT RUN {
## load data (with logs for relative potentials)
data("AirPollution", package = "lmSubsets")
###################
## basic usage ##
###################
## fit 20 best subsets (BIC)
lm_best <- lmSelect(mortality ~ ., data = AirPollution, nbest = 20)
lm_best
## equivalent to:
lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20)
lm_best <- lmSelect(lm_all)
## summary statistics
summary(lm_best)
## visualize
plot(lm_best)
########################
## custom criterion ##
########################
## the same as above, but with a custom criterion:
M <- nrow(AirPollution)
ll <- function(rss) {
  -M/2 * (log(2 * pi) - log(M) + log(rss) + 1)
}
aic <- function(size, rss, k = 2) {
  -2 * ll(rss) + k * (size + 1)
}
bic <- function(size, rss) {
  aic(size, rss, k = log(M))
}
lm_cust <- lmSelect(mortality ~ ., data = AirPollution,
                    penalty = bic, nbest = 20)
lm_cust
# }