lmSubsets (version 0.4)

lmSelect: Best-Subset Regression

Description

Best-subset regression for ordinary linear models.

Usage

lmSelect(formula, …)

# S3 method for default
lmSelect(formula, data, subset, weights, na.action, model = TRUE, x = FALSE, y = FALSE, contrasts = NULL, offset, …)

# S3 method for matrix
lmSelect(formula, y, intercept = TRUE, …)

# S3 method for lmSubsets
lmSelect(formula, penalty = "BIC", …)

lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL, exclude = NULL, penalty = "BIC", tolerance = 0, nbest = 1, …, pradius = NULL)

Arguments

formula, data, subset, weights, na.action, model, x, y, contrasts, offset

Standard formula interface arguments, as in lm.

intercept

Logical; whether to include an intercept term.

include, exclude

Regressors to force into (include) or out of (exclude) every subset.

penalty

Penalty per model parameter: "AIC", "BIC", a numeric value, or a function (see Details).

tolerance

Approximation tolerance.

nbest

Number of best subsets.

…

Forwarded to lmSelect_fit.

pradius

Preordering radius.

Value

An object of class "lmSelect", i.e., a list with the following components:

nobs, nvar

Number of observations and number of variables.

intercept

TRUE if the model has an intercept term, FALSE otherwise.

include, exclude

Variables forced into, and excluded from, every subset.

size

Subset sizes.

tolerance

Approximation tolerance.

nbest

Number of best subsets.

submodel

Submodel information.

subset

Selected variables.

Further components include call, na.action, weights, offset, contrasts, xlevels, terms, mf, x, and y. See lm for more information.

Details

The lmSelect generic provides a convenient interface for best variable-subset selection in linear regression: it returns the nbest best subset models, ranked according to an information criterion of the AIC family.
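To make "the nbest best subset models" concrete, the following is a brute-force sketch in base R. It is only an illustration: the helper best_subsets_bic is hypothetical, it uses the built-in mtcars data (so it runs without lmSubsets), and it simply enumerates every non-empty subset of five candidate predictors and ranks the fits by BIC. lmSubsets itself relies on a more efficient exact search rather than full enumeration (see the reference below), and this sketch does not try to reproduce its handling of the intercept-only model, include/exclude, or tolerance.

```r
## Hypothetical helper, for illustration only: exhaustive enumeration of
## all non-empty predictor subsets, ranked by BIC (smaller is better).
best_subsets_bic <- function(response, predictors, data, nbest = 5) {
  rows <- list()
  for (k in seq_along(predictors)) {
    for (vars in combn(predictors, k, simplify = FALSE)) {
      fit <- lm(reformulate(vars, response = response), data = data)
      rows[[length(rows) + 1]] <- data.frame(
        vars = paste(vars, collapse = " + "),
        size = k + 1,   # predictors plus the intercept
        BIC  = BIC(fit)
      )
    }
  }
  tab <- do.call(rbind, rows)
  tab <- tab[order(tab$BIC), ]   # best (lowest BIC) first
  rownames(tab) <- NULL
  head(tab, nbest)
}

preds <- c("cyl", "disp", "hp", "wt", "qsec")
best <- best_subsets_bic("mpg", preds, mtcars, nbest = 3)
best
```

With five candidates this fits 2^5 - 1 = 31 models; the cost doubles with every extra predictor, which is why an exact method that avoids fitting every subset matters for larger problems.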

The information criterion is specified with the penalty parameter. Accepted values are "AIC", "BIC", or a numeric value representing the penalty per model parameter (see AIC).
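The AIC family can be written as -2 logLik + k * (number of parameters), where "AIC" corresponds to k = 2 and "BIC" to k = log(n); a numeric penalty plays the role of k. The sketch below, using base R and the mtcars data (an arbitrary choice), checks that the Gaussian log-likelihood formula used in the Examples section reproduces stats::AIC and stats::BIC for an lm fit. The helper ic is hypothetical and only mirrors the criterion; it is not part of lmSubsets.

```r
fit  <- lm(mpg ~ wt + hp, data = mtcars)
n    <- nobs(fit)
rss  <- sum(residuals(fit)^2)
size <- length(coef(fit))   # regression coefficients, including the intercept

## Gaussian log-likelihood evaluated at the ML estimate sigma^2 = rss / n
ll <- -n / 2 * (log(2 * pi) - log(n) + log(rss) + 1)

## Hypothetical helper: AIC-family criterion with penalty k per parameter;
## "size + 1" also counts the error variance, as stats::AIC does for lm
ic <- function(k) -2 * ll + k * (size + 1)

ic(2)        # agrees with AIC(fit)
ic(log(n))   # agrees with BIC(fit)
```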

A custom selection criterion may be specified by passing an R function as the penalty argument. The expected signature is function(size, rss), where size is the number of predictors (including the intercept, if any) and rss is the residual sum of squares. The function must be non-decreasing in both arguments.
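As a sketch of a criterion outside the built-in choices, Mallows' Cp can be written with the function(size, rss) signature described above. This is an assumption-laden illustration: the function name cp, the use of mtcars, and estimating the error variance once from the full model are choices made here, not something lmSubsets prescribes. The grid check confirms the non-decreasing requirement numerically; the final call only runs if lmSubsets is installed.

```r
## Hypothetical custom criterion: Mallows' Cp (smaller is better).
## sigma2 is estimated once, from the full model, so it is a constant here.
full   <- lm(mpg ~ ., data = mtcars)
n      <- nobs(full)
sigma2 <- summary(full)$sigma^2

cp <- function(size, rss) {
  rss / sigma2 - n + 2 * size
}

## Sanity check on a grid: non-decreasing in both arguments
sizes <- 1:11
rsses <- seq(100, 1000, by = 100)
grid  <- outer(sizes, rsses, cp)            # rows vary size, columns vary rss
stopifnot(all(diff(grid) >= 0),             # along size
          all(t(diff(t(grid))) >= 0))       # along rss

## Passing it as the penalty (requires the lmSubsets package)
if (requireNamespace("lmSubsets", quietly = TRUE)) {
  lm_cp <- lmSubsets::lmSelect(mpg ~ ., data = mtcars, penalty = cp, nbest = 5)
  print(lm_cp)
}
```

For the full model itself, Cp equals its number of parameters, which is a quick way to check the constant sigma2 is wired up correctly.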

A low-level matrix interface is provided by lmSelect_fit.
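A minimal sketch of the matrix interface, using the argument names from the Usage section. It assumes, as with stats::lm.fit, that x is the complete design matrix including the intercept column produced by model.matrix; that assumption and the choice of mtcars are illustrative, and the structure of the returned object is only inspected, not relied upon.

```r
## Build the design matrix by hand; model.matrix() adds the intercept column
X <- model.matrix(mpg ~ ., data = mtcars)
y <- mtcars$mpg

if (requireNamespace("lmSubsets", quietly = TRUE)) {
  ## Assumption: x is the full design matrix (intercept column included)
  fit <- lmSubsets::lmSelect_fit(X, y, penalty = "AIC", nbest = 5)
  str(fit, max.level = 1)
}
```

Going through lmSelect_fit skips the formula machinery (model frames, contrasts, na.action), so any such preprocessing has to be done beforehand, as model.matrix does here.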

See lmSubsets for further information.

References

Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2019). lmSubsets: Exact Variable-Subset Selection in Linear Regression for R. Journal of Statistical Software. In press.

See Also

lmSubsets, summary, methods.

Examples

## load data (with logs for relative potentials)
data("AirPollution", package = "lmSubsets")


###################
##  basic usage  ##
###################

## fit 20 best subsets (BIC)
lm_best <- lmSelect(mortality ~ ., data = AirPollution, nbest = 20)
lm_best

## equivalent to:
lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20)
lm_best <- lmSelect(lm_all)
## summary statistics
summary(lm_best)

## visualize
plot(lm_best)


########################
##  custom criterion  ##
########################

## the same as above, but with a custom criterion:
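## number of observations
M <- nrow(AirPollution)

## Gaussian log-likelihood at the ML estimate of the error variance
ll <- function (rss) {
  -M/2 * (log(2 * pi) - log(M) + log(rss) + 1)
}

## AIC-family criterion; "size + 1" also counts the error variance
aic <- function (size, rss, k = 2) {
  -2 * ll(rss) + k * (size + 1)
}

## BIC: penalty log(M) per parameter
bic <- function (size, rss) {
  aic(size, rss, k = log(M))
}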

lm_cust <- lmSelect(mortality ~ ., data = AirPollution,
                    penalty = bic, nbest = 20)
lm_cust
