lmSubsets
All-Subsets Regression
All-subsets regression for linear models estimated by ordinary least squares (OLS).
- Keywords
- regression
Usage
lmSubsets(formula, ...)
# S3 method for default
lmSubsets(formula, data, subset, weights, na.action, model = TRUE,
          x = FALSE, y = FALSE, contrasts = NULL, offset, ...)
# S3 method for matrix
lmSubsets(formula, y, intercept = TRUE, ...)
lmSubsets_fit(x, y, weights = NULL, offset = NULL, include = NULL,
              exclude = NULL, nmin = NULL, nmax = NULL,
              tolerance = 0, nbest = 1, ..., pradius = NULL)
Arguments
- formula, data, subset, weights, na.action, model, x, y, contrasts, offset
Standard formula interface.
- intercept
Include intercept.
- include, exclude
Force regressors in or out.
- nmin, nmax
Minimum and maximum number of regressors.
- tolerance
Approximation tolerance (vector).
- nbest
Number of best subsets.
- ...
Forwarded to lmSubsets.default and lmSubsets_fit.
- pradius
Preordering radius.
Details
The lmSubsets generic provides various methods to conveniently specify the regressor and response variables. The standard formula interface (see lm) can be used, or the information can be extracted from an already fitted "lm" object. The regressor matrix and response variable can also be passed in directly (see Examples).
The call is forwarded to lmSubsets_fit, which provides a low-level matrix interface.
The nbest best subset models for every subset size are computed, where the "best" models are the models with the lowest residual sum of squares (RSS). The scope of the search can be limited to a range of subset sizes by setting nmin and nmax. A tolerance vector (expanded if necessary) may be specified to speed up the search, where tolerance[j] is the tolerance applied to subset models of size j.
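The following is a minimal sketch of limiting the search as described above. It assumes the lmSubsets package is installed and reuses the AirPollution data from the Examples; the object name lm_range and the specific values chosen for nmin, nmax, and tolerance are arbitrary illustrations, not recommendations.

```r
## assumes the lmSubsets package is installed
library("lmSubsets")
data("AirPollution", package = "lmSubsets")

## restrict the search to a range of subset sizes (nmin, nmax),
## keep the two best models per size (nbest), and allow a
## nonzero approximation tolerance to speed up the search;
## a scalar tolerance is expanded to all subset sizes
lm_range <- lmSubsets(mortality ~ ., data = AirPollution,
                      nmin = 3, nmax = 8, nbest = 2,
                      tolerance = 0.1)
lm_range
```

With the default tolerance = 0 the search is exact; a nonzero value trades exactness for speed.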
Variables may be forced into or out of the regression with include and exclude, respectively.
The extent to which variables are preordered is controlled with the pradius parameter.
A set of standard extractor functions for fitted model objects is available for objects of class "lmSubsets". See methods for more details.
The summary method can be called to obtain summary statistics.
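As a hedged illustration of the two paragraphs above, the sketch below lists whichever S3 methods are registered for the class in the installed version of the package, then calls summary. It assumes the lmSubsets package is installed; methods(class = ) is the generic base R utility for listing methods and is used here only to discover what is available, not as a statement of which extractors exist.

```r
## assumes the lmSubsets package is installed
library("lmSubsets")
data("AirPollution", package = "lmSubsets")

lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 5)

## list the S3 methods available for "lmSubsets" objects
methods(class = "lmSubsets")

## summary statistics for the fitted subsets
summary(lm_all)
```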
Value
An object of class "lmSubsets", i.e., a list with the following components:
Number of observations, of variables.
TRUE if model has intercept term; FALSE otherwise.
Included, excluded regressors.
Subset sizes.
Approximation tolerance (vector).
Number of best subsets.
Submodel information.
Selected variables.
Further components include call, na.action, weights, offset, contrasts, xlevels, terms, mf, x, and y. See lm for more information.
References
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact Variable-Subset Selection in Linear Regression for R. Journal of Statistical Software, 93, 1-21. doi:10.18637/jss.v093.i03.
Hofmann M, Gatu C, Kontoghiorghes EJ (2007). Efficient Algorithms for Computing the Best Subset Regression Models for Large-Scale Problems. Computational Statistics & Data Analysis, 52, 16-29. doi:10.1016/j.csda.2007.03.017.
Gatu C, Kontoghiorghes EJ (2006). Branch-and-Bound Algorithms for Computing the Best Subset Regression Models. Journal of Computational and Graphical Statistics, 15, 139-156. doi:10.1198/106186006x100290.
Examples
## load data (with logs for relative potentials)
data("AirPollution", package = "lmSubsets")
###################
##  basic usage  ##
###################
## canonical example: fit all subsets
lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 5)
lm_all
## plot RSS and BIC
plot(lm_all)
## summary statistics
summary(lm_all)
############################
##  forced in-/exclusion  ##
############################
lm_force <- lmSubsets(lm_all, include = c("nox", "so2"),
                      exclude = "whitecollar")
lm_force
########################
##  matrix interface  ##
########################
## same as above
x <- as.matrix(AirPollution)
lm_mat <- lmSubsets(x, y = "mortality")
lm_mat