
pedometrics (version 0.6-3)

buildMS: Build a series of linear models using automated variable selection

Description

This function builds a series of linear models (lm) using one or more of the automated variable selection procedures implemented in functions stepVIF and stepAIC.

Usage

buildMS(formula, data, vif = FALSE, vif.threshold = 10,
  vif.verbose = FALSE, aic = FALSE, aic.direction = "both",
  aic.trace = FALSE, aic.steps = 5000, ...)

Arguments

formula
A list containing one or several model formulas (a symbolic description of the model to be fitted).
data
Data frame containing the variables in the model formulas.
vif
Logical for performing backward variable selection using the Variance-Inflation Factor (VIF). Defaults to vif = FALSE.
vif.threshold
Numeric value setting the maximum acceptable VIF value. Defaults to vif.threshold = 10.
vif.verbose
Logical for printing iteration results of backward variable selection using the VIF. Defaults to vif.verbose = FALSE.
aic
Logical for performing variable selection using Akaike Information Criterion (AIC). Defaults to aic = FALSE.
aic.direction
Character string setting the direction of variable selection when using AIC. Available options are "both", "forward", and "backward". Defaults to aic.direction = "both".
aic.trace
Logical for printing iteration results of variable selection using the AIC. Defaults to aic.trace = FALSE.
aic.steps
Integer value setting the maximum number of steps to be considered for variable selection using the AIC. Defaults to aic.steps = 5000.
...
Further arguments passed to the function stepAIC.
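The aic.* arguments appear to correspond to arguments of MASS::stepAIC with the prefix removed (see item 3 under Details). The sketch below calls stepAIC directly with the settings that buildMS uses by default. It also passes k = log(n) as an example of an extra argument that could travel through `...` to obtain a BIC-type penalty. The mtcars data, the candidate predictors, and this exact argument mapping are illustrative assumptions; this is not the buildMS code.

```r
library(MASS)  # provides stepAIC()

# Full candidate model; mtcars and these predictors are illustrative only.
full <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec, data = mtcars)

# Assumed mapping: aic.direction -> direction, aic.trace -> trace,
# aic.steps -> steps. k = log(n) stands in for an argument passed via `...`;
# it replaces the AIC penalty of 2 per parameter with log(n) (BIC-type).
sel <- stepAIC(full, direction = "both", trace = FALSE, steps = 5000,
               k = log(nrow(mtcars)))

formula(sel)  # the retained predictors
```

With the default k = 2 the same call performs ordinary AIC-based selection.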

Value

  • A list containing the fitted linear models.
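The returned value is a list of lm fits, so standard R tools can be mapped over it. The sketch below assumes nothing about buildMS beyond that description: it builds a stand-in list with lapply() on illustrative mtcars formulas and tabulates a few statistics per model.

```r
# Illustrative candidate formulas (not taken from the package).
forms <- list(mpg ~ wt + hp,
              mpg ~ wt + hp + qsec,
              mpg ~ disp + drat)

# Stand-in for the list that buildMS returns: one lm fit per formula.
models <- lapply(forms, lm, data = mtcars)

# One row per model: number of coefficients, adjusted R^2, and AIC.
comparison <- data.frame(
  n.coef = sapply(models, function(m) length(coef(m))),
  adj.r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  aic    = sapply(models, AIC)
)
comparison
```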

TODO

Add option to set the order in which stepAIC and stepVIF are run.

Details

This function was devised to deal with a list of linear model formulas. The main objective is to bring together several functions commonly used when building linear models, such as those for automated variable selection. In the current implementation, variable selection can be done using stepVIF, stepAIC, or both. stepVIF is a backward variable selection procedure, while stepAIC supports backward, forward, and bidirectional variable selection. For more information about these functions, see their respective help pages.
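Backward selection with the VIF can be sketched in base R. For each numeric predictor j, compute VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. Drop the predictor with the largest VIF, and repeat until every VIF is at or below the threshold. This is a generic illustration under those assumptions, not the code of stepVIF. The helpers vif_values() and backward_vif() are hypothetical, and whether stepVIF accepts a VIF exactly equal to the threshold is not stated here.

```r
# Hypothetical helper: VIF of each column of a data frame of numeric predictors.
vif_values <- function(X) {
  if (ncol(X) < 2) return(setNames(rep(1, ncol(X)), names(X)))
  sapply(names(X), function(j) {
    others <- setdiff(names(X), j)
    # R^2 of predictor j regressed on all remaining predictors
    r2 <- summary(lm(reformulate(others, response = j), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: remove the largest-VIF predictor until all VIFs
# are <= threshold; returns the names of the retained predictors.
backward_vif <- function(X, threshold = 10) {
  repeat {
    v <- vif_values(X)
    if (max(v) <= threshold) return(names(X))
    X <- X[setdiff(names(X), names(which.max(v)))]
  }
}

# Illustrative numeric candidates from mtcars.
preds <- mtcars[c("cyl", "disp", "hp", "drat", "wt", "qsec")]
kept <- backward_vif(preds, threshold = 5)
kept
```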

An important feature of buildMS is that it records the initial number of candidate predictor variables and observations offered to the model, and adds this information as an attribute to the final selected model. This feature was included because variable selection procedures produce biased (overly optimistic) linear models, and the effective number of degrees of freedom is close to the number of candidate predictor variables initially offered to the model (Harrell, 2001). Knowing the initial number of candidate predictor variables and observations, one can calculate penalized or adjusted measures of model performance. For models built using buildMS, this can be done using statsMS.
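The initial counts matter because the usual adjusted R^2 can be evaluated with the number of candidate predictors p offered to the model, rather than the number finally retained: R2_adj = 1 - (1 - R2) (n - 1) / (n - p - 1). This sketch illustrates Harrell's argument in general terms. It is not necessarily the statistic that statsMS reports, and the function name adj_r2() is hypothetical.

```r
# Hypothetical helper: adjusted R^2 for n observations and p predictors.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Illustrative numbers: a selected model with R^2 = 0.80 fitted to n = 100
# observations that retained 3 predictors out of 20 candidates.
adj_r2(0.80, n = 100, p = 3)   # 0.79375: penalizes only the retained predictors
adj_r2(0.80, n = 100, p = 20)  # about 0.749: penalizes every candidate offered
```

With p set to the number of retained predictors, the formula gives the same value as summary.lm()$adj.r.squared for a model with an intercept.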

Some important details should be clear when using buildMS:

  1. this function was originally devised to deal with a list of formulas, but can also be used with a single formula;
  2. in the current implementation, stepVIF runs before stepAIC;
  3. function arguments imported from stepAIC and stepVIF were named as in the original functions, and received a prefix (aic or vif) to help the user identify which function is affected by a given argument without having to check the documentation.
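The order in item 2 can be mimicked by hand for a list of formulas. First drop predictors by VIF, then run stepAIC on the predictors that remain. This sketch assumes numeric predictors only and computes the VIFs as the diagonal of the inverse correlation matrix, which is algebraically equivalent to regressing each predictor on the others. The mtcars data, the formulas, the threshold, and the helper drop_by_vif() are illustrative and hypothetical, not buildMS internals.

```r
library(MASS)  # provides stepAIC()

# Hypothetical helper: repeatedly remove the predictor with the largest VIF,
# computed as diag(solve(cor(X))), until all VIFs are <= threshold.
drop_by_vif <- function(preds, data, threshold = 10) {
  while (length(preds) > 1) {
    vif <- diag(solve(cor(data[preds])))
    if (max(vif) <= threshold) break
    preds <- preds[-which.max(vif)]
  }
  preds
}

forms <- list(mpg ~ cyl + disp + hp + wt + qsec,
              mpg ~ disp + hp + drat + wt)

models <- lapply(forms, function(f) {
  # VIF step first ...
  preds <- drop_by_vif(attr(terms(f), "term.labels"), mtcars, threshold = 5)
  f2 <- reformulate(preds, response = "mpg")
  fit <- lm(f2, data = mtcars)
  # ... then the AIC step on the reduced model
  stepAIC(fit, direction = "both", trace = FALSE)
})
```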

References

Harrell, F. E. (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. First edition. New York: Springer.

Venables, W. N. and Ripley, B. D. (2002) Modern applied statistics with S. Fourth edition. New York: Springer.

See Also

stepAIC, stepVIF, statsMS.

Examples

# based on the second example of function stepAIC
library(MASS)         # provides stepAIC() and the cpus data
library(pedometrics)  # provides buildMS()
set.seed(2001)        # makes the random subsample reproducible
cpus1 <- cpus
# discretize the six machine characteristics into quantile-based factors
for (v in names(cpus)[2:7])
  cpus1[[v]] <- cut(cpus[[v]], unique(stats::quantile(cpus[[v]])),
                    include.lowest = TRUE)
cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
cpus.samp <- sample(1:209, 100)
# candidate formulas; perf is the response, so it is not offered as a predictor
cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                  chmax),
                  formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                  formula(log10(perf) ~ mmax + cach + chmin + chmax))
cpus.data <- cpus0[cpus.samp, ]
cpus.ms <- buildMS(cpus.form, cpus.data, vif = TRUE, aic = TRUE)
