bootStepAIC (version 1.2-0)

boot.stepAIC: Bootstraps the Stepwise Algorithm of stepAIC() for Choosing a Model by AIC

Description

Implements a Bootstrap procedure to investigate the variability of model selection under the stepAIC() stepwise algorithm of package MASS.

Usage

boot.stepAIC(object, data, B = 100, alpha = 0.05, direction = "backward", k = 2, verbose = FALSE, ...)

Arguments

object
an object representing a model of an appropriate class; currently, "lm", "aov", "glm", "negbin", "polr", "survreg", and "coxph" objects are supported.
data
a data.frame or a matrix that contains the response variable and covariates.
B
the number of Bootstrap samples.
alpha
the significance level.
direction
the direction argument of stepAIC().
k
the k argument of stepAIC().
verbose
logical; if TRUE, information about the progress of the procedure is printed on the screen.
...
extra arguments to stepAIC(), e.g., scope.
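
Because direction, k, and the extra arguments are handed on to stepAIC(), their effect can be previewed by calling stepAIC() directly on a single fit. The following is a minimal sketch under stated assumptions: the toy data frame, the seed, and all object names are made up for illustration and do not come from the package.

```r
library(MASS)  # provides stepAIC()

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 + rnorm(n)  # only x1 carries signal (illustrative)

full <- lm(y ~ x1 + x2 + x3, data = toy)

# k = 2 is the usual AIC penalty; k = log(n) gives the BIC-type penalty
aic_fit <- stepAIC(full, direction = "backward", k = 2, trace = FALSE)
bic_fit <- stepAIC(full, direction = "backward", k = log(n), trace = FALSE)

# scope can protect terms from removal: the lower formula is the smallest
# model the search is allowed to reach, so x2 is kept here
kept <- stepAIC(full, scope = list(lower = ~ x2),
                direction = "backward", trace = FALSE)

formula(aic_fit)
formula(bic_fit)
formula(kept)
```

The same direction, k, and scope values can be supplied to boot.stepAIC(), which then applies them to every Bootstrap data-set.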

Value

An object of class BootStep with the following components:
Covariates
a numeric matrix containing the percentage of times each variable was selected.
Sign
a numeric matrix containing the percentage of times the regression coefficient of each variable had sign + and sign -.
Significance
a numeric matrix containing the percentage of times the regression coefficient of each variable was significant under the alpha significance level.
OrigModel
a copy of object.
OrigStepAIC
the result of applying stepAIC() to object.
direction
a copy of the direction argument.
k
a copy of the k argument.
BootStepAIC
a list of length B containing the results of stepAIC() for each Bootstrap data-set.
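
The components listed above can be pulled out of the returned object with `$`. The sketch below is illustrative only: the simulated data, the seed, and the names dat, fit, and res are assumptions, and the code is skipped when the bootStepAIC package is not installed.

```r
if (requireNamespace("bootStepAIC", quietly = TRUE)) {
  library(bootStepAIC)

  set.seed(42)  # arbitrary seed for reproducibility
  n <- 150
  dat <- data.frame(x1 = runif(n, -4, 4),
                    x2 = runif(n, -4, 4),
                    x3 = runif(n, -4, 4))
  dat$y <- 5 + 3 * dat$x1 + rnorm(n, sd = 2.5)  # illustrative model

  fit <- lm(y ~ x1 + x2 + x3, data = dat)
  res <- boot.stepAIC(fit, dat, B = 20)  # small B to keep the sketch fast

  res$Covariates            # selection percentages
  res$Sign                  # percentages of + and - coefficient signs
  res$Significance          # percentages significant at level alpha
  formula(res$OrigStepAIC)  # model chosen by stepAIC() on the original data
}
```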

Details

The following procedure is replicated B times:
Step 1:
Simulate a new data-set taking a sample with replacement from the rows of data.

Step 2:
Refit the model using the data-set from Step 1.

Step 3:
For the refitted model of Step 2 run the stepAIC() algorithm.

Summarize the results by counting, out of the B data-sets, how many times each variable was selected. Then, out of the times each variable was selected, count how many times the estimate of its regression coefficient was statistically significant at significance level alpha, and how many times that estimate changed sign (see also Austin and Tu, 2004).
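
The three steps and the selection count can be sketched by hand with stepAIC() alone. This is not the package's internal code, only a minimal illustration under stated assumptions: the simulated data, the seed, and every object name are hypothetical, and only the selection percentages are tallied (signs and significance are left out for brevity).

```r
library(MASS)  # provides stepAIC()

set.seed(123)  # arbitrary seed for reproducibility
n <- 120
sim <- data.frame(x1 = runif(n, -4, 4),
                  x2 = runif(n, -4, 4),
                  x3 = runif(n, -4, 4))
sim$y <- 5 + 3 * sim$x1 + rnorm(n, sd = 2.5)  # illustrative model

orig_fit <- lm(y ~ x1 + x2 + x3, data = sim)
candidates <- attr(terms(orig_fit), "term.labels")

B <- 25  # number of Bootstrap samples (kept small for speed)
selected <- matrix(FALSE, nrow = B, ncol = length(candidates),
                   dimnames = list(NULL, candidates))

for (b in seq_len(B)) {
  # Step 1: resample the rows of the data with replacement
  boot_data <- sim[sample(nrow(sim), replace = TRUE), ]
  # Step 2: refit the original model on the resampled data
  boot_fit <- update(orig_fit, data = boot_data)
  # Step 3: run the stepwise algorithm on the refitted model
  boot_step <- stepAIC(boot_fit, direction = "backward", k = 2, trace = FALSE)
  # Record which candidate terms survived the selection
  selected[b, ] <- candidates %in% attr(terms(boot_step), "term.labels")
}

# Percentage of Bootstrap samples in which each variable was selected
selection_pct <- 100 * colMeans(selected)
selection_pct
```

Variables with a real effect tend to be selected in nearly every sample, while noise variables are selected only in a fraction of them, which is the variability the procedure is meant to expose.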

References

Austin, P. and Tu, J. (2004). Bootstrap methods for developing predictive models. The American Statistician, 58, 131-137.

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S, 4th ed. Springer, New York.

See Also

stepAIC in package MASS

Examples

library(bootStepAIC)

## lm() Example ##
n <- 350
x1 <- runif(n, -4, 4)
x2 <- runif(n, -4, 4)
x3 <- runif(n, -4, 4)
x4 <- runif(n, -4, 4)
x5 <- runif(n, -4, 4)
x6 <- runif(n, -4, 4)
x7 <- factor(sample(letters[1:3], n, replace = TRUE))
y <- 5 + 3 * x1 + 2 * x2 - 1.5 * x3 - 0.8 * x4 + rnorm(n, sd = 2.5)
data <- data.frame(y, x1, x2, x3, x4, x5, x6, x7)
rm(n, x1, x2, x3, x4, x5, x6, x7, y)

lmFit <- lm(y ~ (. - x7) * x7, data = data)
boot.stepAIC(lmFit, data)

#####################################################################

## glm() Example ##
n <- 200
x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)
x3 <- runif(n, -3, 3)
x4 <- runif(n, -3, 3)
x5 <- factor(sample(letters[1:2], n, replace = TRUE))
eta <- 0.1 + 1.6 * x1 - 2.5 * as.numeric(as.character(x5) == levels(x5)[1])
y1 <- rbinom(n, 1, plogis(eta))
y2 <- rbinom(n, 1, 0.6)
data <- data.frame(y1, y2, x1, x2, x3, x4, x5)
rm(n, x1, x2, x3, x4, x5, eta, y1, y2)

glmFit1 <- glm(y1 ~ x1 + x2 + x3 + x4 + x5, family = binomial, data = data)
glmFit2 <- glm(y2 ~ x1 + x2 + x3 + x4 + x5, family = binomial, data = data)

boot.stepAIC(glmFit1, data, B = 50)
boot.stepAIC(glmFit2, data, B = 50)