# stepAIC

##### Choose a model by AIC in a Stepwise Algorithm

Performs stepwise model selection by AIC.

- Keywords
- models

##### Usage

```
stepAIC(object, scope, scale = 0,
        direction = c("both", "backward", "forward"),
        trace = 1, keep = NULL, steps = 1000, use.start = FALSE,
        k = 2, ...)
```

##### Arguments

- object
- an object representing a model of an appropriate class. This is used as the initial model in the stepwise search.
- scope
- defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components `upper` and `lower`, both formulae. See the details for how to specify the formulae.
- scale
- used in the definition of the AIC statistic for selecting the models, currently only for `lm` and `aov` models (see `extractAIC`).
- direction
- the mode of stepwise search, can be one of `"both"`, `"backward"`, or `"forward"`, with a default of `"both"`. If the `scope` argument is missing, the default for `direction` is `"backward"`.
- trace
- if positive, information is printed during the running of `stepAIC`. Larger values may give more information on the fitting process.
- keep
- a filter function whose input is a fitted model object and the associated `AIC` statistic, and whose output is arbitrary. Typically `keep` will select a subset of the components of the object and return them. The default, `NULL`, keeps nothing.
- steps
- the maximum number of steps to be considered. The default is 1000 (essentially as many as required). It is typically used to stop the process early.
- use.start
- if true, the updated fits are done starting at the linear predictor for the currently selected model. This may speed up the iterative calculations for `glm` (and other fits), but it can also slow them down. **Not used** in R.
- k
- the multiple of the number of degrees of freedom used for the penalty. Only `k = 2` gives the genuine AIC: `k = log(n)` is sometimes referred to as BIC or SBC.
- ...
- any additional arguments to `extractAIC`. (None are currently used.)
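
To make the effect of `k` concrete, the following is a minimal sketch rather than part of the original documentation. It assumes the `quine` data from MASS and a main-effects `lm` fit chosen purely for illustration; the object names `fit`, `fit.aic` and `fit.bic` are hypothetical.

```r
library(MASS)

# Illustrative starting model (an assumption, not taken from the help page).
fit <- lm(log(Days + 2.5) ~ Eth + Sex + Age + Lrn, data = quine)
n <- nrow(quine)

# k = 2 (the default) penalises each degree of freedom by 2: the genuine AIC.
fit.aic <- stepAIC(fit, trace = FALSE)

# k = log(n) applies the heavier penalty sometimes called BIC or SBC.
fit.bic <- stepAIC(fit, k = log(n), trace = FALSE)

# Compare the formulae selected under each penalty.
formula(fit.aic)
formula(fit.bic)
```

Because `scope` is not supplied here, both searches run backward from the initial model.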

##### Details

The set of models searched is determined by the `scope` argument. The right-hand side of its `lower` component is always included in the model, and the right-hand side of the model is included in the `upper` component. If `scope` is a single formula, it specifies the `upper` component, and the `lower` model is empty. If `scope` is missing, the initial model is used as the `upper` model.

Models specified by `scope` can be templates to update `object` as used by `update.formula`.
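
The ways of specifying `scope` described above can be sketched as follows. This is an illustrative example under stated assumptions: the starting model `base` on the `quine` data and the result names `s1` to `s4` are hypothetical and not part of the original page.

```r
library(MASS)

# Illustrative starting model (an assumption for this sketch).
base <- lm(log(Days + 2.5) ~ Eth + Sex, data = quine)

# scope as a list: terms in `lower` are always kept, `upper` bounds the search.
s1 <- stepAIC(base,
              scope = list(upper = ~ Eth * Sex * Age * Lrn, lower = ~ Eth),
              trace = FALSE)

# scope as a single formula: it is the upper model and the lower model is empty.
s2 <- stepAIC(base, scope = ~ Eth * Sex * Age * Lrn, trace = FALSE)

# scope as an update.formula-style template: "." stands for the current terms.
s3 <- stepAIC(base, scope = ~ . + Age + Lrn, trace = FALSE)

# scope missing: the initial model is the upper model, so terms can only be dropped.
s4 <- stepAIC(base, trace = FALSE)
```

In `s1` the term `Eth` cannot be removed, whereas in `s4` no term outside `Eth + Sex` can be added.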

There is a potential problem in using `glm` fits with a variable `scale`, as in that case the deviance is not simply related to the maximized log-likelihood. The `glm` method for `extractAIC` makes the appropriate adjustment for a `gaussian` family, but may need to be amended for other cases. (The `binomial` and `poisson` families have fixed `scale` by default and do not correspond to a particular maximum-likelihood problem for variable `scale`.)

Where a conventional deviance exists (e.g. for `lm`, `aov` and `glm` fits) this is quoted in the analysis of variance table: it is the *unscaled* deviance.

##### Value

- the stepwise-selected model is returned, with up to two additional components. There is an `"anova"` component corresponding to the steps taken in the search, as well as a `"keep"` component if the `keep=` argument was supplied in the call. The `"Resid. Dev"` column of the analysis of deviance table refers to a constant minus twice the maximized log likelihood: it will be a deviance only in cases where a saturated model is well-defined (thus excluding `lm`, `aov` and `survreg` fits, for example).
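
As a small sketch of the two extra components, the example below supplies a `keep` filter and inspects the result. The filter `keep_fun`, the starting model and the object names are hypothetical and chosen only for illustration.

```r
library(MASS)

# Illustrative starting model (an assumption for this sketch).
fit <- lm(log(Days + 2.5) ~ Eth + Sex + Age + Lrn, data = quine)

# Hypothetical filter: takes the fitted model and its AIC, returns a small summary.
keep_fun <- function(model, aic) {
    list(aic = aic, ncoef = length(coef(model)))
}

res <- stepAIC(fit, keep = keep_fun, trace = FALSE)

res$anova  # the steps taken in the search
res$keep   # the filter output, one column per model along the search path
```

The rows of `res$keep` are named after the elements returned by the filter, here `aic` and `ncoef`.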

##### Note

The model fitting must apply the models to the same dataset. This may be a problem if there are missing values and an `na.action` other than `na.fail` is used (as is the default in R). We suggest you remove the missing values first.
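
One way to follow this advice is to drop incomplete rows before fitting, so that every candidate model sees the same observations. The sketch below uses the `airquality` data from the datasets package, which is an assumption made for illustration and does not appear in the original page; the names `aq`, `fit` and `sel` are hypothetical.

```r
library(MASS)

# airquality has missing values in Ozone and Solar.R.
# Keep only the variables of interest, then remove incomplete rows once, up front.
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])

# Every model visited by the search is now fitted to the same rows.
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)
sel <- stepAIC(fit, trace = FALSE)
sel$anova
```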

##### References

Venables, W. N. and Ripley, B. D. (2002)
*Modern Applied Statistics with S.* Fourth edition. Springer.

##### Examples

```
library(MASS)  # provides stepAIC, glm.nb and the quine, cpus and birthwt data

# aov fit: search between the intercept-only and full four-way interaction models
quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)
quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn)
quine.stp <- stepAIC(quine.nxt,
                     scope = list(upper = ~Eth*Sex*Age*Lrn, lower = ~1),
                     trace = FALSE)
quine.stp$anova

# lm fit: cut the numeric predictors of cpus into quantile-based factors
cpus1 <- cpus
for(v in names(cpus)[2:7])
    cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])),
                      include.lowest = TRUE)
cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
cpus.samp <- sample(1:209, 100)  # random subset, so results vary between runs
cpus.lm <- lm(log10(perf) ~ ., data = cpus1[cpus.samp, 2:8])
cpus.lm2 <- stepAIC(cpus.lm, trace = FALSE)
cpus.lm2$anova

# binomial glm fit: example(birthwt) creates the bwt data frame
example(birthwt)
birthwt.glm <- glm(low ~ ., family = binomial, data = bwt)
birthwt.step <- stepAIC(birthwt.glm, trace = FALSE)
birthwt.step$anova
# widen the scope to two-way interactions and squared, scaled covariates
birthwt.step2 <- stepAIC(birthwt.glm, ~ .^2 + I(scale(age)^2)
                         + I(scale(lwt)^2), trace = FALSE)
birthwt.step2$anova

# negative binomial fit
quine.nb <- glm.nb(Days ~ .^4, data = quine)
quine.nb2 <- stepAIC(quine.nb)
quine.nb2$anova
```

*Documentation reproduced from package MASS, version 7.3-20, License: GPL-2 | GPL-3*