# step

##### Choose a model by AIC in a Stepwise Algorithm

Select a formula-based model by AIC.

- Keywords
- models

##### Usage

```
step(object, scope, scale = 0,
     direction = c("both", "backward", "forward"),
     trace = 1, keep = NULL, steps = 1000, k = 2, ...)
```

##### Arguments

- object
an object representing a model of an appropriate class (mainly `"lm"` and `"glm"`). This is used as the initial model in the stepwise search.

- scope
defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components `upper` and `lower`, both formulae. See the details for how to specify the formulae and how they are used.

- scale
used in the definition of the AIC statistic for selecting the models, currently only for `lm`, `aov` and `glm` models. The default value, `0`, indicates the scale should be estimated: see `extractAIC`.

- direction
the mode of stepwise search, can be one of `"both"`, `"backward"`, or `"forward"`, with a default of `"both"`. If the `scope` argument is missing the default for `direction` is `"backward"`. Values can be abbreviated.

- trace
if positive, information is printed during the running of `step`. Larger values may give more detailed information.

- keep
a filter function whose input is a fitted model object and the associated `AIC` statistic, and whose output is arbitrary. Typically `keep` will select a subset of the components of the object and return them. The default is not to keep anything.

- steps
the maximum number of steps to be considered. The default is 1000 (essentially as many as required). It is typically used to stop the process early.

- k
the multiple of the number of degrees of freedom used for the penalty. Only `k = 2` gives the genuine AIC: `k = log(n)` is sometimes referred to as BIC or SBC.

- ...
any additional arguments to `extractAIC`.
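As an illustration of the `k` argument, the following sketch compares the default AIC penalty with the BIC-style penalty `k = log(n)`. It assumes the built-in `swiss` data; the object names `full`, `sel_aic` and `sel_bic` are chosen here for illustration only.

```r
# Full linear model on the built-in swiss data
full <- lm(Fertility ~ ., data = swiss)
n <- nobs(full)                 # number of observations used in the fit

# k = 2 (the default) gives the genuine AIC;
# k = log(n) gives the penalty sometimes called BIC or SBC.
sel_aic <- step(full, trace = 0)
sel_bic <- step(full, k = log(n), trace = 0)

formula(sel_aic)
formula(sel_bic)
```

Setting `trace = 0` suppresses the per-step tables, so only the final formulae are shown.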

##### Details

`step` uses `add1` and `drop1` repeatedly; it will work for any method for which they work, and that is determined by having a valid method for `extractAIC`. When the additive constant can be chosen so that AIC is equal to Mallows' \(C_p\), this is done and the tables are labelled appropriately.
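A single backward step can be reproduced by hand with `drop1`, which may help in reading the tables that `step` prints. This is only a sketch of one iteration on the built-in `swiss` data, not the full algorithm; the names `fit` and `tab` are illustrative.

```r
fit <- lm(Fertility ~ ., data = swiss)

# AIC of the current model ("<none>") and of each model obtained
# by deleting one term from it
tab <- drop1(fit)
tab

# Row with the smallest AIC: the deletion a backward step would
# choose (or "<none>" if no deletion lowers the AIC)
rownames(tab)[which.min(tab$AIC)]

# extractAIC() returns the equivalent degrees of freedom and the AIC
extractAIC(fit)
```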

The set of models searched is determined by the `scope` argument. The right-hand-side of its `lower` component is always included in the model, and the right-hand-side of the model is included in the `upper` component. If `scope` is a single formula, it specifies the `upper` component, and the `lower` model is empty. If `scope` is missing, the initial model is used as the `upper` model.
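The two forms of `scope` can be sketched as follows, again assuming the built-in `swiss` data. The first call searches forward from the intercept-only model up to an explicit `upper` formula; the second supplies only a `lower` component, so `Examination` can never be dropped. All object names are illustrative.

```r
null <- lm(Fertility ~ 1, data = swiss)
full <- lm(Fertility ~ ., data = swiss)

# Forward search between an empty lower model and an explicit upper model
fwd <- step(null,
            scope = list(lower = ~ 1,
                         upper = ~ Agriculture + Examination + Education +
                                   Catholic + Infant.Mortality),
            direction = "forward", trace = 0)
formula(fwd)

# Terms in 'lower' are always retained in the selected model
kept <- step(full, scope = list(lower = ~ Examination), trace = 0)
formula(kept)
```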

Models specified by `scope` can be templates to update `object` as used by `update.formula`. So using `.` in a `scope` formula means ‘what is already there’, with `.^2` indicating all interactions of existing terms.
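The template behaviour can be seen by applying `update.formula` directly and then passing the same template as a single-formula `scope`. The starting model below is an arbitrary choice for illustration on the built-in `swiss` data.

```r
base <- lm(Fertility ~ Education + Catholic, data = swiss)

# '.' stands for the terms already in the model, so '.^2' expands to
# the existing terms plus their two-way interactions
update.formula(formula(base), ~ .^2)

# A single formula is taken as the 'upper' component of the scope
sel <- step(base, scope = ~ .^2, trace = 0)
formula(sel)
```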

There is a potential problem in using `glm` fits with a variable `scale`, as in that case the deviance is not simply related to the maximized log-likelihood. The `"glm"` method for function `extractAIC` makes the appropriate adjustment for a `gaussian` family, but may need to be amended for other cases. (The `binomial` and `poisson` families have fixed `scale` by default and do not correspond to a particular maximum-likelihood problem for variable `scale`.)
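For a family with fixed `scale` the caveat above does not arise. A minimal sketch with a Poisson fit, assuming the built-in `warpbreaks` data (the model is chosen only for illustration):

```r
# Poisson regression with an interaction term
pfit <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)

# Equivalent degrees of freedom and AIC used by step()
extractAIC(pfit)

# Backward search; only the interaction is eligible for deletion
# at the first step
psel <- step(pfit, trace = 0)
formula(psel)
```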

##### Value

The stepwise-selected model is returned, with up to two additional components. There is an `"anova"` component corresponding to the steps taken in the search, as well as a `"keep"` component if the `keep=` argument was supplied in the call. The `"Resid. Dev"` column of the analysis of deviance table refers to a constant minus twice the maximized log likelihood: it will be a deviance only in cases where a saturated model is well-defined (thus excluding `lm`, `aov` and `survreg` fits, for example).
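Both additional components can be inspected as below. The filter `keep_fn` is a hypothetical example written for this sketch; any function of a fitted model and its AIC could be used instead.

```r
full <- lm(Fertility ~ ., data = swiss)

# Hypothetical filter: record the number of coefficients and the AIC
# of the model retained at each step
keep_fn <- function(model, aic) {
  list(n_coef = length(coef(model)), AIC = aic)
}

sel <- step(full, keep = keep_fn, trace = 0)

sel$anova   # one row per step taken in the search
sel$keep    # one column per step, one row per component returned by keep_fn
```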

##### Note

This function differs considerably from the function in S, which uses a number of approximations and does not in general compute the correct AIC.

This is a minimal implementation. Use `stepAIC` in package MASS for a wider range of object classes.

##### Warning

The model fitting must apply the models to the same dataset. This may be a problem if there are missing values and R's default of `na.action = na.omit` is used. We suggest you remove the missing values first.

Calls to the function `nobs` are used to check that the number of observations involved in the fitting process remains unchanged.
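One way to follow this advice is to drop incomplete rows before fitting, so that every candidate model sees the same observations. The sketch below assumes the built-in `airquality` data, which has missing values in `Ozone` and `Solar.R`; the variable selection is illustrative.

```r
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Remove rows with missing values in any variable that may enter a model
aq <- na.omit(airquality[, vars])

fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)
sel <- step(fit, trace = 0)

# The number of observations is the same before and after selection
nobs(fit) == nobs(sel)
```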

##### References

Hastie, T. J. and Pregibon, D. (1992)
*Generalized linear models.*
Chapter 6 of *Statistical Models in S*
eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

Venables, W. N. and Ripley, B. D. (2002)
*Modern Applied Statistics with S.*
New York: Springer (4th ed).

##### See Also

`stepAIC` in MASS, `add1`, `drop1`

##### Examples

```
library(stats)

## Following on from example(lm), which creates lm.D9
utils::example("lm", echo = FALSE)
step(lm.D9)

## Fit the full model to the swiss data, then select by AIC
summary(lm1 <- lm(Fertility ~ ., data = swiss))
slm1 <- step(lm1)
summary(slm1)
slm1$anova   # the steps taken in the search
```

*Documentation reproduced from package stats, version 3.6.1, License: Part of R 3.6.1*