step, which used to be a generic, so must be invoked with the full name.
step.gam(object, scope, scale, direction, trace, keep, steps, parallel, …)
gam
or any class that inherits from it.
~1+ Income + log(Income) + s(Income)
.
This means that Income
could either appear not at all, linearly, linearly in its logarithm, or as a smooth function estimated nonparametrically. A 1
in the formula allows the additional option of leaving the term out of the model entirely.
Every term in the model is described by such a term formula, and the
final model is built up by selecting a component from each
formula. As a more convenient alternative for big models, each list element can be
given as a character vector listing the candidates for that term, instead of a formula. Thus we
could have c("1","x","s(x,df=5)")
rather than ~1+x+s(x,df=5)
.
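The two ways of writing a term's candidates can be put side by side. This is a minimal sketch using base R only; the object names `scope_formula` and `scope_char` are made up for illustration.

```r
# Two ways to describe the candidates for a term named x.
# Formula form, as in the text:
scope_formula <- list(x = ~ 1 + x + s(x, df = 5))
# Character-vector form; each candidate is one complete, quoted string:
scope_char <- list(x = c("1", "x", "s(x,df=5)"))

# Building these lists fits nothing; s() is not called at this point.
str(scope_char)
```

Either list would then be supplied as the scope argument of step.gam().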
The supplied model object
is used as the starting model, and hence there is the requirement that one term from each of the term formulas be present in formula(object)
. This also implies that any terms in formula(object)
not contained in any of the term formulas will be forced
to be present in every model considered.
The function gam.scope
is helpful for generating the scope
argument for a large model.
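As a rough illustration of gam.scope, the sketch below builds a scope list from a data frame instead of writing it by hand. It assumes the gam package is installed and that the first three columns of gam.data are x, y and z, with y as the response; the names `gdata` and `scope` are made up for this example.

```r
library(gam)
data(gam.data)

# Keep only the response and the two predictors.
gdata <- gam.data[, 1:3]

# Column 2 (y) is the response; every other column gets its own
# regimen of candidates, here with two smooth options per term.
scope <- gam.scope(gdata, response = 2, arg = c("df=4", "df=6"))
print(scope)
```

The resulting list has one element per predictor and can be passed as the scope argument.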
"both"
, "backward"
, or "forward"
, with a default of "both"
. If scope
is missing, the default for direction
is "both".
TRUE
(the default), information is printed during the running
of step.gam()
. This is an encouraging choice in general, since
step.gam()
can take some time to compute either for large models
or when called with an extensive scope=
argument. A simple one-line model summary is printed for each model selected. This argument can
also be given as the binary 0
or 1
. A value trace=2
gives a more verbose trace.
gam
object, and
anything else passed via …, and whose output is arbitrary. Typically keep()
will select a subset of the components of the object and return them. The default is not to keep anything.
TRUE
, use parallel foreach
to fit each
trial run.
A parallel backend must be registered beforehand, such as doMC
or others.
See the example below.
keep
"anova"
component corresponding to the steps taken in the search, as well as a "keep"
component if the keep=
argument was supplied in the call. We describe the most general setup, when direction = "both"
.
At any stage there is a current model comprising a single term from each of the term formulas supplied in the scope=
argument.
A series of models is fitted, each corresponding to a formula obtained by moving each of the terms one step up or down in its regimen, relative to the formula of the current model.
If the current value for any term is at either of the extreme ends of its regimen, only one rather than two steps can be considered.
So if there are p
term formulas, at most 2*p - 1
models are considered.
A record is kept of all the models ever visited (hence the -1
above), to avoid repetition.
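The way candidate models are generated can be sketched in base R. This is an illustration of the description above, not the code used inside the gam package; `neighbour_models` and its arguments are hypothetical names.

```r
# Hypothetical helper (not part of the gam package): list the models one
# step away from the current one. position[j] is the index of the candidate
# currently chosen for term j; regimen_length[j] is how many candidates
# that term's formula offers.
neighbour_models <- function(position, regimen_length, visited = character()) {
  candidates <- list()
  for (j in seq_along(position)) {
    for (delta in c(-1L, 1L)) {
      new_index <- position[j] + delta
      # A term at either end of its regimen can only move one way.
      if (new_index < 1L || new_index > regimen_length[j]) next
      candidate <- position
      candidate[j] <- new_index
      key <- paste(candidate, collapse = "-")
      # Skip models that were already visited on an earlier step.
      if (key %in% visited) next
      candidates[[key]] <- candidate
    }
  }
  candidates
}

# Two terms: x offers 5 candidates and uses the 2nd; z offers 3 and uses the 1st.
current <- c(x = 2L, z = 1L)
sizes <- c(x = 5L, z = 3L)
first <- neighbour_models(current, sizes)
names(first)
```

Here x can move down or up while z, being at the start of its regimen, can only move up, so three candidates are produced. Passing a previously visited key through `visited` removes that model from the list.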
Once each of these models has been fit, the "best" model
in terms of the AIC statistic is selected and defines the step.
The entire process is repeated until either the maximum number of steps has been used, or until the AIC criterion cannot be decreased by any of the eligible steps.
gam.scope
,step
,glm
, gam
, drop1
, add1
, anova.gam
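# Load the simulated example data shipped with the gam package
data(gam.data)
# Starting model: both terms enter linearly
gam.object <- gam(y ~ x + z, data = gam.data)
# Step-wise search over the candidates listed for each term
step.object <- step.gam(gam.object, scope = list("x" = ~1 + x + s(x, 4) + s(x, 6) + s(x, 12), "z" = ~1 + z + s(z, 4)))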
## Not run: ------------------------------------
# # Parallel
# require(doMC)
# registerDoMC(cores=2)
# step.gam(gam.object, scope=list("x"=~1+x+s(x,4)+s(x,6)+s(x,12),"z"=~1+z+s(z,4)),parallel=TRUE)
## ---------------------------------------------