Model-based recursive partitioning based on (generalized) linear mixed models.
lmertree(formula, data, weights = NULL, cluster = NULL,
  ranefstart = NULL, offset = NULL, joint = TRUE,
  abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE,
  plot = FALSE, REML = TRUE, lmer.control = lmerControl(), ...)

glmertree(formula, data, family = "binomial", weights = NULL,
  cluster = NULL, ranefstart = NULL, offset = NULL, joint = TRUE,
  abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE,
  plot = FALSE, nAGQ = 1L, glmer.control = glmerControl(), ...)
The function returns a list with the following objects:
The final lmtree/glmtree.
The final lmer random-effects model.
The corresponding random effects of lmer.
The corresponding VarCorr(lmer).
The corresponding attr(VarCorr(lmer), "sc")^2.
The dataset specified with the data argument, including added auxiliary variables .ranef and .tree from the last iteration.
The log-likelihood value of the last iteration.
The number of iterations used to estimate the lmertree.
The maximum number of iterations specified with the maxit argument.
The random effects used as an offset, as specified with the ranefstart argument.
The formula as specified with the formula argument.
The formula as specified with the randomformula argument.
The prespecified value for the change in log-likelihood to evaluate convergence, as specified with the abstol argument.
A list containing control parameters passed to lmtree(), as specified with ....
A list containing control parameters passed to lmer(), as specified in the lmer.control argument.
Whether the fixed effects from the tree were (re-)estimated jointly along with the random effects, specified with the joint argument.
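Because the return value is a plain list with a class attribute, its components can be listed directly. The following is a minimal sketch that assumes the glmertree package and its DepressionDemo data are installed; it only inspects the component names rather than assuming any particular one.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

# Fit the LMM tree used in the examples of this page
lt <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
               data = DepressionDemo)

# List the components of the returned object and its class
names(lt)
class(lt)
```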
formula: formula specifying the response variable and a three-part right-hand side describing the regressors, random effects, and partitioning variables, respectively. For details see below.
data: data.frame to be used for estimating the model tree.
family: family specification for glmtree and glmer. See glm documentation for families.
weights: numeric. An optional numeric vector of weights. Can be a name of a column in data or a vector of length nrow(data).
cluster: optional vector of cluster IDs to be employed for clustered covariances in the parameter stability tests. Can be a name of a column in data or a vector of length nrow(data). If cluster = NULL (the default), observation-level covariances are employed in the parameter stability tests. If partitioning variables are measured on the cluster level, this will likely yield spurious splits, which can be mitigated by specifying the cluster argument, so that cluster-level covariances are employed in the parameter stability tests.
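As a hedged illustration of the cluster argument, the sketch below fits the same tree with observation-level and with cluster-level covariances in the parameter stability tests. It assumes the glmertree package and its DepressionDemo data are available; the partitioning variables in that dataset are not claimed to be cluster-level, so the comparison only shows the mechanics.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

# Default: observation-level covariances in the stability tests
lt_obs <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                   data = DepressionDemo)

# Cluster-level covariances, with cluster given as a column name in data
lt_clus <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                    data = DepressionDemo, cluster = cluster)

# Compare the number of terminal nodes found by each fit
length(unique(predict(lt_obs, type = "node")))
length(unique(predict(lt_clus, type = "node")))
```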
ranefstart: NULL (the default), TRUE, or a numeric vector of length nrow(data). Specifies the offset to be used in estimation of the first tree. NULL by default, yielding a zero offset initialization. If ranefstart = TRUE is specified, the random effects will be estimated first and the first tree will be grown using the random-effects predictions as an offset.
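The two initializations described above can be compared as follows. This is a sketch under the assumption that the glmertree package and its DepressionDemo data are installed; whether the two starts lead to the same final tree depends on the data.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

# Zero-offset initialization (ranefstart = NULL, the default)
lt_zero <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                    data = DepressionDemo)

# Estimate the random effects first and grow the first tree
# with the random-effects predictions as an offset
lt_re <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                  data = DepressionDemo, ranefstart = TRUE)

# Cross-tabulate terminal-node membership under both initializations
table(zero = predict(lt_zero, type = "node"),
      ranef = predict(lt_re, type = "node"))
```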
offset: optional numeric vector to be included in the linear predictor with a coefficient of one. Note that offset can be a name of a column in data or a numeric vector of length nrow(data).
joint: logical. Should the fixed effects from the tree be (re-)estimated jointly along with the random effects?
abstol: numeric. The convergence criterion used for estimation of the model. When the difference in log-likelihoods of the random-effects model from two consecutive iterations is smaller than abstol, estimation of the model tree has converged.
maxit: numeric. The maximum number of iterations to be performed in estimation of the model tree.
dfsplit: logical or numeric. as.integer(dfsplit) is the degrees of freedom per selected split employed when extracting the log-likelihood.
verbose: Should the log-likelihood value of the estimated random-effects model be printed for every iteration of the estimation?
plot: Should the tree be plotted at every iteration of the estimation? Note that selecting this option slows down execution of the function.
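The convergence-related arguments can be combined to monitor estimation. The sketch below assumes the glmertree package and its DepressionDemo data are installed; the particular values of abstol and maxit are arbitrary choices for illustration.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

# Tighter convergence tolerance, a lower iteration cap, and
# per-iteration printing of the random-effects log-likelihood
lt_mon <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                   data = DepressionDemo,
                   abstol = 1e-4, maxit = 50, verbose = TRUE)
```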
REML: logical scalar. Should the fixed-effects estimates be chosen to optimize the REML criterion (as opposed to the log-likelihood)? Will be passed to function lmer(). See lmer for details.
nAGQ: integer scalar. Specifies the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood, to be passed to function glmer(). See glmer for details.
lmer.control, glmer.control: list. An optional list with control parameters to be passed to lmer() and glmer(), respectively. See lmerControl for details.
...: Additional arguments to be passed to lmtree() or glmtree(). See mob_control documentation for details.
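For instance, tree-growing controls documented in partykit's mob_control() can be supplied through the dots. This is a sketch assuming the glmertree package and its DepressionDemo data are installed; the values of alpha and maxdepth are arbitrary.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

# alpha and maxdepth are mob_control() arguments, forwarded via ...
lt_small <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                     data = DepressionDemo,
                     alpha = 0.01, maxdepth = 3)

# Number of terminal nodes in the restricted tree
length(unique(predict(lt_small, type = "node")))
```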
(G)LMM trees learn a tree where each terminal node is associated with different fixed-effects regression coefficients while adjusting for global random effects (such as a random intercept). This allows for detection of subgroups with different fixed-effects parameter estimates, keeping the random effects constant throughout the tree (i.e., random effects are estimated globally). The estimation algorithm iterates between (1) estimation of the tree given an offset of random effects, and (2) estimation of the random effects given the tree structure. See Fokkema et al. (2018) for a detailed introduction.
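The alternation between steps (1) and (2) can be sketched by hand with lmtree() from partykit and lmer() from lme4. This is an illustrative approximation only, not the package's actual implementation: the helper columns .ranef and .tree mirror the auxiliary variables named on this page, while the loop structure, the hard-coded formulas, and the stopping rule are assumptions made for the sketch.

```r
library("partykit")
library("lme4")
data("DepressionDemo", package = "glmertree")

d <- DepressionDemo
d$.ranef <- 0        # zero offset to start with
old_ll <- -Inf
abstol <- 0.001

for (iter in 1:100) {
  # (1) grow the tree given the current random-effects offset
  tree <- lmtree(depression ~ treatment | age + anxiety + duration,
                 data = d, offset = .ranef)
  d$.tree <- factor(predict(tree, newdata = d, type = "node"))

  # (2) estimate the random effects given the tree structure,
  #     with node-specific intercepts and treatment effects
  f <- if (nlevels(d$.tree) > 1) {
    depression ~ .tree + .tree:treatment + (1 | cluster)
  } else {
    depression ~ treatment + (1 | cluster)
  }
  fit <- lmer(f, data = d)

  # random-effects contribution = full prediction minus fixed-effects part
  d$.ranef <- predict(fit) - predict(fit, re.form = NA)

  # stop once the log-likelihood change falls below the tolerance
  new_ll <- as.numeric(logLik(fit))
  if (abs(new_ll - old_ll) < abstol) break
  old_ll <- new_ll
}
```

In practice lmertree() and glmertree() carry out this iteration internally, so the loop is only meant to make the two steps concrete.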
To specify all variables in the model a formula such as
y ~ x1 + x2 | random | z1 + z2 + z3
is used, where y is the response, x1 and x2 are the regressors in every node of the tree, random is the random effects, and z1 to z3 are the partitioning variables considered for growing the tree. If random is only a single variable such as id, a random intercept with respect to id is used. Alternatively, it may be an explicit random-effects formula such as (1 | id) or a more complicated formula such as ((1+time) | id). (Note that in the latter two formulas, the brackets are necessary to protect the pipes in the random-effects formulation.)
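The shorthand and explicit random-intercept specifications can be checked against each other. The sketch below assumes the glmertree package and its DepressionDemo data are installed, and that, as stated above, a single variable in the random part is treated as a random intercept.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

# Shorthand: a single variable in the random part
lt_short <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                     data = DepressionDemo)

# Explicit random-effects formula, protected by brackets
lt_expl <- lmertree(depression ~ treatment | (1 | cluster) | age + anxiety + duration,
                    data = DepressionDemo)

# Both specifications are expected to give the same fitted values
all.equal(unname(predict(lt_short)), unname(predict(lt_expl)))
```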
In the random-effects model from step (2), two strategies are available: Either the fitted values from the tree can be supplied as an offset (joint = FALSE) so that only the random effects are estimated. Or the fixed effects are (re-)estimated along with the random effects using a nesting factor with nodes from the tree (joint = TRUE). In the former case, the estimation of each random-effects model is typically faster, but more iterations are required.
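The two strategies can be compared on the demo data. This sketch assumes the glmertree package and its DepressionDemo data are installed; verbose = TRUE is used so that the log-likelihood is printed at each iteration, which makes the number of iterations visible without relying on any particular component name of the fitted object.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

# Fixed effects re-estimated jointly with the random effects (default)
lt_joint <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                     data = DepressionDemo, joint = TRUE, verbose = TRUE)

# Tree predictions supplied as an offset; only random effects are estimated
lt_offset <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                      data = DepressionDemo, joint = FALSE, verbose = TRUE)

# Compare terminal-node membership between the two fits
table(joint = predict(lt_joint, type = "node"),
      offset = predict(lt_offset, type = "node"))
```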
The code is still under development and might change in future versions.
Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”. Behavior Research Methods, 50(5), 2016-2034. https://doi.org/10.3758/s13428-017-0971-x
## load the package and the artificial example data
library("glmertree")
data("DepressionDemo", package = "glmertree")
## fit normal linear regression LMM tree for continuous outcome
lt <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo)
print(lt)
plot(lt, which = "all") # default behavior, may also be "tree" or "ranef"
coef(lt)
ranef(lt)
predict(lt, type = "response") # default behavior, may also be "node"
predict(lt, re.form = NA) # excludes random effects, see ?lme4::predict.merMod
residuals(lt)
VarCorr(lt) # see lme4::VarCorr
## fit logistic regression GLMM tree for binary outcome
gt <- glmertree(depression_bin ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo)
print(gt)
plot(gt, which = "all") # default behavior, may also be "tree" or "ranef"
coef(gt)
ranef(gt)
predict(gt, type = "response") # default behavior, may also be "node" or "link"
predict(gt, re.form = NA) # excludes random effects, see ?lme4::predict.merMod
residuals(gt)
VarCorr(gt) # see lme4::VarCorr