Fusion of categories of ordinal or nominal predictors or fusion of measurement units by tree-structured clustering.
structree(formula, data, family = gaussian, stop_criterion = c("AIC",
"BIC", "CV", "pvalue"), splits_max = NULL, fold = 5, alpha = 0.05,
grid_value = NULL, min_border = NULL, ridge = FALSE,
constant_covs = FALSE, trace = TRUE, plot = TRUE, k = 10,
weights = NULL, offset = NULL, ...)

# S3 method for structree
print(x, ...)
# S3 method for structree
coef(object, ...)
formula: Object of class formula; a symbolic description of the model to be fitted. See the details on the formula below.
data: Object of class data.frame containing the variables of the model.
family: A description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. See family for details of family functions.
stop_criterion: Criterion to determine the optimal number of splits in the tree component of the model; one of "AIC", "BIC", "CV" and "pvalue".
splits_max: Maximal number of splits in the tree component.
fold: Number of folds; only for stop criterion "CV".
alpha: Significance level; only for stop criterion "pvalue".
grid_value: An optional parameter; a scalar giving the minimal distance between two adjacent observation units that are used as candidates for splitting; only for repeated measurements.
min_border: An optional parameter; an integer giving the minimal size of the outer nodes of the tree; only for repeated measurements.
ridge: If true, a small ridge penalty is added to obtain the order of measurement units; only for repeated measurements.
constant_covs: Must be set to true if constant covariates are available; only for repeated measurements (currently only available for Gaussian response).
trace: If true, information about the estimation progress is printed.
plot: If true, the smooth components of the model are plotted; only for categorical predictors.
k: Dimension of the B-spline basis that is used to fit smooth components. For details see s; only for categorical predictors.
weights: An optional vector of prior weights to be used in the fitting process; see also glm.
offset: An a priori known component to be included in the linear predictor during fitting; see also glm.
...: Further arguments passed to or from other methods.
x: Object of class "structree".
object: Object of class "structree".
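The usage line gives stop_criterion a vector of four choices as its default. A common R idiom for such signatures is match.arg(), under which omitting the argument selects the first listed value. Whether structree resolves the argument this way is an assumption that this page does not confirm; choose_criterion below is a hypothetical stand-in, not part of the package.

```r
# Hypothetical stand-in illustrating the match.arg() idiom; not package code.
choose_criterion <- function(stop_criterion = c("AIC", "BIC", "CV", "pvalue")) {
  # match.arg() returns the first choice when the argument is missing and
  # raises an error for values that match none of the listed choices.
  match.arg(stop_criterion)
}

default_choice <- choose_criterion()      # argument omitted
cv_choice      <- choose_criterion("CV")  # explicit choice

# An unknown value triggers an error, which is caught and mapped to NA here.
bad_choice <- tryCatch(choose_criterion("R2"),
                       error = function(e) NA_character_)
```

To avoid relying on any default, the criterion can be passed explicitly, for example stop_criterion = "CV" together with fold, or stop_criterion = "pvalue" together with alpha.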
An object of class "structree" is a list containing the following components:
all coefficients of the estimated model
list of matrices containing the partitions of the predictors in the tree component including all iterations
list of matrices with the fitted coefficients in the tree component including all iterations
number of the optimal model (total number of splits-1)
number of splits per predictor in the tree component
list of ordered split-points of the predictors in the tree component
value of the stopping criterion that determines the optimal model
list of the group IDs for each observation
list of coefficients of the estimated model
Response vector
Design matrix
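Because the returned object is a list, its components can be explored with base R tools. The following is a minimal sketch: describe_components is a hypothetical helper, and toy_fit is a plain list with placeholder component names that stands in for a fitted object so that the code runs without the package.

```r
# Summarise the top-level components of an object stored as a list.
# `fit` would normally be the result of structree(); any list works.
describe_components <- function(fit) {
  stopifnot(is.list(fit))
  data.frame(
    component = names(fit),
    class     = unname(vapply(fit, function(comp) class(comp)[1], character(1))),
    length    = unname(lengths(fit)),
    stringsAsFactors = FALSE
  )
}

# Placeholder object; the names a, b and c are NOT the real component names.
toy_fit <- list(a = c(1.5, -0.2), b = list(matrix(0, 2, 2)), c = 3L)
overview <- describe_components(toy_fit)
print(overview)
```

Applied to a fitted model, names(mod) and str(mod, max.level = 1) give a similar overview of the actual components.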
A typical formula has the form response ~ predictors, where response is the name of the response variable and predictors is a series of terms that specify the predictors of the model.
For an ordinal or nominal predictor z one has to enter tr(z) into the formula.
For smooth components x one has to enter s(x) into the formula; currently not implemented for repeated measurements.
For fixed effects z of observation units u one has to enter tr(z|u) into the formula.
A unit-specific intercept is specified by tr(1|u).
The framework only allows for categorical predictors or observation units in the tree component, but not both.
All other predictors with a linear term are entered as usual by x1+...+xp.
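The formula conventions described here can be written out as formula objects. This is a sketch with placeholder variable names (y, z, z1, z2, x, x1, u are assumptions, not columns of any shipped data set). Formulas are unevaluated, so the code only builds and inspects them with base R and does not fit a model.

```r
# Categorical predictors: tr() marks predictors whose categories are fused,
# s() marks a smooth component, and x1 enters with a linear term.
f_categorical <- y ~ tr(z1) + tr(z2) + s(x) + x1

# Repeated measurements: unit-specific intercept for observation units u.
f_intercepts <- y ~ tr(1 | u) + x1

# Repeated measurements: fixed effect of z specific to observation units u.
f_slopes <- y ~ tr(z | u) + x1

# Inspect the variables and terms without evaluating any data.
vars_categorical <- all.vars(f_categorical)
vars_intercepts  <- all.vars(f_intercepts)
term_labels      <- attr(terms(f_slopes), "term.labels")
```

The categorical and repeated-measurement forms are kept in separate formulas because the tree component accepts one kind or the other, and s() is left out of the repeated-measurement formulas since it is not implemented there.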
Tutz, Gerhard and Berger, Moritz (2018): Tree-structured modelling of categorical predictors in regression, Advances in Data Analysis and Classification 12(3), 737-758.
Berger, Moritz and Tutz, Gerhard (2018): Tree-structured clustering in fixed effects models, Journal of Computational and Graphical Statistics 27(2), 380-392.
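# NOT RUN {
library(structree)  # assumed package name; provides structree()
data(rent)

# Fuse categories of the predictors bez, bj and rooms; badkach0 enters linearly.
# The number of splits is chosen by cross-validation (fold = 5 by default).
mod <- structree(nmqm ~ tr(bez) + tr(bj) + tr(rooms) + badkach0, data = rent,
                 family = gaussian, stop_criterion = "CV")
print(mod)  # print method for class "structree"
coef(mod)   # coefficients of the estimated model
# }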