structree: Tree-Structured Clustering

Description

Fusion of categories of ordinal or nominal predictors or fusion of measurement units by tree-structured clustering.

Usage

structree(formula, data, family = gaussian, stop_criterion = c("AIC",
  "BIC", "CV", "pvalue"), splits_max = NULL, fold = 5, alpha = 0.05,
  grid_value = NULL, min_border = NULL, ridge = FALSE,
  constant_covs = FALSE, trace = TRUE, plot = TRUE, k = 10,
  weights = NULL, offset = NULL, ...)
# S3 method for structree
print(x, ...)
# S3 method for structree
coef(object, ...)

Arguments

formula

Object of class formula: a symbolic description of the model to be fitted. See Details.

data

Object of class data.frame containing the variables of the model.

family

A description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. See family for details of family functions.

stop_criterion

Criterion to determine the optimal number of splits in the tree component of the model; one out of "AIC", "BIC", "CV" and "pvalue".

splits_max

Maximal number of splits in the tree component.

fold

Number of folds; only for stop criterion "CV".

alpha

Significance level; only for stop criterion "pvalue".

grid_value

An optional parameter; grid_value is a scalar giving the minimal distance between two adjacent observation units that are used as candidates for splitting; only for repeated measurements.

min_border

An optional parameter; min_border is an integer giving the minimal size of the outer nodes of the tree; only for repeated measurements.

ridge

If true, a small ridge penalty is added to obtain the order of measurement units; only for repeated measurements.

constant_covs

Must be set to true if constant covariates are available; only for repeated measurements (currently only available for Gaussian response).

trace

If true, information about the estimation progress is printed.

plot

If true, the smooth components of the model are plotted; only for categorical predictors.

k

Dimension of the B-spline basis that is used to fit smooth components. For details see s; only for categorical predictors.

weights

An optional vector of prior weights to be used in the fitting process; see also glm.

offset

An a priori known component to be included in the linear predictor during fitting; see also glm.

...

Further arguments passed to or from other methods.

x, object

Object of class "structree".
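
The criterion-specific arguments above (alpha for "pvalue", fold for "CV") could be combined as in the following sketch. It reuses the rent data and the formula from the Examples section; the particular values alpha = 0.01 and fold = 10 are arbitrary choices for illustration, and the whole block is skipped if the structree package is not installed.

```r
if (requireNamespace("structree", quietly = TRUE)) {
  library(structree)
  data(rent)

  # Same model formula as in the Examples section.
  f <- nmqm ~ tr(bez) + tr(bj) + tr(rooms) + badkach0

  # Stopping based on p-values; alpha is only used with "pvalue".
  mod_p <- structree(f, data = rent, family = gaussian,
                     stop_criterion = "pvalue", alpha = 0.01,
                     trace = FALSE, plot = FALSE)

  # Stopping based on cross-validation; fold is only used with "CV".
  mod_cv <- structree(f, data = rent, family = gaussian,
                      stop_criterion = "CV", fold = 10,
                      trace = FALSE, plot = FALSE)

  print(mod_p)
  print(mod_cv)
}
```

Setting trace = FALSE and plot = FALSE only suppresses progress output and plotting; it is not meant to change the fitted model.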

Value

Object of class "structree". An object of class "structree" is a list containing the following components:

coefs_end

all coefficients of the estimated model

partitions

list of matrices containing the partitions of the predictors in the tree component including all iterations

beta_hat

list of matrices with the fitted coefficients in the tree component including all iterations

which_opt

number of the optimal model (total number of splits-1)

opts

number of splits per predictor in the tree component

order

list of ordered split-points of the predictors in the tree component

tune_values

values of the stopping criterion that determine the optimal model

group_ID

list of the group IDs for each observation

coefs_group

list of coefficients of the estimated model

Response vector

DM_kov

Design matrix
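
As a rough illustration of how the components listed above might be inspected, the sketch below fits the model from the Examples section with stop_criterion = "AIC" (an arbitrary choice here) and prints a few elements of the returned list. It assumes the structree package and its rent data are available and does nothing otherwise.

```r
if (requireNamespace("structree", quietly = TRUE)) {
  library(structree)
  data(rent)

  mod <- structree(nmqm ~ tr(bez) + tr(bj) + tr(rooms) + badkach0,
                   data = rent, family = gaussian,
                   stop_criterion = "AIC", trace = FALSE, plot = FALSE)

  print(mod$which_opt)    # number of the optimal model
  print(mod$opts)         # number of splits per predictor
  print(mod$tune_values)  # values of the stopping criterion
  print(mod$coefs_end)    # all coefficients of the estimated model

  # Top-level structure of the partitions over all iterations.
  str(mod$partitions, max.level = 1)
}
```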

Details

A typical formula has the form response ~ predictors, where response is the name of the response variable and predictors is a series of terms that specify the predictor of the model.

For an ordinal or nominal predictor z one has to enter tr(z) into the formula.

For smooth components x one has to enter s(x) into the formula; currently not implemented for repeated measurements.

For fixed effects z of observation units u one has to enter tr(z|u) into the formula. A unit-specific intercept is specified by tr(1|u).

The framework only allows for categorical predictors or observation units in the tree component, but not both. All other predictors with a linear term are entered as usual by x1+...+xp.
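
The formula conventions described above can be written down as in the following sketch. The variable names y, z, x, x1 and u are placeholders rather than columns of any particular data set, and the formulas are only constructed here, not fitted.

```r
# Placeholder variable names; nothing is fitted in this block.

# Categorical predictor z in the tree component, a smooth term in x,
# and an ordinary linear term x1.
f_cat <- y ~ tr(z) + s(x) + x1

# Repeated measurements: unit-specific intercepts for the units u,
# plus an ordinary linear term x1.
f_unit_intercept <- y ~ tr(1 | u) + x1

# Repeated measurements: unit-specific effects of z for the units u.
f_unit_effect <- y ~ tr(z | u) + x1

# Variables referenced by each formula.
all.vars(f_cat)
all.vars(f_unit_intercept)
all.vars(f_unit_effect)
```

Because the tree component accepts either categorical predictors or observation units, tr(z) and tr(z | u) are kept in separate formulas here.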

References

Tutz, Gerhard and Berger, Moritz (2018): Tree-structured modelling of categorical predictors in regression, Advances in Data Analysis and Classification 12(3), 737-758.

Berger, Moritz and Tutz, Gerhard (2018): Tree-structured clustering in fixed effects models, Journal of Computational and Graphical Statistics 27(2), 380-392.

Examples

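library(structree)
data(rent)

# bez, bj and rooms enter the tree component via tr();
# badkach0 enters as an ordinary linear term.
# The number of splits is chosen by cross-validation.
mod <- structree(nmqm ~ tr(bez) + tr(bj) + tr(rooms) + badkach0, data = rent,
                 family = gaussian, stop_criterion = "CV")

print(mod)
coef(mod)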