The tvcolmm function implements the
tree-based longitudinal varying coefficient regression algorithm
proposed in Buergin and Ritschard (2015a). The algorithm approximates
varying fixed coefficients in the cumulative logit mixed model by a
(multivariate) piecewise constant function using recursive
partitioning, i.e., it estimates the fixed effect component of the
model separately for strata of the value space of partitioning
variables.
tvcolmm(formula, data, family = cumulative(), weights, subset, offset, na.action = na.omit, control = tvcolmm_control(), ...)
tvcolmm_control(alpha = 0.05, bonferroni = TRUE, minsize = 50, maxnomsplit = 5, maxordsplit = 9, maxnumsplit = 9, fast = TRUE, trim = 0.1, estfun.args = list(), nimpute = 5, seed = NULL, ...)
formula: a symbolic description of the model to fit, e.g.,

y ~ -1 + vc(z1, ..., zL, by = x1 + ... + xP, intercept = TRUE) + re(1|id)

where the vc term specifies the varying fixed coefficients. Only one such
vc term is allowed with tvcolmm (in contrast to tvcglm, where multiple vc
terms can be specified). The above example formula removes the global
intercepts and adds locally varying intercepts, by adding a -1 term and
specifying intercept = TRUE in the vc term. If varying intercepts are
desired, we recommend always removing the global intercepts. For more
details on the formula specification, see olmm and vcrpart-formula.

family: the model family; an object of class family.olmm (by default,
cumulative()).

data: a data frame containing the variables in the model.

weights: an optional numeric vector of weights to be used in the fitting
process.

subset: an optional vector specifying a subset of 'data' to be used in the
fitting process.

na.action: a function that indicates what should happen if the data contain
NAs. The default na.action = na.omit is listwise deletion, i.e.,
observations with missings on any variable are dropped. See na.action.

control: a list with control parameters as returned by tvcolmm_control.

alpha: numeric significance threshold between 0 and 1 used as stopping
criterion; a node is split only if the smallest (possibly Bonferroni
corrected) p-value of the coefficient constancy tests falls below alpha.

bonferroni: logical; whether the p-values of the coefficient constancy
tests are Bonferroni corrected.

minsize: numeric; the minimum sum of weights in terminal nodes.

maxnomsplit, maxordsplit, maxnumsplit: split candidate reduction
parameters; see tvcm_control.

fast: logical; whether the approximative search model should be used; see
tvcm_control.

trim: trimming parameter for the coefficient constancy tests of continuous
partitioning variables; see also the argument from of function supLM in
package strucchange.

estfun.args: a list of arguments passed to gefp.olmm. See details.

nimpute: the number of times the coefficient constancy tests are repeated
in each iteration. See details.

seed: an integer used to set the seed at the beginning of the fitting.

...: additional arguments passed to the fitting function fit or to
tvcm_control.
The tvcolmm function iterates the following steps:

1. Fit the current mixed model

   y ~ Node:x1 + ... + Node:xP + re(1 + w1 + ... |id)

   with olmm, where Node is a categorical variable with terminal node
   labels 1, ..., M.

2. Test the constancy of the fixed effects Node:x1, ..., separately for
   each moderator z1, ..., zL in each node 1, ..., M. This yields L times M
   (possibly Bonferroni corrected) p-values for rejecting coefficient
   constancy.

3. If the minimum p-value is smaller than alpha, select the node and the
   variable corresponding to the minimum p-value. Search for and
   incorporate the optimal split among the candidate splits in the selected
   node and variable by exhaustive likelihood search.

4. Otherwise, if the minimum p-value is not smaller than alpha, stop the
   algorithm and return the current model.
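As a minimal sketch of how this stopping rule can be tuned, the call below
sets a stricter threshold; it uses only the control arguments listed in the
usage above and the formula and data from the example at the end of this
page.

library(vcrpart)
data(unemp)

## stricter stopping rule: split only if the smallest (Bonferroni corrected)
## p-value falls below 0.01, and require a minimum node size of 30
ctrl <- tvcolmm_control(alpha = 0.01, bonferroni = TRUE, minsize = 30)

model.strict <-
  tvcolmm(GHQL ~ -1 +
          vc(AGE, FISIT, GENDER, UEREGION, by = UNEMP, intercept = TRUE) +
          re(1|PID), data = unemp, control = ctrl)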
The implemented coefficient constancy tests used for node and variable
selection (step 2) are based on the M-fluctuation tests of Zeileis and
Hornik (2007), using the observation scores of the fitted mixed
model. The observation scores can be extracted by
estfun.olmm
for models fitted with
olmm
. To deal with intra-individual correlations
between such observation scores, the estfun.olmm
function decorrelates the observation scores. In cases of unbalanced
data, the pre-decorrelation method requires imputation. nimpute
gives the number of times the coefficient constancy tests are repeated
in each iteration. The final p-values are then the averages over the
repetitions.
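The following sketch illustrates where these scores come from, assuming the
'unemp' data used in the examples below: it fits a plain cumulative logit
mixed model with olmm and extracts the observation scores with estfun.olmm
(whose decorrelation arguments can be passed to the tests via estfun.args).

library(vcrpart)
data(unemp)

## a simple cumulative logit mixed model with a random intercept per person
m0 <- olmm(GHQL ~ UNEMP + re(1|PID), data = unemp)

## observation scores on which the coefficient constancy tests operate;
## see ?estfun.olmm for the pre-decorrelation options
scores <- estfun.olmm(m0)
dim(scores)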
The algorithm combines the splitting technique of Zeileis (2008) with the technique of Hajjem et al. (2011) and Sela and Simonoff (2012) to incorporate regression trees into mixed models.
For the exhaustive search, the algorithm implements a number of split
point reduction methods to decrease the computational complexity. See
the arguments maxnomsplit
, maxordsplit
and
maxnumsplit
. By default, the algorithm also uses the
approximative search model approach proposed in Buergin and Ritschard
(2014c). To disable this option and use the original algorithm, set
fast = FALSE in tvcolmm_control.
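A short illustrative sketch of these computational settings, using only the
control arguments shown in the usage above:

## coarser candidate grids for ordinal and numeric partitioning variables,
## and the original exhaustive search model instead of the approximation;
## pass the result to tvcolmm() via the 'control' argument
ctrl.slow <- tvcolmm_control(maxordsplit = 5, maxnumsplit = 5, fast = FALSE)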
Special attention is given to varying intercepts, i.e. the terms that account for the direct effects of the moderators. A common specification is

y ~ -1 + vc(z1, ..., zL, by = x1 + ... + xP, intercept = TRUE) + re(1 + w1 + ... |id)

Doing so replaces the global intercept by local, node-specific intercepts. As mentioned above, if varying intercepts are desired, we recommend always removing the global intercept.
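The two illustrative formulas below contrast the specification with and
without varying intercepts; the variable names are placeholders in the
style used above.

## varying intercepts: drop the global intercept and let vc() add
## node-specific intercepts
y ~ -1 + vc(z1, z2, by = x1, intercept = TRUE) + re(1|id)

## global intercept only: the effect of x1 varies across nodes, the
## intercept does not
y ~ vc(z1, z2, by = x1) + re(1|id)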
Buergin R. and Ritschard G. (2015a), Tree-Based Varying Coefficient Regression for Longitudinal Ordinal Responses, Computational Statistics & Data Analysis, forthcoming.

Hajjem A., Bellavance F. and Larocque D. (2011), Mixed Effects Regression Trees for Clustered Data, Statistics & Probability Letters, 81(4), 451--459.

Sela R. and Simonoff J. S. (2012), RE-EM Trees: A Data Mining Approach for Longitudinal and Clustered Data, Machine Learning, 86(2), 169--207.

Zeileis A. and Hornik K. (2007), Generalized M-Fluctuation Tests for Parameter Instability, Statistica Neerlandica, 61(4), 488--508.
See also tvcm_control, tvcm-methods, tvcm-plot, fvcolmm and olmm.
## ------------------------------------------------------------------- #
## Example 1: Moderated effect of unemployment
##
## Here we fit a varying coefficient ordinal linear mixed model on the
## synthetic ordinal longitudinal data 'unemp'. The interest is whether
## the effect of unemployment 'UNEMP' on happiness 'GHQL' is moderated
## by 'AGE', 'FISIT', 'GENDER' and 'UEREGION'. 'FISIT' is the only true
## moderator. For the partitioning we use coefficient constancy tests,
## as described in Buergin and Ritschard (2014a).
## ------------------------------------------------------------------- #
data(unemp)
## fit the model
model.UE <-
  tvcolmm(GHQL ~ -1 +
          vc(AGE, FISIT, GENDER, UEREGION, by = UNEMP, intercept = TRUE) +
          re(1|PID), data = unemp)
## diagnosis
plot(model.UE, "coef")
summary(model.UE)
splitpath(model.UE, steps = 1, details = TRUE)
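## Further methods from tvcm-methods can be applied to the fitted tree; the
## sketch below assumes that coef() returns the node-wise fixed effects and
## that the predict method accepts type = "response".
coef(model.UE)
head(predict(model.UE, type = "response"))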