vcrpart (version 1.0-3)

tvcm: Tree-based varying coefficient regression models

Description

tvcm is the general implementation for tree-based varying coefficient regression. It may be used to combine the two different algorithms tvcolmm and tvcglm.

Usage

tvcm(formula, data, fit, family,
     weights, subset, offset, na.action = na.omit,
     control = tvcm_control(), fitargs, ...)

Arguments

formula

a symbolic description of the model to fit, e.g.,

y ~ vc(z1, z2) + vc(z1, z2, by = x)

where vc specifies the varying coefficients. See vcrpart-formula.
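
As a small illustration of how such a formula is structured, the sketch below inspects the example formula with base R only. It does not call vcrpart; the names y, z1, z2 and x are the placeholders used above. In the package's own examples, a vc term without by appears to act as a varying intercept, while by = x requests a varying coefficient for x.

```r
## Base R only: a formula is stored unevaluated, so 'vc' need not be defined
## here. The variable names y, z1, z2 and x are placeholders from the text.
f <- y ~ vc(z1, z2) + vc(z1, z2, by = x)

## One label per right-hand-side term: the two vc() terms stay separate
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

## All variables that the model frame would have to provide
model_vars <- all.vars(f)
print(model_vars)
```

Each vc() call is kept as its own term, which matches the idea that every vc term gets its own partition.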

fit

a character string or a function that specifies the fitting function, e.g., olmm or glm.

family

the model family, e.g., an object of class family.olmm or family.

data

a data frame containing the variables in the model.

weights

an optional numeric vector of weights to be used in the fitting process.

subset

an optional logical or integer vector specifying a subset of 'data' to be used in the fitting process.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting.
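
The idea of an offset can be sketched with plain glm rather than tvcm itself. The simulated data and the variable names exposure, x and y are assumptions made for this illustration only.

```r
## Illustration with stats::glm, not tvcm: a Poisson rate model in which
## log(exposure) enters the linear predictor with a fixed coefficient of 1.
set.seed(42)
exposure <- runif(100, min = 1, max = 10)   # hypothetical known exposure
x <- rnorm(100)
y <- rpois(100, lambda = exposure * exp(0.3 * x))

fit_offset <- glm(y ~ x, family = poisson(), offset = log(exposure))

## Only the intercept and the slope of x are estimated; the offset is not
print(coef(fit_offset))
```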

na.action

a function that indicates what should happen if data contain NAs. The default na.action = na.omit is listwise deletion, i.e., observations with missing values on any variable are dropped. See na.action.
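
A minimal base R sketch of what listwise deletion does; the toy data frame is made up for this illustration.

```r
## Toy data with missing values in different rows (hypothetical values)
d <- data.frame(y = c(1, 0, NA, 1),
                z = c(2.5, NA, 1.0, 3.2))

## na.omit() drops every row that has an NA in any column
d_complete <- na.omit(d)
print(d_complete)

## Rows 2 and 3 are dropped, so two complete rows remain
n_complete <- nrow(d_complete)
print(n_complete)
```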

control

a list with control parameters as returned by tvcm_control.

fitargs

additional arguments passed to the fitting function fit.

...

additional arguments passed to the fitting function fit. Note that using the fitargs argument is the preferred way to do this.

Value

An object of class tvcm. The tvcm class itself is based on the party class of the partykit package. The most important slots are:

node

an object of class partynode.

data

a data.frame. The model frame with all variables for partitioning.

fitted

an optional data.frame containing at least the fitted terminal node identifiers as element (fitted). In addition, weights may be contained as element (weights) and responses as (response).

info

additional information including control, model and data (all untransformed data, without missings).

Details

TVCM partitioning works as follows: In each iteration we fit the current model and select a binary split for one of the current terminal nodes. The selection requires four decisions: the vc term, the node, the variable, and the cutpoint in the selected variable. The algorithm starts with \(M_k = 1\) node for each of the \(K\) vc terms and iterates until the stopping criteria defined by control are reached (see tvcm_control). For the specific split-selection criteria, see tvcolmm and tvcglm.
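
The following base R sketch illustrates a single split-selection step for one vc term with one root node. It is not vcrpart's internal code: it uses a plain deviance comparison as a stand-in for the criteria documented in tvcolmm and tvcglm, and the simulated data, the helper split_deviance and the quantile grid are assumptions made for the illustration.

```r
## Toy data: the slope of x changes at z1 = 0.5; z2 is pure noise (assumed)
set.seed(1)
n <- 200
d <- data.frame(z1 = runif(n), z2 = runif(n), x = rnorm(n))
d$y <- ifelse(d$z1 > 0.5, 2, -1) * d$x + rnorm(n, sd = 0.5)

## Hypothetical helper: deviance of a model whose intercept and slope of x
## differ between the two sides of the candidate split 'var <= cut'
split_deviance <- function(var, cut, data) {
  data$node <- factor(data[[var]] <= cut)
  deviance(glm(y ~ node + node:x, data = data, family = gaussian()))
}

## Candidate splits: each moderator combined with a grid of quantile cutpoints
candidates <- expand.grid(var = c("z1", "z2"),
                          prob = seq(0.1, 0.9, by = 0.1),
                          stringsAsFactors = FALSE)
candidates$cut <- mapply(function(v, p) unname(quantile(d[[v]], p)),
                         candidates$var, candidates$prob, USE.NAMES = FALSE)
candidates$dev <- mapply(split_deviance, candidates$var, candidates$cut,
                         MoreArgs = list(data = d), USE.NAMES = FALSE)

## Select the variable and cutpoint with the smallest deviance
best <- candidates[which.min(candidates$dev), ]
print(best[, c("var", "cut", "dev")])
```

In tvcm the same kind of search additionally ranges over all vc terms and all current terminal nodes, and it is repeated until the stopping criteria in control are met.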

Alternative tree-based algorithms to tvcm are the MOB (Zeileis et al., 2008) and the PartReg (Wang and Hastie, 2014) algorithms. The MOB algorithm is implemented by the mob function in the packages party and partykit. For smoothing splines and kernel regression approaches to varying coefficients, see the packages mgcv, svcm, mboost or np.

The tvcm function builds on the software infrastructure of the partykit package. The authors are grateful for this code.

References

Zeileis, A., T. Hothorn, and K. Hornik (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492--514.

Wang, J. C. and T. Hastie (2014). Boosted Varying-Coefficient Regression Models for Product Demand Prediction. Journal of Computational and Graphical Statistics, 23(2), 361--382.

Hothorn, T. and A. Zeileis (2014). partykit: A Modular Toolkit for Recursive Partytioning in R. In Working Papers in Economics and Statistics, Research Platform Empirical and Experimental Economics, Number 2014-10. Universitaet Innsbruck.

Buergin, R. and G. Ritschard (2015). Tree-Based Varying Coefficient Regression for Longitudinal Ordinal Responses. Computational Statistics & Data Analysis, 86, 65--80.

Buergin, R. A. (2015b). Tree-based methods for moderated regression with application to longitudinal data. PhD thesis. University of Geneva.

Buergin, R. and G. Ritschard (2017). Coefficient-Wise Tree-Based Varying Coefficient Regression with vcrpart. Journal of Statistical Software, 80(6), 1--33.

See Also

tvcolmm, tvcglm, tvcm_control, tvcm-methods, tvcm-plot, tvcm-assessment

Examples

# NOT RUN {
## ------------------------------------------------------------------- #  
## Example 1: Moderated effect of education on poverty
##
## See the help of 'tvcglm'.
## ------------------------------------------------------------------- #

data(poverty)
poverty$EduHigh <- 1 * (poverty$Edu == "high")

## fit the model
model.Pov <-
  tvcm(Poor ~ -1 + vc(CivStat) + vc(CivStat, by = EduHigh) + NChild,
       family = binomial(), data = poverty, subset = 1:200,
       control = tvcm_control(verbose = TRUE, papply = "lapply",
         folds = folds_control(K = 1, type = "subsampling", seed = 7)))

## diagnosis
plot(model.Pov, "cv")
plot(model.Pov, "coef")
summary(model.Pov)
splitpath(model.Pov, steps = 1:3)
prunepath(model.Pov, steps = 1)


## ------------------------------------------------------------------- #
## Example 2: Moderated effect of unemployment
##
## See the help of 'tvcolmm'.
## ------------------------------------------------------------------- #

data(unemp)

## fit the model
model.UE <-
  tvcm(GHQL ~ -1 + 
          vc(AGE, FISIT, GENDER, UEREGION, by = UNEMP, intercept = TRUE) +
          re(1|PID),
       data = unemp, control = tvcm_control(sctest = TRUE),
       family = cumulative())

## diagnosis (no cross-validation was performed since 'sctest = TRUE')
plot(model.UE, "coef")
summary(model.UE)
splitpath(model.UE, steps = 1, details = TRUE)


# }
