lagsarlmtree: Spatial Lag Model Trees

Description

Model-based recursive partitioning based on linear regression adjusting for a (global) spatial simultaneous autoregressive lag.

Usage

lagsarlmtree(formula, data, listw = NULL, method = "eigen",
  zero.policy = NULL, interval = NULL, control = list(),
  rhowystart = NULL, abstol = 0.001, maxit = 100, 
  dfsplit = TRUE, verbose = FALSE, plot = FALSE, ...)

Arguments

formula

formula specifying the response variable and regressors and partitioning variables, respectively. For details see below.

data

data.frame to be used for estimating the model tree.

listw

a weights object for the spatial lag part of the model.

method

"eigen" (default) - the Jacobian is computed as the product of (1 - rho*eigenvalue) using eigenw, and "spam" or "Matrix_J" for strictly symmetric weights lists of styles "B" and "C", or made symmetric by similarity (Ord, 1975, Appendix C) if possible for styles "W" and "S", using code from the spam or Matrix packages to calculate the determinant; “Matrix” and “spam_update” provide updating Cholesky decomposition methods; "LU" provides an alternative sparse matrix decomposition approach. In addition, there are "Chebyshev" and Monte Carlo "MC" approximate log-determinant methods; the Smirnov/Anselin (2009) trace approximation is available as "moments". Three methods: "SE_classic", "SE_whichMin", and "SE_interp" are provided experimentally, the first to attempt to emulate the behaviour of Spatial Econometrics toolbox ML fitting functions. All use grids of log determinant values, and the latter two attempt to ameliorate some features of "SE_classic".

zero.policy

default NULL, use global option value; if TRUE assign zero to the lagged value of zones without neighbours, if FALSE (default) assign NA - causing lagsarlm() to terminate with an error

interval

default is NULL, search interval for autoregressive parameter

control

list of extra control arguments - see lagsarlm

rhowystart

numeric. A vector of length nrow(data), to be used as an offset in estimation of the first tree. NULL by default, which results in an initialization with the root model (without partitioning).

abstol

numeric. The convergence criterion used for estimation of the model. When the difference in log-likelihoods of the model from two consecutive iterations is smaller than abstol, estimation of the model tree has converged.

maxit

numeric. The maximum number of iterations to be performed in estimation of the model tree.

dfsplit

logical or numeric. as.integer(dfsplit) is the degrees of freedom per selected split employed when extracting the log-likelihood.

verbose

Should the log-likelihood value of the estimated model be printed for every iteration of the estimation?

plot

Should the tree be plotted at every iteration of the estimation? Note that selecting this option slows down execution of the function.

…

Additional arguments to be passed to lmtree(). See mob_control documentation for details.

Value

The function returns a list with the following objects:

formula

The formula as specified with the formula argument.

call

the matched call.

tree

The final lmtree.

lagsarlm

The final lagsarlm model.

data

The dataset specified with the data argument including added auxiliary variables .rhowy and .tree from the last iteration.

nobs

Number of observations.

loglik

The log-likelihood value of the last iteration.

Degrees of freedom.

dfsplit

degrees of freedom per selected split as specified with the dfsplit argument.

iterations

The number of iterations used to estimate the lagsarlmtree.

maxit

The maximum number of iterations specified with the maxit argument.

rhowystart

Offset in estimation of the first tree as specified in the rhowystart argument.

abstol

The prespecified value for the change in log-likelihood to evaluate convergence, as specified with the abstol argument.

listw

The listw object used.

mob.control

A list containing control parameters passed to lmtree(), as specified with ….

Details

Spatial lag trees learn a tree where each terminal node is associated with different regression coefficients while adjusting for a (global) spatial simultaneous autoregressive lag. This allows for detection of subgroup-specific coefficients with respect to selected covariates, while adjusting for spatial correlations in the data. The estimation algorithm iterates between (1) estimation of the tree given an offset of the spatial lag effect, and (2) estimation of the spatial lag model given the tree structure.

The code is still under development and might change in future versions.

References

Wagner M, Zeileis A (2019). Heterogeneity and Spatial Dependence of Regional Growth in the EU: A Recursive Partitioning Approach. German Economic Review, 20(1), 67--82. 10.1111/geer.12146 https://eeecon.uibk.ac.at/~zeileis/papers/Wagner+Zeileis-2019.pdf

Examples

Run this code

# NOT RUN {
## data and spatial weights
data("GrowthNUTS2", package = "lagsarlmtree")
data("WeightsNUTS2", package = "lagsarlmtree")

## spatial lag model tree
system.time(tr <- lagsarlmtree(ggdpcap ~ gdpcap0 + shgfcf + shsh + shsm |
    gdpcap0 + accessrail + accessroad + capital + regboarder + regcoast + regobj1 + cee + piigs,
  data = GrowthNUTS2, listw = WeightsNUTS2$invw,
  minsize = 12, alpha = 0.05))
print(tr)
plot(tr, tp_args = list(which = 1))

## query coefficients
coef(tr, model = "tree")
coef(tr, model = "rho")
coef(tr, model = "all")
system.time({
ev <- eigenw(WeightsNUTS2$invw)
tr1 <- lagsarlmtree(ggdpcap ~ gdpcap0 + shgfcf + shsh + shsm |
    gdpcap0 + accessrail + accessroad + capital + regboarder + regcoast + regobj1 + cee + piigs,
  data = GrowthNUTS2, listw = WeightsNUTS2$invw, method = "eigen",
  control = list(pre_eig = ev), minsize = 12, alpha = 0.05)
})
coef(tr1, model = "rho")
# }

Run the code above in your browser using DataLab