Our estimation procedures make use of the matrix formulation introduced by
LeSage and Pace (2008) and further developed by
Dargel (2021) to reduce the computational
effort and memory requirements.
The estimation procedure can be adjusted through the estimation_method
argument in spflow_control().
Maximum likelihood estimation (MLE)
Maximum likelihood estimation is the default estimation procedure.
The matrix-form estimation in the framework of this model was first
developed by LeSage and Pace (2008) and then improved by
Dargel (2021).
Spatial two-stage least squares (S2SLS)
The S2SLS estimator is an adaptation of the one proposed by
Kelejian and Prucha (1998) to the case of origin-destination
flows with up to three neighborhood matrices, as described in
Dargel (2021).
A similar estimation is done by Tamesue and Tsutsumi (2016).
The user can activate the S2SLS estimation via the flow_control argument
using the input spflow_control(estimation_method = "s2sls").
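The two-stage idea behind S2SLS can be sketched in base R for a plain spatial lag model with a single neighborhood matrix. This is only an illustration under simplifying assumptions, not the routine used by the package: the ring-shaped matrix W, the simulated data, and all object names are hypothetical, and the flow model itself can involve up to three neighborhood matrices.

```r
set.seed(1)
n <- 300

# Hypothetical row-normalized neighborhood matrix: units on a ring,
# each with two neighbors that receive weight 0.5.
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, if (i == 1) n else i - 1] <- 0.5
  W[i, if (i == n) 1 else i + 1] <- 0.5
}

# Simulate y = rho * W y + X beta + e with rho = 0.4 and beta = (1, 2).
x <- rnorm(n)
X <- cbind(const = 1, x = x)
y <- solve(diag(n) - 0.4 * W, X %*% c(1, 2) + rnorm(n))

# The spatial lag W y is endogenous; Z collects all regressors.
Z <- cbind(rho = drop(W %*% y), X)

# Instruments: X together with its first and second spatial lags.
# The lagged constant is omitted because W is row-normalized.
H <- cbind(X, Wx = drop(W %*% x), WWx = drop(W %*% W %*% x))

# Stage 1: project the regressors onto the instrument space.
Z_hat <- H %*% solve(crossprod(H), crossprod(H, Z))

# Stage 2: regress y on the projected regressors.
delta <- solve(crossprod(Z_hat, Z), crossprod(Z_hat, y))
delta
```

The exogenous columns of X are reproduced exactly by the first stage, so only the spatial lag is effectively replaced by its instrumented version.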
Bayesian Markov Chain Monte Carlo (MCMC)
The MCMC estimator is based on the ideas of
LeSage and Pace (2009) and incorporates the improvements
proposed in Dargel (2021).
The estimation uses a tuned Metropolis-Hastings sampler for the
auto-regressive parameters and Gibbs sampling for the remaining
parameters.
The routine runs 5500 iterations of the sampling procedure and treats the
first 2500 as a burn-in period.
The user can activate the MCMC estimation via the flow_control argument
using the input spflow_control(estimation_method = "mcmc").
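The sampling scheme can be sketched in base R for a plain spatial lag model with one neighborhood matrix. The sketch assumes flat priors, a ring-shaped matrix W, and a simple acceptance-rate rule for tuning the proposal step; these choices and all object names are illustrative assumptions, not the package's implementation. Only the iteration counts (5500 draws, 2500 of them burn-in) are taken from the description above.

```r
set.seed(42)
n <- 100

# Hypothetical row-normalized ring neighborhood matrix (symmetric).
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, if (i == 1) n else i - 1] <- 0.5
  W[i, if (i == n) 1 else i + 1] <- 0.5
}
X <- cbind(1, rnorm(n))
y <- solve(diag(n) - 0.4 * W, X %*% c(1, 2) + rnorm(n))
Wy <- W %*% y

# Eigenvalues give the log-determinant: log|I - rho W| = sum(log(1 - rho * lambda)).
lambda <- eigen(W, symmetric = TRUE, only.values = TRUE)$values
log_post_rho <- function(rho, beta, sigma2) {
  res <- y - rho * Wy - X %*% beta
  sum(log(1 - rho * lambda)) - sum(res^2) / (2 * sigma2)
}

n_iter <- 5500
n_burn <- 2500
draws <- matrix(NA_real_, n_iter, 4,
                dimnames = list(NULL, c("rho", "b0", "b1", "sigma2")))
rho <- 0; sigma2 <- 1; step <- 0.2; n_acc <- 0
XtX_inv <- solve(crossprod(X))

for (it in seq_len(n_iter)) {
  # Gibbs step: beta given rho and sigma2 is multivariate normal.
  b_mean <- XtX_inv %*% crossprod(X, y - rho * Wy)
  beta <- b_mean + t(chol(sigma2 * XtX_inv)) %*% rnorm(2)

  # Gibbs step: sigma2 given rho and beta is inverse gamma.
  res <- y - rho * Wy - X %*% beta
  sigma2 <- 1 / rgamma(1, shape = n / 2, rate = sum(res^2) / 2)

  # Metropolis-Hastings step: random-walk proposal for rho on (-1, 1).
  prop <- rho + step * rnorm(1)
  if (abs(prop) < 1) {
    log_ratio <- log_post_rho(prop, beta, sigma2) -
      log_post_rho(rho, beta, sigma2)
    if (log(runif(1)) < log_ratio) {
      rho <- prop
      n_acc <- n_acc + 1
    }
  }

  # Tune the proposal step every 100 draws, during burn-in only.
  if (it <= n_burn && it %% 100 == 0) {
    acc_rate <- n_acc / 100
    if (acc_rate < 0.4) step <- step / 1.1
    if (acc_rate > 0.6) step <- step * 1.1
    n_acc <- 0
  }
  draws[it, ] <- c(rho, beta, sigma2)
}

# Discard the burn-in draws before summarizing the posterior.
kept <- draws[-seq_len(n_burn), ]
colMeans(kept)
```

Restricting the tuning to the burn-in phase keeps the proposal distribution fixed for the retained draws.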
Formula interface
The function offers a formula interface adapted to spatial interaction
models, which has the following structure:
Y ~ O_(X1) + D_(X2) + I_(X3) + G_(X4)
This structure reflects the different data sources involved in such a model.
On the left-hand side there is the dependent variable Y, which
corresponds to the vector of flows.
On the right hand side we have all the explanatory variables.
The functions O_(...) and D_(...) indicate which variables are used as
characteristics of the origins and destinations respectively.
Similarly, I_(...) indicates variables that should be used for the
intra-regional parameters.
Finally, G_(...) declares which variables describe origin-destination
pairs, which most frequently will include a measure of distance.
All the declared variables must be available in the provided
sp_multi_network() object, which gathers information on the origins and
destinations (inside sp_network_nodes() objects), as well as the
information on the origin-destination pairs (inside a sp_network_pair()
object).
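Because such a model formula is an ordinary R formula, its structure can be inspected with base R before it is passed to the estimation function. The sketch below only illustrates how the terms map to data sources; the variable names are placeholders and the splitting logic is an assumption made for illustration, not the parser used by the package.

```r
# Placeholder formula with one term per data source.
f <- Y ~ O_(X1) + D_(X2) + I_(X3) + G_(log(X4 + 1))

# Variable names that would have to exist in the supplied data.
all.vars(f)

# One label per right-hand-side term, e.g. "O_(X1)".
term_labels <- attr(terms(f), "term.labels")

# Split each label into its data-source tag and the inner expression.
source_tag <- sub("^([ODIG])_\\(.*$", "\\1", term_labels)
inner <- sub("^[ODIG]_\\((.*)\\)$", "\\1", term_labels)
by_source <- setNames(inner, source_tag)
by_source
```

Transformations such as log(X4 + 1) stay attached to the data source in which they are declared.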
Using the short notation Y ~ . is possible and will be interpreted as
usual, in the sense that we use all variables that are available for each
data source.
Also mixed formulas, such as Y ~ . + G_(log(X4) + 1), are possible.
When the dot shortcut is combined with explicit declarations, it is only
applied to the data sources that are not declared explicitly.
The following examples illustrate this behaviour.
Formula interface (examples)
Consider the case where we have the flow vector Y and the distance vector
DIST available as information on origin-destination pairs.
In addition we have the explanatory variables X1, X2 and X3 which
describe the regions that are at the same time origins and destinations of
the flows.
For this example the four formulas below are equivalent and make use of all
explanatory variables X1, X2 and X3 for origins, destinations and
intra-regional observations.
Y ~ .
Y ~ . + G_(DIST)
Y ~ X1 + X2 + X3 + G_(DIST)
Y ~ D_(X1 + X2 + X3) + O_(X1 + X2 + X3) + I_(X1 + X2 + X3) + G_(DIST)
Now if we only want to use X1 for the intra-regional model we can do the
following (again all four options below are equivalent).
Y ~ . + I_(X1)
Y ~ . + I_(X1) + G_(DIST)
Y ~ X1 + X2 + X3 + I_(X1) + G_(DIST)
Y ~ D_(X1 + X2 + X3) + O_(X1 + X2 + X3) + I_(X1) + G_(DIST)
This behaviour is easily combined with transformation of variables as the
two equivalent options below illustrate.