The function dgp_spsur generates a random dataset with the dimensions and spatial structure chosen by the user.
It is useful in pure simulation experiments, or to illustrate specific properties of a spatial SUR dataset
and of the inferential procedures related to it.
The user of dgp_spsur should think in terms of a Monte Carlo experiment.
The arguments of the function specify the dimensions of the dataset to be generated, the spatial
mechanism underlying the data, the intensity of the SUR structure among the equations,
and the values of the parameters used to obtain the simulated data, which include the error terms,
the regressors and the explained variables.
dgp_spsur <- function(Sigma, Tm = 1, G, N, Betas, Thetas = NULL, durbin = FALSE, rho = NULL, lambda = NULL, p = NULL, W = NULL, X = NULL, pdfU = "nvrnorm", pdfX = "nvrnorm")
dgp_spsur(Sigma, Tm = 1, G, N, Betas, Thetas = NULL, rho = NULL,
lambda = NULL, p = NULL, W = NULL, X = NULL, pdfU = "nvrnorm",
pdfX = "nvrnorm")
Covariance matrix between the G equations of the SUR model. This matrix must be positive definite, and the user must check that it is.
Number of time periods. Default = 1
Number of equations.
Number of cross-section or spatial units.
A row vector of order \((1xP)\) showing the values for the beta coefficients. The first \(P_{1}\) terms correspond to the first equation (where the first element is the intercept), the second \(P_{2}\) terms to the coefficients of the second equation and so on.
Values for the \(\theta\) coefficients in the G equations of the model,
when the type of spatial SUR model to be simulated is a "slx", "sdm" or "sdem".
Thetas is a row vector of order \((1xPTheta)\), where \(PTheta=P-G\) and P is the length of Betas;
the difference arises because the intercept cannot appear among the spatial lags of the regressors.
The first \(PTheta_{1}\) terms correspond to the first equation, the second \(PTheta_{2}\) terms correspond to the
second equation, and so on. Default = NULL.
Values of the coefficients \(\rho_{g}; g=1,2,..., G\) related to the spatial lag of
the errors in the G equations. If rho is a scalar and there are G equations
in the model, the same value will be used for all the equations. If rho is a row vector
of order (1xG), the function dgp_spsur will use these values,
one for each equation of the spatial errors. Default = NULL.
Values of the coefficients \(\lambda_{g}; g=1,2,..., G\) related to the spatial lag of
the explained variable of the g-th equation. If lambda is a scalar and there are G equations
in the model, the same value will be used for all the equations. If lambda is a row vector
of order (1xG), the function dgp_spsur will use these values,
one for each equation. Default = NULL.
Number of regressors by equation, including the intercept. p can be a row vector of order (1xG), if the number of regressors is not the same for all the equations, or a scalar, if the G equations have the same number of regressors.
A spatial weighting matrix of order (NxN), assumed to be the same for all equations and time periods.
This argument tells the function dgp_spsur which X matrix should be used to
generate the SUR dataset. If X is not NULL, dgp_spsur will use the X matrix
supplied in this argument. Note that X must be consistent
with the dimensions of the model. If X is NULL, dgp_spsur will
generate the matrix of regressors from a multivariate Normal distribution with mean zero and
identity \((PxP)\) covariance matrix. As an alternative, the user may change this probability distribution
function to the uniform case, \(U(0,1)\), through the argument pdfX. Default = NULL.
Multivariate probability distribution function (mdf) from which the values of the error terms will
be drawn. The covariance matrix is the \(\Sigma\) matrix specified by the user in the argument Sigma.
The function dgp_spsur provides two mdf: the multivariate Normal, which is the default,
and the log-Normal, which simply exponentiates the sample drawn from a \(N(0,\Sigma)\)
distribution. Default = "nvrnorm".
Multivariate probability distribution function (mdf) from which the values of the regressors
will be drawn. The regressors are assumed to be independent. dgp_spsur provides
two mdf: the multivariate Normal, which is the default, and the uniform in the interval \(U[0,1]\), using the
dunif function from the stats package. Default = "nvrnorm".
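The constraints stated in the argument descriptions above can be checked before calling the function. The following is a minimal base R sketch, not code taken from the package: all input values are hypothetical, and the object names sigma_ok, p_vec, rho_vec, betas_ok, thetas_ok and betas_by_eq are introduced here only for illustration.

```r
# Hypothetical inputs, chosen only for illustration
G <- 3
p <- c(3, 2, 4)                       # regressors per equation, intercept included
Sigma <- matrix(0.3, nrow = G, ncol = G)
diag(Sigma) <- 1
Betas <- c(1, 2, 3,  1, -1,  1, -0.5, 2, 0.5)
Thetas <- c(0.2, 0.1,  0.3,  0.1, 0.2, 0.3)
rho <- 0.4

# Sigma must be symmetric with strictly positive eigenvalues
sigma_ok <- isSymmetric(Sigma) &&
  all(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values > 0)

# A scalar p or rho is repeated across the G equations
p_vec <- if (length(p) == 1) rep(p, G) else p
rho_vec <- if (length(rho) == 1) rep(rho, G) else rho

# Betas holds sum(p) values; Thetas holds sum(p) - G (no lagged intercepts)
betas_ok <- length(Betas) == sum(p_vec)
thetas_ok <- length(Thetas) == sum(p_vec) - G

# Split Betas equation by equation (first element of each block is the intercept)
betas_by_eq <- split(Betas, rep(seq_len(G), times = p_vec))
```

Running such checks first makes dimension mismatches visible before the simulation starts, which matters because the documentation leaves these checks to the user.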
A list with a vector \(Y\) of order (TmNGx1) with the values generated for the explained variable in the G equations of the SUR, and a matrix \(XX\) of order (TmNGxsum(p)) with the values generated for the regressors of the SUR, including an intercept for each equation.
The purpose of the function dgp_spsur is to generate random datasets, of a SUR
nature, with the spatial structure decided by the user. The function requires certain information to be
supplied externally because dgp_spsur constitutes a Data Generation
Process (DGP). The following aspects should be addressed:
The user must define the dimensions of the dataset, that is, number of equations, G, number of time periods, Tm, and number of cross-sectional units, N.
Then, the user must choose the type of spatial structure desired for the model from among the candidates "sim", "slx", "slm", "sem", "sdm", "sdem" or "sarar"; the default is the "sim" specification, which has no spatial structure. The decision is made implicitly, simply by omitting the spatial parameters that are not involved in the model (i.e., a "slm" has \(\lambda\) parameters but no \(\rho\) parameters; a "sdem" model has \(\rho\) and \(\theta\) parameters but no \(\lambda\) coefficients). Of course, if the user needs a model with spatial structure, an (NxN) weighting matrix, W, must be supplied.
The next step builds the equations of the SUR model. Here the user must specify
the number of regressors that appear in each equation and the coefficients, the \(\beta\) parameters,
associated with each regressor. The first question is solved through the argument p
which, if a scalar, indicates that the same number of regressors should appear in all the equations
of the model; if the user wants a model with a different number of regressors in the
G equations, the argument p must be a (1xG) row vector with the required
information. It must be remembered that dgp_spsur assumes that an
intercept appears in all equations of the model.
The second part of the problem posed above is solved through the argument Betas, which is a row vector of order (1xP) with the information required for this set of coefficients.
The user must also specify the values of the spatial parameters corresponding to the chosen specification; we are referring to the \(\lambda_{g}\), \(\rho_{g}\) and \(\theta_{g}\) parameters, for \(g=1, ..., G\) and \(k=1, ..., K_{g}\). This is done through the arguments lambda, rho and Thetas. The first two, lambda and rho, work like p: if they are scalars, the same value will be used in the G equations of the SUR model; if they are (1xG) row vectors, a different value will be assigned to each equation.
Moreover, Thetas works like the argument Betas. The user must define a row vector of order \(1xPTheta\) with these values. It is worth remembering that the intercept never appears among the lagged regressors.
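The implicit choice of specification described above can be summarised with a small helper. This is a hypothetical function written for illustration, not part of the package: the name implied_type and its logic are assumptions derived only from the list of candidate models given above.

```r
# Hypothetical helper, not part of spsur: maps the supplied spatial
# parameters to the specification they imply.
implied_type <- function(rho = NULL, lambda = NULL, Thetas = NULL) {
  has_rho <- !is.null(rho)
  has_lambda <- !is.null(lambda)
  has_theta <- !is.null(Thetas)
  if (has_rho && has_lambda) {
    # The documented list has no model combining all three parameter sets
    if (has_theta) stop("combination not among the listed specifications")
    return("sarar")
  }
  if (has_lambda && has_theta) return("sdm")
  if (has_rho && has_theta) return("sdem")
  if (has_lambda) return("slm")
  if (has_rho) return("sem")
  if (has_theta) return("slx")
  "sim"   # no spatial parameters supplied
}

implied_type(lambda = 0.5)                    # "slm"
implied_type(rho = 0.3, Thetas = c(0.1, 0.2)) # "sdem"
```

Note that a spatial coefficient set to zero is numerically equivalent to omitting the corresponding term, so a call with lambda = 0.5 and rho = 0 still describes data with the structure of a "slm".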
Finally, the user must decide which values of the regressors and of the error terms are to be used
in the simulation. The regressors can be supplied as an external matrix generated previously by the
user. This is the argument X. It is the responsibility of the user to check that the dimensions
of the external matrix are consistent with the dataset required for the SUR model. Alternatively,
the regressors can be generated randomly by the function dgp_spsur.
In this case, the user must select the probability distribution function from which the corresponding
data (of the regressors and the error terms) are to be drawn.
dgp_spsur provides two multivariate distribution functions for the errors, namely the Normal and
the log-Normal (the second should be taken as a clear departure from the standard assumption of
normality). In both cases, random matrices of order (TmNxG) are obtained from a multivariate
normal distribution with a mean value of zero and the covariance matrix specified in the argument
Sigma; this matrix is then exponentiated for the log-Normal case. Roughly the same procedure
applies for drawing the values of the regressors. There are two distribution functions available, the
normal and the uniform in the interval \(U[0,1]\); the regressors are always independent.
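The drawing procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the internal code of dgp_spsur: the Cholesky factor is one standard way to obtain draws with covariance Sigma, runif is assumed for the uniform regressors, and the seed, the dimensions and all object names are hypothetical.

```r
set.seed(123)                         # arbitrary seed, for reproducibility only
Tm <- 1; G <- 3; N <- 50
Sigma <- matrix(0.3, nrow = G, ncol = G)
diag(Sigma) <- 1

# Draw a (Tm*N x G) matrix whose rows are N(0, Sigma):
# Z has independent N(0, 1) entries and R = chol(Sigma) satisfies R'R = Sigma,
# so each row of Z %*% R has covariance R'R = Sigma.
Z <- matrix(rnorm(Tm * N * G), nrow = Tm * N, ncol = G)
U_normal <- Z %*% chol(Sigma)

# Log-Normal errors: exponentiate the Normal draws element by element
U_lognormal <- exp(U_normal)

# Independent regressors for one equation with p = 3 (intercept + 2 draws)
p <- 3
X_normal  <- cbind(1, matrix(rnorm(Tm * N * (p - 1)), nrow = Tm * N))
X_uniform <- cbind(1, matrix(runif(Tm * N * (p - 1)), nrow = Tm * N))
```

The exponentiated errors are strictly positive and therefore no longer have mean zero, which is one reason the log-Normal case is a clear departure from the standard assumptions.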
# NOT RUN {
####################################
######## CROSS SECTION DATA ########
####################################
####################################
#### Example 1: DGP SLM model
####################################
rm(list = ls()) # Clean memory
library(spsur) # Provides dgp_spsur and spsur3sls
Tm <- 1 # Number of time periods
G <- 3 # Number of equations
N <- 50 # Number of spatial elements
p <- 3 # Number of independent variables
Sigma <- matrix(0.3, ncol = G, nrow = G)
diag(Sigma) <- 1
Betas <- c(1,2,3,1,-1,0.5,1,-0.5,2)
lambda <- 0.5 # level of spatial dependence
rho <- 0.0 # spatial autocorrelation error term = 0
# random coordinates
co <- cbind(runif(N,0,1),runif(N,0,1))
W <- spdep::nb2mat(spdep::knn2nb(spdep::knearneigh(co, k = 5,
                                                   longlat = FALSE)))
DGP <- dgp_spsur(Sigma = Sigma, Betas = Betas,
                 rho = rho, lambda = lambda, Tm = Tm,
                 G = G, N = N, p = p, W = W)
SLM <- spsur3sls(W = W, X = DGP$X, Y = DGP$Y, Tm = Tm, N = N, G = G,
                 p = c(3,3,3), type = "slm")
summary(SLM)
# }
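To make explicit what the lambda parameter does in a "slm" specification, the sketch below generates a single equation by hand in base R. It is an illustration under assumptions, not the package's implementation: the ring-shaped weighting matrix, the seed and all values are hypothetical, and only one equation is simulated, so the SUR covariance between equations is left out.

```r
set.seed(1)                           # arbitrary seed
N <- 50
lambda <- 0.5
beta <- c(1, 2, 3)                    # intercept and two slopes

# Hypothetical ring-shaped W: each unit has two neighbours, rows sum to 1
W <- matrix(0, N, N)
for (i in seq_len(N)) {
  left <- if (i == 1) N else i - 1
  right <- if (i == N) 1 else i + 1
  W[i, c(left, right)] <- 0.5
}

X <- cbind(1, matrix(rnorm(N * 2), nrow = N))
u <- rnorm(N)

# Structural form: y = lambda W y + X beta + u
# Reduced form:    (I - lambda W) y = X beta + u, solved for y
y <- solve(diag(N) - lambda * W, X %*% beta + u)

# Residual of the structural equation; numerically close to zero
resid_check <- max(abs(y - (lambda * W %*% y + X %*% beta + u)))
```

Solving the linear system directly avoids forming the inverse of (I - lambda W) explicitly, which is usually the more stable choice.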