spprobitml: Maximum Likelihood Estimation of a Spatial Probit Model

Description

Probit estimation for a model with an underlying latent variable of the form $Y^* = \rho W Y^* + X \beta + u$.

Usage

spprobitml(form,wmat,blockid=NULL,stdprobit=TRUE,data=NULL)

Arguments

form

Model formula

wmat

The spatial weight matrix.

blockid

A variable identifying groups used to specify a block diagonal structure for the W matrix, e.g., blockid=state or blockid=region. This imposes that all elements outside of the blocks equal zero and then re-standardizes W so that each row sums to one.

stdprobit

If TRUE, also prints standard probit model results. Default: stdprobit=TRUE.

data

A data frame containing the data. Default: use variables in the current workspace.

Value

coef

Coefficient estimates.

logl

The log-likelihood value.

vmat1

The covariance matrix for $\hat{\beta}$, conditional on $\hat{\rho}$.

vmat2

The unconditional covariance matrix for $\hat{\theta} = (\hat{\beta}, \hat{\rho})$.

Details

Estimation is based on the reduced form of the spatial AR model, $Y^* = (I - \rho W)^{-1}(X \beta + u)$. The model structure typically implies heteroskedasticity: the variances of the elements of the reduced-form error term, $(I - \rho W)^{-1}u$, are the diagonal elements of $\sigma^2 (I - \rho W)^{-1}(I - \rho W')^{-1}$. For probit estimation, $\sigma^2$ is normalized to one. Let $s_i^2$ denote the variance for observation i, and define $X^* = (I - \rho W)^{-1}X$. Then the probability that $Y_i^* > 0$ is $\Phi_i = \Phi(X_i^* \beta / s_i)$, and the log-likelihood function is $\ln L = \sum_i \{ y_i \ln \Phi_i + (1-y_i) \ln(1-\Phi_i) \}$. The spprobitml command estimates the model by maximizing this log-likelihood function with respect to $\beta$ and $\rho$.

Variants of this approach (maximizing the log-likelihood function implied by the reduced form of the model) were proposed by Case (1992) and McMillen (1992). Case's estimation procedure relies on a simple form of the spatial weight matrix in which each observation within a district is affected equally by the other observations in the district. McMillen's (1992) approach is equivalent to the one used here, but he suggested using an EM algorithm to estimate the model.

Neither author suggested a covariance matrix: Case (1992) appears to have relied on the standard probit estimate, which applies when the model is estimated conditional on $\rho$, while McMillen (1992) proposed a bootstrap approach. A consistent covariance matrix can be calculated from the outer products of the observation-level gradient terms, evaluated at $\hat{\theta}$:

$$V(\hat{\theta})^{-1} = \sum_i \left( \partial \ln L_i / \partial \theta \right) \left( \partial \ln L_i / \partial \theta' \right)$$

The gradient term for $\rho$ is calculated using numeric derivatives. The resulting estimates, and hence $V(\hat{\theta})$, are not fully efficient because the estimation procedure only indirectly takes into account the autocorrelation structure. An analogous approach is used to calculate standard errors conditional on $\hat{\rho}$.
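
The likelihood and gradient-based covariance described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the spprobitml source: the helper names `sp_reduced_form`, `sp_probit_logl`, and `sp_probit_vcov_beta` are hypothetical, X is assumed to be an n x k matrix that already includes an intercept column, W is a dense matrix, and only the covariance for $\hat{\beta}$ conditional on $\rho$ is computed (the numeric derivative for $\rho$ is omitted). The covariance is formed as the sum of outer products of the per-observation gradients.

```r
# Hypothetical helpers sketching the reduced-form spatial probit;
# they are NOT the code used inside spprobitml.

# Reduced-form pieces for a given rho: X* = (I - rho W)^{-1} X and
# s_i = sqrt of the i-th diagonal element of (I - rho W)^{-1} (I - rho W')^{-1}.
sp_reduced_form <- function(rho, X, W) {
  A <- solve(diag(nrow(W)) - rho * W)   # dense n x n inverse: the slow step
  list(xstar = A %*% X,
       s = sqrt(rowSums(A^2)))          # diag(A %*% t(A)) without forming it
}

# Log-likelihood sum_i { y_i ln(Phi_i) + (1 - y_i) ln(1 - Phi_i) },
# with Phi_i = Phi(X_i* beta / s_i) and sigma^2 normalized to one.
sp_probit_logl <- function(beta, rho, y, X, W) {
  rf <- sp_reduced_form(rho, X, W)
  z <- as.vector(rf$xstar %*% beta) / rf$s
  sum(y * pnorm(z, log.p = TRUE) +
        (1 - y) * pnorm(z, lower.tail = FALSE, log.p = TRUE))
}

# Covariance of beta-hat conditional on rho, from the outer product of
# per-observation gradients: V^{-1} = sum_i g_i g_i', g_i = d ln L_i / d beta.
sp_probit_vcov_beta <- function(beta, rho, y, X, W) {
  rf <- sp_reduced_form(rho, X, W)
  z <- as.vector(rf$xstar %*% beta) / rf$s
  p <- pnorm(z)
  # d ln L_i / d z_i = (y_i - Phi_i) phi_i / (Phi_i (1 - Phi_i))
  dz <- (y - p) * dnorm(z) / (p * (1 - p))
  g <- (rf$xstar / rf$s) * dz           # row i is g_i' = dz_i * X_i* / s_i
  solve(crossprod(g))                   # (sum_i g_i g_i')^{-1}
}
```

Passing the negative of `sp_probit_logl` to a general-purpose optimizer such as `optim` over $(\beta, \rho)$ mirrors the joint maximization described above; in practice $\rho$ would be kept inside a range where $I - \rho W$ is invertible, e.g. $(-1, 1)$ for a row-standardized W. Because every evaluation calls `solve` on an n x n matrix, the sketch also shows why estimation slows down as n grows.
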
In the conditional case, only the gradient terms for $\beta$ are used; they are evaluated at the estimated value of $\rho$.

Estimation can be very slow because each iteration requires the inversion of an $n \times n$ matrix. To speed up estimation and reduce memory requirements, it may be desirable to impose a block diagonal structure on W. For example, it may be reasonable to assume that each state or region has its own error structure, with no correlation of errors across regions. The blockid option specifies such a structure, e.g., blockid=region. The option leads the program to recalculate the W matrix, imposing the block diagonal structure and re-normalizing the matrix so that each row again sums to one. If there are G groups, estimation requires inverting G smaller sub-matrices rather than one $n \times n$ matrix, which greatly reduces memory requirements and significantly reduces estimation time.
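
The blockid idea can be illustrated with two hypothetical base-R helpers; neither is part of McSpatial, and the package's internal implementation may differ. `block_diag_w` zeroes every weight linking observations in different groups and re-standardizes the rows (rows left with no within-group neighbor are kept as zeros, an assumption of this sketch), and `blockwise_inverse` inverts $I - \rho W$ one block at a time. Because the inverse of a block diagonal matrix is the block diagonal matrix of the blocks' inverses, the result matches a single $n \times n$ inversion.

```r
# Hypothetical helper (not part of McSpatial): impose a block diagonal
# structure on W using a grouping variable, then re-standardize the rows.
block_diag_w <- function(wmat, blockid) {
  same_block <- outer(blockid, blockid, "==")  # TRUE when i and j share a group
  w <- wmat * same_block                       # zero out cross-group weights
  rs <- rowSums(w)
  keep <- rs > 0                               # rows with no within-group neighbor stay zero
  w[keep, ] <- w[keep, , drop = FALSE] / rs[keep]
  w
}

# Hypothetical helper: invert (I - rho W) group by group, so only
# G small matrices are inverted instead of one n x n matrix.
blockwise_inverse <- function(wmat, blockid, rho) {
  n <- nrow(wmat)
  ainv <- matrix(0, n, n)
  for (g in unique(blockid)) {
    idx <- which(blockid == g)
    ainv[idx, idx] <- solve(diag(length(idx)) - rho * wmat[idx, idx, drop = FALSE])
  }
  ainv
}
```

Memory savings in a real implementation would come from never allocating the full n x n inverse; this sketch fills a dense matrix only so the result can be compared against a direct `solve`.
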

References

Case, Anne C., "Neighborhood Influence and Technological Change," Regional Science and Urban Economics 22 (1992), 491-508.

McMillen, Daniel P., "Probit With Spatial Autocorrelation," Journal of Regional Science 32 (1992), 335-348.

Examples


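set.seed(9947)
library(McSpatial)  # geodistance(), makew(), spprobitml()
library(maptools)   # readShapePoly()
# Chicago census tracts, excluding O'Hare
cmap <- readShapePoly(system.file("maps/CookCensusTracts.shp",
  package="McSpatial"))
cmap <- cmap[cmap$CHICAGO==1&cmap$CAREA!="O'Hare",]
# Keep tracts with dnorth > 1 relative to the downtown target point
lmat <- coordinates(cmap)
dnorth <- geodistance(lmat[,1],lmat[,2], lotarget=-87.627800,
  latarget=41.881998)$dnorth
cmap <- cmap[dnorth>1,]
# Spatial weight matrix for the remaining tracts
wmat <- makew(cmap)$wmat
# Simulate the latent variable y* = (I - rho W)^{-1}(x + u) with rho = 0.4
n <- nrow(wmat)
rho <- .4
x <- runif(n,0,10)
ystar <- as.numeric(solve(diag(n) - rho*wmat)%*%(x + rnorm(n,0,2)))
# Binary outcome: TRUE when y* exceeds its 40th percentile
y <- ystar>quantile(ystar,.4)
fit <- spprobitml(y~x, wmat=wmat)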
