misclass: Probit Model with Misclassification of the Dependent Variable

Description

Implements the Hausman, Abrevaya, and Scott-Morton (1998) maximum likelihood estimator for probit models with potential misclassification of the dependent variable.

Usage

misclass(form,a0=0,a1=0,bmat=0,print.summary=TRUE,data=NULL)

Arguments

form

Model formula

a0

Starting value for $\alpha_0$. Default: a0 = 0.

a1

Starting value for $\alpha_1$. Default: a1 = 0.

bmat

Starting values for $\beta$. Default: bmat = 0, uses standard probit values.

print.summary

If print.summary=TRUE, prints a summary of the final nlm estimates. Default: print.summary=TRUE.

data

A data frame containing the data. Default: data=NULL, in which case the variables are taken from the current workspace.

Value

a0

Estimate of $\alpha_0$, the probability that a true value of 0 is misclassified as a 1.

a1

Estimate of $\alpha_1$, the probability that a true value of 1 is misclassified as a 0.

estimate

Coefficient estimates.

stderr

Standard errors for estimate.

vmat

Full covariance matrix.

iterations

The number of iterations taken to convergence.

minimum

The value of the log-likelihood function.

gradient

The gradient vector.
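Because estimate, stderr, and vmat report the misclassification parameters on the transformed scale $\alpha^* = \Phi^{-1}(\alpha)$ (see Details), a standard error for $\alpha_0$ or $\alpha_1$ itself is not returned directly. The sketch below shows one way to approximate it with the delta method. The numbers are hypothetical stand-ins for fit$estimate and fit$vmat, and the ordering (coefficients first, then $\alpha_0^*$, then $\alpha_1^*$) is assumed from the Details section.

```r
# Hypothetical values standing in for fit$estimate and fit$vmat.
# Assumed ordering: (intercept, slope, alpha0*, alpha1*).
estimate <- c(0.10, 1.60, -1.30, -1.25)
vmat <- diag(c(0.004, 0.010, 0.040, 0.050))

k <- length(estimate) - 2            # number of beta coefficients
astar <- estimate[k + 1:2]           # alpha0*, alpha1* on the probit scale
se_star <- sqrt(diag(vmat)[k + 1:2]) # their standard errors

a <- pnorm(astar)                    # back-transform: alpha = Phi(alpha*)
se_a <- dnorm(astar) * se_star       # delta method: d Phi(z)/dz = phi(z)

# Approximate 95% intervals, built on the alpha* scale and then transformed,
# so the bounds always stay inside (0, 1).
ci <- pnorm(cbind(lower = astar - 1.96 * se_star,
                  upper = astar + 1.96 * se_star))
cbind(alpha = a, se = se_a, ci)
```

Building the interval on the $\alpha^*$ scale and transforming the endpoints is a design choice: it keeps the bounds inside (0,1), which a symmetric interval around the estimated probability does not guarantee.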

Details

Let $y$ be the observed value of the 0-1 dependent variable and let $\tilde{y}$ be the true value. The probability that a 0 is incorrectly classified as a 1 is $\alpha_0 = \Pr(y=1 \mid \tilde{y} = 0)$, and the probability that a 1 is incorrectly classified as a 0 is $\alpha_1 = \Pr(y=0 \mid \tilde{y} = 1)$.

Under the assumption that the errors in the underlying latent variable $X \beta + u$ are normally distributed, the probabilities for the true values of the dependent variable are $\Pr(\tilde{y} = 1 \mid X) = \Phi(X \beta)$ and $\Pr(\tilde{y} = 0 \mid X) = 1 - \Phi(X \beta)$. The probability that an observation is classified as a 1 is $\Pr(y = 1 \mid X) = (1-\alpha_1)\Phi(X \beta) + \alpha_0 (1-\Phi(X \beta)) = \alpha_0 + (1-\alpha_0-\alpha_1)\Phi(X \beta)$. The probability that an observation is classified as a 0 is $\Pr(y = 0 \mid X) = \alpha_1\Phi(X \beta) + (1-\alpha_0)(1-\Phi(X \beta)) = 1-\alpha_0-(1-\alpha_0-\alpha_1)\Phi(X \beta)$.

The log-likelihood function for the probit model with misclassification is given by $$\ln L = \sum_i \left\{ y_i \ln \Pr(y_i=1 \mid X_i) + (1-y_i) \ln \Pr(y_i=0 \mid X_i) \right\}$$ $$= \sum_i \left\{ y_i \ln\left[\alpha_0+(1-\alpha_0-\alpha_1)\Phi(X_i \beta)\right] + (1-y_i)\ln\left[1-\alpha_0-(1-\alpha_0-\alpha_1)\Phi(X_i \beta)\right] \right\}.$$

The log-likelihood function is maximized using the nlm function. In practice, the model sometimes has difficulty converging because the maximization procedure attempts to set the misclassification probabilities outside the (0,1) interval. To avert this problem, the misclass function estimates $\alpha_0^*$ and $\alpha_1^*$, where $\alpha_0 = \Phi(\alpha_0^*)$ and $\alpha_1 = \Phi(\alpha_1^*)$. The covariance matrix estimate is calculated using the hessian option in nlm.

The vector estimate contains the estimated values of $\beta$, $\alpha_0^* = \Phi^{-1}(\alpha_0)$ = qnorm($\alpha_0$), and $\alpha_1^* = \Phi^{-1}(\alpha_1)$ = qnorm($\alpha_1$). Similarly, stderr and vmat report the standard error estimates and the full covariance matrix for $(\beta \,\,\, \alpha_0^* \,\,\, \alpha_1^*)$. The estimated probabilities themselves are reported in a0 and a1.

By default, the starting values are obtained using a standard probit model with $\alpha_0 = 0$ and $\alpha_1 = 0$. The standard probit results are also reported. The starting values can be changed using the a0, a1, and bmat options in misclass.
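The estimation strategy described above can be sketched in base R. This is an illustration under stated assumptions, not the source code of misclass: the function name negloglik, the simulated data, the clipping constant, and the starting value of 0.05 for both misclassification probabilities are all choices made here for the example.

```r
# Sketch only: negloglik() and the simulated data are illustrative assumptions.
set.seed(1)
n <- 5000
x <- rnorm(n)
ytrue <- as.numeric(0.5 + x + rnorm(n) > 0)   # true outcome, probit index 0.5 + x
e <- runif(n)
y <- ifelse(ytrue == 0 & e < 0.10, 1, ytrue)  # about 10% of true 0s recorded as 1
y <- ifelse(ytrue == 1 & e > 0.90, 0, y)      # about 10% of true 1s recorded as 0
X <- cbind(1, x)

# Negative log-likelihood; theta = (beta, alpha0*, alpha1*), alpha = pnorm(alpha*).
negloglik <- function(theta, y, X) {
  k <- ncol(X)
  a0 <- pnorm(theta[k + 1])
  a1 <- pnorm(theta[k + 2])
  p1 <- a0 + (1 - a0 - a1) * pnorm(drop(X %*% theta[1:k]))  # Pr(y = 1 | X)
  p1 <- pmin(pmax(p1, 1e-10), 1 - 1e-10)                    # guard against log(0)
  -sum(y * log(p1) + (1 - y) * log(1 - p1))
}

# Start beta at the standard probit estimates; start both alphas at 0.05 (assumed).
probit <- glm(y ~ x, family = binomial(link = "probit"))
start <- c(coef(probit), qnorm(0.05), qnorm(0.05))

# nlm minimizes, so it is given the negative log-likelihood.
fit <- nlm(negloglik, start, y = y, X = X, hessian = TRUE)

a_hat <- pnorm(fit$estimate[3:4])  # back-transformed alpha0 and alpha1
vmat <- solve(fit$hessian)         # covariance of (beta, alpha0*, alpha1*)
se <- sqrt(diag(vmat))
```

Since nlm is a minimizer, the sketch passes the negative of the log-likelihood, and inverting the returned Hessian gives a covariance estimate on the $(\beta, \alpha_0^*, \alpha_1^*)$ scale. With both $\alpha^*$ parameters set to -Inf the function reduces to the ordinary probit negative log-likelihood, which is a convenient sanity check.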

References

Dye, Richard F. and Daniel P. McMillen, "Teardowns and Land Values in the Chicago Metropolitan Area," Journal of Urban Economics 61 (2007), 45-64.

Hausman, J.A., Jason Abrevaya, and F.M. Scott-Morton, "Misclassification of the Dependent Variable in a Discrete-Response Setting," Journal of Econometrics 87 (1998), 239-269.

Examples

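library(McSpatial)  # package that provides misclass()

set.seed(189)
n <- 1000
x <- sort(rnorm(n))
# True binary outcome generated from a latent linear index
y <- x*1 + rnorm(n, 0, sd(x)/2)
y <- ifelse(y>0,1,0)
# Record about 10% of the true 0s as 1 and about 10% of the true 1s as 0
e <- runif(n)
misy <- ifelse(e<.10&y==0,1,y)
misy <- ifelse(e>.90&y==1,0,misy)
table(y,misy)  # rows: true y; columns: observed misy
fit <- misclass(misy~x)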
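As a rough check on the simulation design, the realized misclassification rates can be read off the cross-tabulation of y and misy. The base-R sketch below regenerates the same data without calling misclass; the names tab, rates, alpha0_emp, and alpha1_emp are introduced here for illustration only. These empirical rates are the quantities that the a0 and a1 components of the fit are meant to estimate.

```r
# Regenerate the example data (same seed and steps as the example above).
set.seed(189)
n <- 1000
x <- sort(rnorm(n))
y <- ifelse(x * 1 + rnorm(n, 0, sd(x) / 2) > 0, 1, 0)
e <- runif(n)
misy <- ifelse(e < .10 & y == 0, 1, y)
misy <- ifelse(e > .90 & y == 1, 0, misy)

# Row proportions: each row conditions on the true value of y.
tab <- table(y, misy)
rates <- prop.table(tab, 1)

alpha0_emp <- rates["0", "1"]  # share of true 0s recorded as 1
alpha1_emp <- rates["1", "0"]  # share of true 1s recorded as 0
c(alpha0 = alpha0_emp, alpha1 = alpha1_emp)
```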
