sampselect: Sample selection model (endogenous probit).

Description

Fits James Heckman's classical simultaneous equation model (also known as the sample selection model), which accounts for endogenous sample selection. The outcome model is estimated jointly with a probit model for the propensity of selection, in a setting where some of the outcomes are unobserved. Clustered data are also supported.
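For orientation, the joint model can be written in the usual textbook form below. This is a sketch of the standard sample selection formulation, not a statement of the package's exact internal parameterization. The symbols alpha, beta, sigma_y and rho follow the names used for the init argument; x_i, w_i, epsilon_i and eta_i are notation introduced here for illustration.

```latex
\begin{aligned}
Y_i &= x_i^{\top}\beta + \varepsilon_i
      && \text{(outcome model; } Y_i \text{ observed only when } Z_i = 0\text{)} \\
Z_i &= \mathbf{1}\{\, w_i^{\top}\alpha + \eta_i > 0 \,\}
      && \text{(probit selection model; } Z_i = 1 \text{ marks an unobserved outcome)} \\
\begin{pmatrix} \varepsilon_i \\ \eta_i \end{pmatrix}
    &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
      \begin{pmatrix} \sigma_y^{2} & \rho\,\sigma_y \\ \rho\,\sigma_y & 1 \end{pmatrix} \right)
      && \text{(correlated errors; probit error variance fixed at 1)}
\end{aligned}
```

A nonzero rho is what makes selection endogenous: the unobserved factors that drive selection are correlated with those that drive the outcome.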

Usage

## S3 method for class "sampselect"
sampselect(outcome, probit, init = NULL, id = NULL, se = "R")

Arguments

outcome

an object of class "formula" with a numeric vector on the left hand side, and predictors of interest on the right hand side. Values on the left hand side that correspond to unobserved outcomes should be set to numeric values (to zero, for example, although they can be set to any numeric values).

probit

an object of class "formula" with a binary (0/1) numeric vector on the left-hand side (1 indicating an unobserved outcome), and predictors of selection on the right-hand side. The right-hand side may include variables that also appear on the right-hand side of the outcome equation.

init

a vector of initial values. The ordering of subparameters is: alpha (probit model parameters), beta (outcome model parameters), sigmay (outcome error standard deviation), rho (error correlation). If NULL, an initial value will be chosen through OLS linear regression and probit-link GLM without regard to endogeneity.

id

a numeric vector indicating subject IDs if data are clustered. In the absence of clustered data, this can be left blank (defaults to NULL).

se

a string, either "M" for model-based standard errors (based on inverse observed Fisher information), or "R" for robust standard errors (based on methods of Huber and White). Defaults to "R". If id is provided for clustered data, the cluster-robust variance estimator (with working independence) will be used even if the user specifies se = "M".
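As an illustration of the ordering expected by init, the sketch below builds a starting vector by hand from a naive probit GLM and a naive OLS fit, using only base R. The simulated data, the object names (probit_fit, outcome_fit), the choice to fit OLS on observed outcomes only, and the starting value of 0 for rho are all assumptions made for this example; the function's own default initialization may differ in detail.

```r
# Simulated data for illustration only (hypothetical values)
set.seed(1)
n  <- 500
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n)
Z  <- as.numeric(-1 + X1 + X3 + rnorm(n) > 0)  # 1 = outcome unobserved
Y  <- 50 + X1 + X2 + rnorm(n, sd = 5)
Y[Z == 1] <- 0                                 # placeholder for unobserved outcomes

# Naive fits that ignore endogeneity
probit_fit  <- glm(Z ~ X1 + X3, family = binomial(link = "probit"))
outcome_fit <- lm(Y ~ X1 + X2, subset = Z == 0)  # assumption: observed rows only

# Order: alpha (probit), beta (outcome), sigmay, rho
init <- unname(c(coef(probit_fit),
                 coef(outcome_fit),
                 summary(outcome_fit)$sigma,
                 0))                           # assumption: start rho at 0

# With the package loaded, the vector could then be supplied as:
# sampselect(Y ~ X1 + X2, probit = Z ~ X1 + X3, init = init)
```

Here init has eight elements: three probit coefficients, three outcome coefficients, the outcome error standard deviation, and the error correlation.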

Value

sampselect prints a summary of the coefficient estimates, standard errors, Wald-based confidence intervals, and p-values for the outcome model and the selection (probit) model.

Details

The model is fitted by numerical minimization of the negative log-likelihood (using the BFGS algorithm). The probit model and error correlation parameters are weakly identified, and hence the error variance of the probit model is set at unity. The data must be complete (no missing values) and numeric, with the exception of factors, which may be used on the right-hand side of equations.
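The fitting strategy described above can be sketched in base R as follows. This is a minimal illustration of minimizing a sample selection negative log-likelihood with optim(method = "BFGS"), not the package's source code. The function negloglik, the design matrices W and X, the simulated data, and the log/tanh reparameterizations of sigmay and rho are assumptions introduced for this example.

```r
# Simulated data for illustration only (hypothetical values)
set.seed(2)
n  <- 1000
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n)
e2 <- rnorm(n)                                    # selection error, variance 1
e1 <- 2 * (0.5 * e2 + sqrt(1 - 0.5^2) * rnorm(n)) # outcome error, sd 2, corr 0.5
Z  <- as.numeric(-0.5 + X1 + X3 + e2 > 0)         # 1 = outcome unobserved
Y  <- 1 + X1 + X2 + e1
Y[Z == 1] <- 0                                    # placeholder value

W <- cbind(1, X1, X3)  # probit design matrix
X <- cbind(1, X1, X2)  # outcome design matrix

negloglik <- function(theta) {
  alpha <- theta[1:3]
  beta  <- theta[4:6]
  sigma <- exp(theta[7])   # log scale keeps sigma > 0 (assumed transform)
  rho   <- tanh(theta[8])  # keeps rho inside (-1, 1) (assumed transform)
  wa <- drop(W %*% alpha)
  r  <- (Y - drop(X %*% beta)) / sigma
  # Unobserved outcome: P(Z = 1)
  ll_unobs <- pnorm(wa, log.p = TRUE)
  # Observed outcome: density of Y times P(Z = 0 | Y)
  ll_obs <- dnorm(r, log = TRUE) - log(sigma) +
    pnorm(-(wa + rho * r) / sqrt(1 - rho^2), log.p = TRUE)
  -sum(ifelse(Z == 1, ll_unobs, ll_obs))
}

start <- c(rep(0, 3), rep(0, 3), log(sd(Y[Z == 0])), 0)
fit   <- optim(start, negloglik, method = "BFGS", hessian = TRUE)

sigma_hat <- exp(fit$par[7])   # back-transform to the natural scale
rho_hat   <- tanh(fit$par[8])
```

The Hessian returned by optim is on the transformed scale, so model-based standard errors for sigmay and rho would need a delta-method adjustment in this sketch.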

References

Heckman JJ. Dummy endogenous variables in a simultaneous equation system. Econometrica 1978; 46(4): 931-959.

Maddala GS. Limited-dependent and qualitative variables in econometrics. Cambridge: Cambridge University Press; 1983.

Examples


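#- Generate Data -#
library(mvtnorm)  # provides rmvnorm()
set.seed(1)
N <- 2000
X1 <- rnorm(N, 0, 1)
X2 <- rnorm(N, 0, 1)
X3 <- rnorm(N, 0, 1)
# Correlated errors: column 1 enters the outcome, column 2 enters selection
errors <- rmvnorm(N, sigma = 50*matrix(c(1, 0.5, 0.5, 1), nrow = 2))
Y <- 50 + X1 + X2 + errors[,1]
# Z = 1 marks an unobserved outcome
Z <- rep(0, N)
Z[(-5 + X1 + X3 + errors[,2]) > 0] <- 1
Y[Z == 1] <- 0  # placeholder value for unobserved outcomes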

#- Estimate Model -#
sampselect(Y ~ X1 + X2, probit = Z ~ X1 + X3)

#- Estimate Model with Model-Based Variance -#
sampselect(Y ~ X1 + X2, probit = Z ~ X1 + X3, se = "M")
