endogeneity (version 2.1.4)

probit_linear_partial: Recursive Probit-Linear Model with Partially Observed First Stage

Description

Partially observed version of the Probit-Linear Model.

First stage (Probit, \(m_i\) is partially observed):
$$m_i=1(\boldsymbol{\alpha}'\mathbf{w_i}+u_i>0)$$

Second stage (Linear):
$$y_i = \boldsymbol{\beta}'\mathbf{x_i} + \gamma m_i + \sigma v_i$$

Endogeneity structure: \(u_i\) and \(v_i\) are bivariate normally distributed with a correlation of \(\rho\).

Unobserved \(m_i\) should be coded as NA. w and x can be the same set of variables. Identification can be weak if the variables in w are poor predictors of m. Observing \(m_i\) for even a small proportion of observations (e.g., 10-20%) can substantially improve the identification of the model.
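As a minimal sketch of the NA coding described above, suppose the raw data mark an unmeasured first-stage outcome with a placeholder value. The data frame `dat`, the columns `surveyed` and `m_raw`, and the placeholder `-1` are all hypothetical and only serve the illustration.

```r
# Hypothetical raw data: m_raw holds -1 wherever m was not measured.
dat <- data.frame(
  y        = c(2.1, 0.4, 3.3, 1.8, 2.7),
  x        = c(1, 0, 1, 0, 1),
  surveyed = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  m_raw    = c(1, -1, 0, -1, -1)
)

# Recode unobserved values of m as NA, the coding this page asks for.
dat$m <- ifelse(dat$surveyed, dat$m_raw, NA)

# Share of observations for which the first-stage outcome is observed.
observed_share <- mean(!is.na(dat$m))
print(observed_share)
```

Checking the observed share before estimation is a quick way to see whether the data fall in the range where partial observation is expected to help identification.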

Usage

probit_linear_partial(
  form_probit,
  form_linear,
  data = NULL,
  EM = TRUE,
  par = NULL,
  method = "BFGS",
  verbose = 0,
  maxIter = 500,
  tol = 1e-06,
  tol_LL = 1e-08
)

Value

A list containing the results of the estimated model, some of which are inherited from the return value of maxLik:

  • estimates: Model estimates with 95% confidence intervals

  • estimate or par: Point estimates

  • variance_type: covariance matrix used to calculate standard errors. Either BHHH or Hessian.

  • var: covariance matrix

  • se: standard errors

  • gradient: Gradient at the maximum

  • hessian: Hessian matrix at maximum

  • gtHg: \(g'H^{-1}g\), where \(H^{-1}\) is simply the covariance matrix. A value close to zero (e.g., <1e-3 or 1e-6) indicates good convergence.

  • LL or maximum: Log-likelihood at the maximum

  • AIC: AIC

  • BIC: BIC

  • n_obs: Number of observations

  • n_par: Number of parameters

  • iterations: number of iterations taken to converge

  • message: Message regarding convergence status.

Note that the list inherits all the components in the output of maxLik. See the documentation of maxLik for more details.
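To make the relationships among these components concrete, the sketch below recomputes several of them from a toy list. Everything in `fit` is made up for illustration and is not output from the package. The sketch also assumes that `LL` is the log-likelihood, that the confidence intervals are normal-approximation (Wald) intervals, and that gtHg is the quadratic form of the gradient in the covariance matrix; the package may compute these differently.

```r
# Toy stand-in for a fitted object; all numbers are invented.
fit <- list(
  estimate = c(alpha = 0.8, beta = 1.2),
  var      = matrix(c(0.04, 0.00, 0.00, 0.09), nrow = 2),
  gradient = c(1e-4, -2e-4),
  LL       = -1500,
  n_obs    = 1000,
  n_par    = 2
)

# Standard errors are the square roots of the covariance diagonal.
se <- sqrt(diag(fit$var))

# 95% Wald intervals (assumed form of the reported intervals).
ci_lower <- fit$estimate - qnorm(0.975) * se
ci_upper <- fit$estimate + qnorm(0.975) * se

# Convergence diagnostic: gradient quadratic form in the covariance matrix.
gtHg <- as.numeric(t(fit$gradient) %*% fit$var %*% fit$gradient)

# Information criteria from the log-likelihood.
aic <- 2 * fit$n_par - 2 * fit$LL
bic <- log(fit$n_obs) * fit$n_par - 2 * fit$LL

print(cbind(estimate = fit$estimate, se = se, lower = ci_lower, upper = ci_upper))
print(c(gtHg = gtHg, AIC = aic, BIC = bic))
```

With a real fit, a gtHg value this small would suggest the optimizer stopped at a point where the gradient is essentially zero.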

Arguments

form_probit

Formula for the first-stage probit model, in which the dependent variable is partially observed.

form_linear

Formula for the second-stage linear model. The partially observed dependent variable of the first stage is automatically added as a regressor in this model (do not add it manually).

data

Input data, a data frame

EM

Whether to maximize the likelihood using the Expectation-Maximization (EM) algorithm, which is slower but more robust. Defaults to TRUE.

par

Starting values for estimates

method

Optimization algorithm. Defaults to "BFGS".

verbose

An integer indicating how much output to display during the estimation process.

  • <0 - No output

  • 0 - Basic output (model estimates)

  • 1 - Moderate output, basic output + parameter and likelihood in each iteration

  • 2 - Extensive output, moderate output + gradient values on each call

maxIter

Maximum number of iterations for the EM algorithm

tol

Tolerance for convergence of the EM algorithm

tol_LL

Tolerance for convergence of the likelihood
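The arguments above can be combined as in the following sketch, which passes the variables through `data` and sets the EM controls explicitly. The simulated data frame `dat` and the particular settings (`verbose = -1`, `maxIter = 200`) are illustrative assumptions, not recommendations. The correlated errors are built in base R rather than with MASS, and the model call is skipped if the endogeneity package is not installed.

```r
set.seed(1)
N   <- 1000
rho <- -0.5

x  <- rbinom(N, 1, 0.5)
z  <- rnorm(N)
e1 <- rnorm(N)
# e2 is standard normal with correlation rho to e1 by construction.
e2 <- rho * e1 + sqrt(1 - rho^2) * rnorm(N)

m <- as.numeric(1 + x + z + e1 > 0)  # first stage (probit)
y <- 1 + x + z + m + e2              # second stage (linear)

# Keep m for 200 observations only; code the rest as NA.
n_missing <- 800
m_p <- m
m_p[sample(N, n_missing)] <- NA
dat <- data.frame(y = y, x = x, z = z, m_p = m_p)

if (requireNamespace("endogeneity", quietly = TRUE)) {
  fit <- endogeneity::probit_linear_partial(
    m_p ~ x + z,        # form_probit: partially observed outcome
    y ~ x + z,          # form_linear: m_p is added automatically
    data    = dat,
    EM      = TRUE,     # slower but more robust
    verbose = -1,       # suppress output during estimation
    maxIter = 200,
    tol     = 1e-6
  )
  print(fit$estimates, digits = 3)
}
```

Setting `EM = FALSE` would instead maximize the likelihood directly, which the argument description suggests is faster but less robust.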

References

Peng, Jing. (2023) Identification of Causal Mechanisms from Randomized Experiments: A Framework for Endogenous Mediation Analysis. Information Systems Research, 34(1):67-84. Available at https://doi.org/10.1287/isre.2022.1113

See Also

Other endogeneity: bilinear(), biprobit(), biprobit_latent(), biprobit_partial(), linear_probit(), pln_linear(), pln_probit(), probit_linear(), probit_linearRE(), probit_linear_latent()

Examples

# \donttest{
library(MASS)         # provides mvrnorm()
library(endogeneity)  # provides probit_linear() and probit_linear_partial()
N = 1000
rho = -0.5
set.seed(1)

x = rbinom(N, 1, 0.5)
z = rnorm(N)

e = mvrnorm(N, mu=c(0,0), Sigma=matrix(c(1,rho,rho,1), nrow=2))
e1 = e[,1]
e2 = e[,2]

m = as.numeric(1 + x + z + e1 > 0)
y = 1 + x + z + m + e2
est = probit_linear(m~x+z, y~x+z+m)
print(est$estimates, digits=3)

# partially observed version of m
observed_pct = 0.2
m_p = m
m_p[sample(N, N*(1-observed_pct))] = NA
est_latent = probit_linear_partial(m_p~x+z, y~x+z)
print(est_latent$estimates, digits=3)
# }
