PanelSelect (version 1.0.0)

probitRE_linearRE: Panel Sample Selection Model for Continuous Outcome

Description

A panel sample selection model for a continuous outcome, with selection at both the individual and individual-time levels. The second-stage outcome is observed only if the first-stage outcome equals one.

Let \(\boldsymbol{w}_{it}\) and \(\boldsymbol{x}_{it}\) represent the row vectors of covariates in the selection and outcome equations, respectively, with \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) denoting the corresponding column vectors of parameters.

First stage (probitRE):

$$d_{it}=1(\boldsymbol{w}_{it} \boldsymbol{\alpha}+\delta u_i+\varepsilon_{it}>0)$$

Second stage (linearRE):

$$y_{it} = \boldsymbol{x}_{it} \boldsymbol{\beta} + \lambda v_i +\sigma \epsilon_{it}$$

Correlation structure: \(u_i\) and \(v_i\) are bivariate normally distributed with a correlation of \(\rho\). \(\varepsilon_{it}\) and \(\epsilon_{it}\) are bivariate normally distributed with a correlation of \(\tau\).
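
As a worked consequence of this correlation structure, \(\rho\) and \(\tau\) can be combined into the overall correlation between the composite errors of the two equations. The derivation below is a sketch under assumptions that this page does not state explicitly: \(u_i\), \(v_i\), \(\varepsilon_{it}\), and \(\epsilon_{it}\) each have unit variance, and the random effects are independent of the idiosyncratic errors.

$$\operatorname{Cov}(\delta u_i+\varepsilon_{it},\ \lambda v_i+\sigma \epsilon_{it}) = \delta\lambda\rho+\sigma\tau$$

$$\operatorname{Corr}(\delta u_i+\varepsilon_{it},\ \lambda v_i+\sigma \epsilon_{it}) = \frac{\delta\lambda\rho+\sigma\tau}{\sqrt{(\delta^2+1)(\lambda^2+\sigma^2)}}$$

Under these assumptions, the composite errors of the two stages are uncorrelated when both \(\rho\) and \(\tau\) are zero.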

\(\boldsymbol{w}\) and \(\boldsymbol{x}\) can contain the same set of variables. Identification can be weak if the variables in \(\boldsymbol{w}\) are poor predictors of \(d\).

Usage

probitRE_linearRE(
  form_probit,
  form_linear,
  id.name,
  data = NULL,
  par = NULL,
  method = "BFGS",
  rho_off = FALSE,
  tau_off = FALSE,
  H = 10,
  init = c("zero", "unif", "norm", "default")[4],
  rho.init = 0,
  tau.init = 0,
  use.optim = FALSE,
  verbose = 0
)

Value

A list containing the results of the estimated model, some of which are inherited from the return value of maxLik:

  • estimates: Model estimates with 95% confidence intervals

  • estimate or par: Point estimates

  • predict: A list containing the predicted probabilities of responding (respond_prob), the predicted counterfactual outcome values (outcome), their gradients (gr_respond and gr_outcome), and the estimated counterfactual population mean (pop_mean).

  • variance_type: covariance matrix used to calculate standard errors. Either BHHH or Hessian.

  • var: covariance matrix

  • se: standard errors

  • var_bhhh: BHHH covariance matrix, inverse of the outer product of gradient at the maximum

  • se_bhhh: BHHH standard errors

  • gradient: Gradient at the maximum

  • hessian: Hessian matrix at maximum

  • gtHg: \(g'H^{-1}g\), where \(H^{-1}\) is simply the covariance matrix. A value close to zero (e.g., below 1e-3 or 1e-6) indicates good convergence.

  • LL or maximum: Log-likelihood at the maximum

  • AIC: AIC

  • BIC: BIC

  • n_obs: Number of observations

  • n_par: Number of parameters

  • time: Time taken to estimate the model

  • iterations: number of iterations taken to converge

  • message: Message regarding convergence status.

Note that the list inherits all the components in the output of maxLik. See the documentation of maxLik for more details.
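
The covariance-related components listed above can be illustrated on a toy problem. The sketch below uses base R only and a simple normal likelihood, not the PanelSelect likelihood; the names x, loglik_i, score_i, and fit are hypothetical. It computes a Hessian-based covariance, a BHHH covariance as the inverse of the outer product of per-observation gradients, and the quadratic form g'Vg with V taken as the covariance matrix, following the description of gtHg above.

```r
set.seed(1)
x <- rnorm(500, mean = 2, sd = 1.5)

# Per-observation log-likelihood of N(mu, exp(lsd)^2); theta = c(mu, lsd)
loglik_i <- function(theta) dnorm(x, theta[1], exp(theta[2]), log = TRUE)

# Per-observation analytic gradient of the log-likelihood (n x 2 matrix)
score_i <- function(theta) {
  s <- exp(theta[2])
  z <- (x - theta[1]) / s
  cbind(mu = z / s, lsd = z^2 - 1)
}

# optim() minimizes, so pass the negative log-likelihood and its gradient
fit <- optim(c(1, 0),
             fn = function(th) -sum(loglik_i(th)),
             gr = function(th) -colSums(score_i(th)),
             method = "BFGS", hessian = TRUE,
             control = list(reltol = 1e-12))

g <- colSums(score_i(fit$par))                  # gradient at the optimum
var_hessian <- solve(fit$hessian)               # inverse Hessian of the negative log-likelihood
var_bhhh <- solve(crossprod(score_i(fit$par)))  # inverse outer product of gradients

se_hessian <- sqrt(diag(var_hessian))
se_bhhh <- sqrt(diag(var_bhhh))

# Convergence diagnostic: should be close to zero at a well-converged optimum
gtHg <- drop(t(g) %*% var_hessian %*% g)
```

In a correctly specified model the two sets of standard errors should be similar in large samples, so a large gap between se and se_bhhh can be a hint to look more closely at the fit.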

Arguments

form_probit

Formula for the panel probit model with random effects at the individual level

form_linear

Formula for the panel linear model with random effects at the individual level

id.name

The name of the ID column in data

data

Input data, must be a data.frame object

par

Starting values for estimates

method

Optimization algorithm. Default is BFGS

rho_off

A Boolean value indicating whether to turn off the correlation between the random effects of the probit and linear models. Default is FALSE.

tau_off

A Boolean value indicating whether to turn off the correlation between the error terms of the probit and linear models. Default is FALSE.

H

Number of quadrature points. Default is 10.

init

Initialization method: one of "zero", "unif", "norm", or "default". Default is "default".

rho.init

Initial value for the correlation between the random effects of the probit and linear models. Default is 0.

tau.init

Initial value for the correlation between the error terms of the probit and linear models. Default is 0.

use.optim

A Boolean value indicating whether to use optim instead of maxLik. Default is FALSE.

verbose

An integer indicating how much output to display during the estimation process.

  • <0 - No output

  • 0 - Basic output (model estimates)

  • 1 - Limited output, providing likelihood of iterations

  • 2 - Moderate output, basic output + parameter and likelihood on each call

  • 3 - Extensive output, moderate output + gradient values on each call
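
The argument H sets the number of quadrature points, presumably for integrating over the random effects. This page does not state which quadrature rule the package uses; the base-R sketch below assumes Gauss-Hermite quadrature, a common choice for normally distributed random effects, and the helper names gauss_hermite and expect_normal are hypothetical. It builds an H-point rule with the Golub-Welsch eigenvalue method and compares it with a closed-form probit expectation.

```r
# Gauss-Hermite nodes and weights for the weight function exp(-x^2),
# obtained from the eigen-decomposition of the Jacobi matrix (Golub-Welsch)
gauss_hermite <- function(H) {
  k <- seq_len(H - 1)
  J <- matrix(0, H, H)
  J[cbind(k, k + 1)] <- sqrt(k / 2)
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# Approximate E[f(U)] for U ~ N(0, 1) using the change of variable u = sqrt(2) * x
expect_normal <- function(f, gh) {
  sum(gh$weights * f(sqrt(2) * gh$nodes)) / sqrt(pi)
}

gh <- gauss_hermite(10)

# Probit probability averaged over a standard normal random effect:
# E[pnorm(a + U)] = pnorm(a / sqrt(2)) in closed form
a <- 0.3
gh_value <- expect_normal(function(u) pnorm(a + u), gh)
exact_value <- pnorm(a / sqrt(2))
abs(gh_value - exact_value)
```

Increasing H generally improves the approximation at the cost of more likelihood evaluations per observation, which is the usual trade-off when choosing the number of quadrature points.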

References

Bailey, M., & Peng, J. (2025). A Random Effects Model of Non-Ignorable Nonresponse in Panel Survey Data. Available at SSRN https://www.ssrn.com/abstract=5475626

See Also

Other PanelSelect: probitRE_PLNRE(), probitRE_PoissonRE(), probitRE_probitRE()

Examples

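library(PanelSelect)
library(MASS)  # for mvrnorm()

# Simulation settings
N = 200        # number of individuals
period = 5     # time periods per individual
obs = N*period
rho = 0.5      # correlation between the random effects u and v
tau = 0        # correlation between the error terms e1 and e2
set.seed(100)

# Individual-level random effects, repeated for each period
re = mvrnorm(N, mu=c(0,0), Sigma=matrix(c(1,rho,rho,1), nrow=2))
u = rep(re[,1], each=period)
v = rep(re[,2], each=period)

# Observation-level error terms
e = mvrnorm(obs, mu=c(0,0), Sigma=matrix(c(1,tau,tau,1), nrow=2))
e1 = e[,1]
e2 = e[,2]

# Panel identifiers and covariates
t = rep(1:period, N)
id = rep(1:N, each=period)
w = rnorm(obs)
z = rnorm(obs)  # extra covariate, not used in the models below
x = rnorm(obs)

# Selection indicator and outcome; the outcome is missing when d == 0
d = as.numeric(x + w + u + e1 > 0)
y = x + w + v + e2
y[d==0] = NA
dt = data.frame(id, t, y, x, w, z, d)

# As N increases, the parameter estimates will be more accurate
m = probitRE_linearRE(d~x+w, y~x+w, 'id', dt, H=10, verbose=-1)
print(m$estimates, digits=4)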
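
To see why the correlation between the random effects matters, the following base-R sketch reuses the data-generating process from the example above with a larger N and compares the mean outcome error among selected observations with the mean over all observations. It does not call PanelSelect; the variable names are hypothetical, and mvrnorm is replaced by an explicit construction of the correlated random effects.

```r
set.seed(100)
N <- 2000
period <- 5
obs <- N * period
rho <- 0.5

# Individual random effects with unit variance and correlation rho
u_i <- rnorm(N)
v_i <- rho * u_i + sqrt(1 - rho^2) * rnorm(N)
u <- rep(u_i, each = period)
v <- rep(v_i, each = period)

x <- rnorm(obs)
w <- rnorm(obs)

# Selection indicator and the outcome before selection (tau = 0)
d <- as.numeric(x + w + u + rnorm(obs) > 0)
y_full <- x + w + v + rnorm(obs)

# Composite outcome error v + e2, which has mean zero in the full population
err <- y_full - x - w
mean_err_all <- mean(err)
mean_err_selected <- mean(err[d == 1])

c(all = mean_err_all, selected = mean_err_selected)
```

With a positive rho, individuals who are more likely to be selected also tend to have a larger v, so the error has a positive mean in the selected sample even though its population mean is zero. This is the kind of non-ignorable selection the model is designed to account for.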
