hpa (version 1.0.1)

hpaSelection: Perform semi-nonparametric selection model estimation

Description

This function performs semi-nonparametric selection model estimation via Hermite polynomial density approximation.

Usage

hpaSelection(selection, outcome, data, z_K = 1L, y_K = 1L,
  pol_elements = 3L, is_Newey = FALSE, x0 = numeric(0))

Arguments

selection

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the selection equation form.

outcome

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the outcome equation form.

data

data frame containing the variables in the model.

z_K

non-negative integer representing the polynomial degree related to the selection equation.

y_K

non-negative integer representing the polynomial degree related to the outcome equation.

pol_elements

number of conditional expectation approximating terms for Newey's method.

is_Newey

logical; if TRUE then only Newey's method estimation results are returned (default value is FALSE).

x0

numeric vector of initial values for the optimization routine. Note that x0 = c(pol_coefficients[-1], mean, sd, z_coef, y_coef).

Value

This function returns an object of class "hpaSelection", which is a list containing the following components:

  • optim - optim function output.

  • x1 - numeric vector of distribution parameter estimates.

  • Newey - list containing information concerning Newey's method estimation results.

  • z_mean - estimate of the Hermite polynomial mean parameter related to the selection equation random error marginal distribution.

  • y_mean - estimate of the Hermite polynomial mean parameter related to the outcome equation random error marginal distribution.

  • z_sd - adjusted value of the sd parameter related to the selection equation random error marginal distribution.

  • y_sd - estimate of the Hermite polynomial sd parameter related to the outcome equation random error marginal distribution.

  • pol_coefficients - polynomial coefficient estimates.

  • pol_degrees - numeric vector whose first element is z_K and whose second element is y_K.

  • z_coef - selection equation regression coefficient estimates.

  • y_coef - outcome equation regression coefficient estimates.

  • cov_matrix - estimate of the parameters' covariance matrix.

  • results - numeric matrix representing estimation results.

  • log-likelihood - value of the log-likelihood function.

  • AIC - AIC value.

  • re_moments - list which contains information about the random errors' expectations, variances and correlation.

  • data_List - list containing model variables and their partition according to the outcome and selection equations.

  • n_obs - number of observations.

  • ind_List - list which contains information about parameter indices in x1.

  • selection_formula - the same as the selection input parameter.

  • outcome_formula - the same as the outcome input parameter.

The above-mentioned Newey list has class "hpaNewey" and contains the following components:

  • y_coef - regression coefficient estimates (except the constant term, which is part of the conditional expectation approximating polynomial).

  • z_coef - regression coefficient estimates related to the selection equation.

  • constant_biased - biased estimate of the constant term.

  • inv_mills - inverse Mills ratio estimates and their powers (including the constant).

  • inv_mills_coef - coefficients related to inv_mills.

  • pol_elements - the same as the pol_elements input parameter.

  • outcome_exp_cond - dependent variable conditional expectation estimates.

  • selection_exp - selection equation random error expectation estimate.

  • selection_var - selection equation random error variance estimate.

  • hpaBinaryModel - object of class "hpaBinary" which contains selection equation estimation results.

The above-mentioned re_moments list contains the following components:

  • selection_exp - selection equation random errors expectation estimate.

  • selection_var - selection equation random errors variance estimate.

  • outcome_exp - outcome equation random errors expectation estimate.

  • outcome_var - outcome equation random errors variance estimate.

  • errors_covariance - outcome and selection equation random errors covariance estimate.

  • rho - outcome and selection equation random errors correlation estimate.
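
For illustration, assuming model is an object returned by hpaSelection (as in the examples below), individual components can be accessed as ordinary list elements:

model$y_coef              # outcome equation coefficient estimates
model$z_coef              # selection equation coefficient estimates
model$pol_coefficients    # polynomial coefficient estimates
model$re_moments$rho      # random errors correlation estimate
model$Newey$y_coef        # Newey's method outcome equation coefficient estimates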

Details

The semi-nonparametric approach has been implemented via Hermite polynomial density approximation.

The Hermite polynomial density approximation approach was proposed by A. Gallant and D. W. Nychka in 1987. The main idea is to approximate an unknown distribution density with a Hermite polynomial of degree pol_degree. In this framework the Hermite polynomial represents an adjusted (to ensure integration to 1) product of a squared polynomial and normal distribution densities. The parameters mean and sd determine the means and standard deviations of the normal density functions which are parts of this polynomial. For more information please refer to the literature listed below.
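
As an illustration only (this is not the package's internal code), a one-dimensional density of this form with polynomial degree 2 can be sketched in plain R as follows; the coefficients a and the parameters mean and sd below are arbitrary values chosen for the sketch:

f_hp <- function(x, a = c(1, 0.5, 0.2), mean = 0, sd = 1) {
  pol <- a[1] + a[2] * x + a[3] * x ^ 2
  # normalizing constant ensures that the density integrates to 1
  norm_const <- integrate(function(t) (a[1] + a[2] * t + a[3] * t ^ 2) ^ 2 *
                            dnorm(t, mean, sd), -Inf, Inf)$value
  # squared polynomial times normal density, rescaled by the constant
  pol ^ 2 * dnorm(x, mean, sd) / norm_const
}
curve(f_hp(x), from = -4, to = 4)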

Parameters mean, sd, given_ind and omit_ind should have the same length as the pol_degrees parameter.

The first polynomial coefficient (corresponding to zero powers) is set to 1 for identification reasons.

Note that the coefficient of the first independent variable in selection will be fixed to 1.

The sd parameter will be scale-adjusted in order to provide a better initial point for the optimization routine. Please extract the adjusted sd value from this function's output list.

All variables mentioned in selection and outcome should be numeric vectors.

Standard errors and significance levels are derived under parametric assumptions, i.e. it is assumed that the distribution related to the Hermite polynomial density and the true distribution belong to the same family.

Initial values for the optimization routine are obtained through Newey's method (see the references below).

Note that the selection equation dependent variable should have exactly two levels (0 and 1), where "0" stands for the selection result which leads to unobservable values of the dependent variable in the outcome equation.
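
For instance, a factor selection indicator could be recoded to the required 0/1 form as follows (lfp_factor below is a hypothetical variable used purely for illustration):

lfp_factor <- factor(c("yes", "no", "yes"))
lfp <- as.numeric(lfp_factor == "yes")  # 1 = selected (outcome observed), 0 = not selected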

This function maximizes the log-likelihood function via optim, setting its method argument to "BFGS".
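
A minimal sketch of this type of maximization, applied to a toy normal model rather than the actual selection model likelihood, could look as follows:

set.seed(1)
x <- rnorm(100, mean = 2, sd = 1.5)
# log-likelihood of a normal sample; sd is parameterized on the log scale
loglik <- function(par) sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
# fnscale = -1 turns optim into a maximizer
opt <- optim(par = c(0, 0), fn = loglik, method = "BFGS",
             control = list(fnscale = -1))
c(mean = opt$par[1], sd = exp(opt$par[2]))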

References

A. Gallant and D. W. Nychka (1987) <doi:10.2307/1913241>

W. K. Newey (2009) <doi:10.1111/j.1368-423X.2008.00263.x>

T. A. Mroz (1987) <doi:10.2307/1911029>

See Also

summary.hpaSelection, predict.hpaSelection, plot.hpaSelection, AIC.hpaSelection, logLik.hpaSelection

Examples

# NOT RUN {
##Let's estimate a wage equation accounting for non-random selection.
##See the reference to T. A. Mroz (1987) for additional details about
##the data this example uses

#Prepare data
library("sampleSelection")
data("Mroz87")
h = data.frame("kids" = as.numeric(Mroz87$kids5 + Mroz87$kids618 > 0),
	"age" = as.numeric(Mroz87$age),
	"faminc" = as.numeric(Mroz87$faminc),
	"educ" = as.numeric(Mroz87$educ),
	"exper" = as.numeric(Mroz87$exper),
	"city" = as.numeric(Mroz87$city),
	"wage" = as.numeric(Mroz87$wage),
	"lfp" = as.numeric(Mroz87$lfp))
	
#Estimate model parameters
model <- hpaSelection(selection = lfp ~ age + I(age^2) + I(log(faminc)) + educ + kids,
	outcome = log(wage) ~ exper + I(exper^2) + educ + city,
	z_K = 1, y_K = 2, data = h)
summary(model)
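
#Get information criteria and log-likelihood
#(via the AIC.hpaSelection and logLik.hpaSelection methods listed in "See Also")
AIC(model)
logLik(model)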

#Plot outcome equation random errors density
plot(model, is_outcome = TRUE)
#Plot selection equation random errors density
plot(model, is_outcome = FALSE)
# }
# NOT RUN {
##Estimate semi-nonparametric sample selection model
##parameters on simulated data given chi-squared random errors

set.seed(123)
library("mvtnorm")

#Sample size

n <- 500

#Simulate independent variables
X_rho <- 0.5
X_sigma <- matrix(c(1,X_rho,X_rho,X_rho,1,X_rho,X_rho,X_rho,1), ncol=3)
X <- rmvnorm(n=n, mean = c(0,0,0), 
             sigma = X_sigma)

#Simulate random errors
epsilon <- matrix(0, n, 2)
epsilon_z_y <- rchisq(n, 5)
epsilon[, 1] <- (rchisq(n, 5) + epsilon_z_y) * (sqrt(3/20)) - 3.8736
epsilon[, 2] <- (rchisq(n, 5) + epsilon_z_y) * (sqrt(3/20)) - 3.8736
#Simulate selection equation
z_star <- 1 + 1 * X[,1] + 1 * X[,2] + epsilon[,1]
z <- as.numeric((z_star > 0))

#Simulate outcome equation
y_star <- 1 + 1 * X[,1] + 1 * X[,3] + epsilon[,2]
y <- y_star
y[z==0] <- NA
h <- as.data.frame(cbind(z, y, X))
names(h) <- c("z", "y", "x1", "x2", "x3")

#Estimate parameters
model <- hpaSelection(selection = z ~ x1 + x2, 
                      outcome = y ~ x1 + x3,
                      data = h, z_K = 1, y_K = 2)
summary(model)

#Get conditional predictions for outcome equation
model_pred_c <- predict(model, is_cond = TRUE)
#Conditional predictions y|z=1
model_pred_c$y_1
#Conditional predictions y|z=0
model_pred_c$y_0

#Get unconditional predictions for outcome equation
model_pred_u <- predict(model, is_cond = FALSE)
model_pred_u$y

#Get conditional predictions for selection equation
#Note that for z=0 these predictions are NA
predict(model, is_cond = TRUE, is_outcome = FALSE)
#Get unconditional predictions for selection equation
predict(model, is_cond = FALSE, is_outcome = FALSE)
# }
