hpaBinary: Perform semi-nonparametric binary choice model estimation

Description

This function performs semi-nonparametric single index binary choice model estimation via hermite polynomial densities approximation.

Usage

hpaBinary(formula, data, K = 1L, z_mean_fixed = NA_real_,
  z_sd_fixed = NA_real_, z_constant_fixed = 0,
  is_z_coef_first_fixed = TRUE, is_x0_probit = TRUE,
  is_sequence = FALSE, x0 = numeric(0))

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

data

data frame containing the variables in the model.

non-negative integer representing polynomial degree.

z_mean_fixed

numeric value for binary choice equation random error density mean parameter. Set it to NA (default) if this parameter should be estimated rather then fixed.

z_sd_fixed

numeric value for binary choice equation random error density sd parameter. Set it to NA (default) if this parameter should be estimated rather then fixed.

z_constant_fixed

numeric value for binary choice equation constant parameter. Set it to NA (default) if this parameter should be estimated rather then fixed.

is_z_coef_first_fixed

bool value indicating weather binary equation first independend variable coefficient should be fixed (TRUE) or estimated (FALSE).

is_x0_probit

logical; if TRUE (default) then initial points for optimization routine will be obtained by probit model estimated via glm function.

is_sequence

if TRUE then function calculates models with polynomial degrees from 0 to K each time using initial values obtained from the previous step. In this case function will return the list of models where i-th list element correspond to model calculated under K=(i-1).

numeric vector of optimization routine initial values. Note that x0=c(pol_coefficients[-1], mean, sd, coefficients).

Value

This function returns an object of class "hpaBinary". An object of class "hpaBinary" is a list containing the following components:

optim - optim function output.
x1 - numeric vector of distribution parameters estimates.
mean - mean (mu) parameter of density function estimate.
sd - sd (sigma) parameter of density function estimate.
pol_coefficients - polynomial coefficients estimates.
pol_degrees - the same as K input parameter.
coefficients - regression (single index) coefficients estimates.
cov_matrix - estimated parameters covariance matrix estimate.
results - numeric matrix representing estimation results.
log-likelihood - value of Log-Likelihood function.
AIC - AIC value.
errors_exp - random error expectation estimate.
errors_var - random error variance estimate.
dataframe - dataframe containing variables mentioned in formula without NA values.
model_Lists - lists containing information about fixed parameters and parameters indexes in x1.
n_obs - number of observations.
z_latent - latent variable (signle index) estimates.
z_prob - probabilities of positive outcome (i.e. 1) estimates.

Details

Semi-nonparametric approach has been implemented via densities hermite polynomial approximation

Densities hermite polynomial approximation approach has been proposed by A. Gallant and D. W. Nychka in 1987. The main idea is to approximate unknown distribution density with hermite polynomial of degree pol_degree. In this framework hermite polynomial represents adjusted (to insure integration to 1) product of squared polynomial and normal distribution densities. Parameters mean and sd determine means and standard deviations of normal distribution density functions which are parts of this polynomial. For more information please refer to the literature listed below.

Parameters mean, sd, given_ind, omit_ind should have the same length as pol_degrees parameter.

The first polynomial coefficient (zero powers) set to identity constant for identification reasons.

Note that if is_z_coef_first_fixed value is TRUE then the coefficient for the first independent variable in formula will be fixed to 1.

Parameter sd will be scale adjusted in order to provide better initial point for optimization routine. Please, extract sd adjusted value from this function's output list.

All variables mentioned in formula should be numeric vectors.

Standard errors and significance levels are derived under parametric assumptions i.e. it is assumed that distribution related to hermite polynomial density and real distribution are from the same family.

This function maximizes log-likelihood function via optim setting it's method argument to "BFGS".

References

A. Gallant and D. W. Nychka (1987) <doi:10.2307/1913241>

Examples

Run this code

# NOT RUN {
## Estimate survival probability on Titanic

library("titanic")

#Prepare data set converting  
#all variables to numeric vectors
h <- data.frame("male" = as.numeric(titanic_train$Sex == "male"))
	h$class_1 <- as.numeric(titanic_train$Pclass == 1)
	h$class_2 <- as.numeric(titanic_train$Pclass == 2)
	h$class_3 <- as.numeric(titanic_train$Pclass == 3)
	h$sibl <- titanic_train$SibSp
	h$survived <- titanic_train$Survived
	h$age <- titanic_train$Age
	h$parch <- titanic_train$Parch
	h$fare <- titanic_train$Fare
	
#Estimate model parameters
model_hpa_1 <- hpaBinary(survived ~class_1 + class_2 +
	male + age + sibl + parch + fare,
	K = 3, data = h)
#get summary
summary(model_hpa_1)

#Get predicted probabilities
pred_hpa_1 <- predict(model_hpa_1)

#Calculate number of correct predictions
hpa_1_correct_0 <- sum((pred_hpa_1 < 0.5) & (model_hpa_1$dataframe$survived == 0))
hpa_1_correct_1 <- sum((pred_hpa_1 >= 0.5) & (model_hpa_1$dataframe$survived == 1))
hpa_1_correct <- hpa_1_correct_1 + hpa_1_correct_0

#Plot random errors density approximation
plot(model_hpa_1)
# }
# NOT RUN {
##Estimate parameters on data simulated from student distribution

library("mvtnorm")
set.seed(123456)

#Simulate independent variables from normal distribution
n <- 1000
X <- rmvnorm(n=n, mean = c(0,0), 
sigma = matrix(c(1,0.5,0.5,1), ncol=2))

#Simulate random errors
epsilon <- rt(n, 5) * (3 / sqrt(5))

#Calculate latent and observable variables values
z_star <- 1 + X[, 1] + X[, 2] + epsilon
z <- as.numeric((z_star > 0))

#Store the results into dataframe
h <- as.data.frame(cbind(z,X))
names(h) <- c("z", "x1", "x2")

#Estimate model parameters
model <- hpaBinary(formula = z ~ x1 + x2, data=h, K = 3)
summary(model)

#Get predicted probabibilities of 1 values
predict(model)

#Plot density function approximation
plot(model)
# }

Run the code above in your browser using DataLab