doHB: Starts the model estimation process

Description

The user initiates model estimation by calling the doHB() function. The function first performs some diagnostic tests to look for common errors in the model specification. Upon completion, it writes a number of output files with the model parameters and convergence statistics to the user's working directory. The package's flexibility comes from allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well as ordered response models such as ordered probit and ordered logit. In addition, the package allows parameters to be specified as either fixed (non-varying across individuals) or random with continuous distributions. Supported parameter distributions include normal, positive/negative log-normal, positive/negative censored normal and the Johnson SB distribution. Kenneth Train's Matlab and Gauss code for hierarchical Bayesian estimation served as the basis for a few of the functions included in this package (see References below).
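As an illustration of the supported distributions, the sketch below maps draws from an underlying normal to the parameter scale. It assumes the textbook definitions of these transformations; transform_draw, lo and hi are hypothetical names used only here, and the package's internal code may differ. The dist codes follow the 1-6 numbering listed for gDIST under Details.

```r
# Hypothetical helper, not part of the package: transforms draws b from the
# underlying normal according to a distribution code (same 1-6 codes as gDIST).
# lo and hi play the role of gMINCOEF and gMAXCOEF for the Johnson SB case.
transform_draw <- function(b, dist, lo = 0, hi = 0) {
  switch(as.character(dist),
    "1" = b,                               # normal: no transformation
    "2" = exp(b),                          # positive log-normal
    "3" = -exp(b),                         # negative log-normal
    "4" = pmax(b, 0),                      # positive censored normal
    "5" = pmin(b, 0),                      # negative censored normal
    "6" = lo + (hi - lo) / (1 + exp(-b)),  # Johnson SB, bounded on (lo, hi)
    stop("dist must be an integer from 1 to 6")
  )
}

b <- c(-1, 0, 2)
transform_draw(b, 2)                   # strictly positive values
transform_draw(b, 4)                   # negative draws are censored at 0
transform_draw(b, 6, lo = -5, hi = 0)  # values strictly between -5 and 0
```

Under these assumptions, a sign restriction on a coefficient (for example, a cost coefficient that must be negative) is imposed through the choice of distribution rather than inside the likelihood function.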

Usage

doHB(likelihood_user,choicedata,control)

Arguments

likelihood_user

A function that returns a likelihood value for each observation in your dataset. This function takes the set of fixed parameters (fc) and the current values of the individual-level parameters (b), in that order as in the Examples section, and computes the likelihood of observing the data given those values.
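A minimal sketch of such a function, combining one fixed and one random parameter, is given here. It assumes that fc arrives as a vector ordered like gVarNamesFixed and that b arrives as a matrix with one row per observation and one column per random parameter. The data are synthetic, and the names likelihood_sketch, asc2 and beta_tt are hypothetical.

```r
# Synthetic data for illustration only (not the packaged choicedata)
set.seed(1)
n_obs   <- 6
tt1     <- runif(n_obs, 10, 30)                       # travel time, alternative 1
tt2     <- runif(n_obs, 10, 30)                       # travel time, alternative 2
choice1 <- rep(c(TRUE, FALSE), length.out = n_obs)    # TRUE where alternative 1 was chosen
choice2 <- !choice1

# Assumed calling convention: fc = vector of fixed parameters,
# b = matrix of individual-level parameters (rows = observations).
likelihood_sketch <- function(fc, b) {
  asc2    <- fc[1]    # fixed alternative-specific constant (hypothetical)
  beta_tt <- b[, 1]   # random travel-time coefficient (hypothetical)

  v1 <- beta_tt * tt1
  v2 <- asc2 + beta_tt * tt2

  # binary logit probability of the chosen alternative, one value per observation
  (exp(v1) * choice1 + exp(v2) * choice2) / (exp(v1) + exp(v2))
}

# Evaluate once by hand to check the shape of the result
p <- likelihood_sketch(fc = 0.1, b = matrix(-0.05, nrow = n_obs, ncol = 1))
length(p) == n_obs
```

Evaluating the function by hand like this, before passing it to doHB(), is a cheap way to confirm that it returns one probability per observation.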

choicedata

A data.frame of choice data to be used in estimation.

control

A set of estimation controls in list form. See below for more details.

Details

There are a number of global variables that can be set to control the model estimation. Some need to be specified directly in the control list, while others have default values that the analyst can adjust if something other than the default is desired.

User-specified controls

gVarNamesNormal - A vector of character-based names for the random parameters. (Defaults to NULL)
gVarNamesFixed - A vector of character-based names for the fixed parameters. (Defaults to NULL)
gDIST - A vector of integers (1-6) indicating which type of distribution should be applied to each random parameter: 1 = Normal, 2 = Positive Log-Normal, 3 = Negative Log-Normal, 4 = Positive Censored Normal, 5 = Negative Censored Normal, 6 = Johnson SB. There should be an element for each name in gVarNamesNormal. (Defaults to NULL)
FC - A vector of starting values for the fixed parameters. There should be an element for each name in gVarNamesFixed. (Defaults to NULL)
svN - A vector of starting values for the means of the underlying normals for the random parameters. There should be an element for each name in gVarNamesNormal. (Defaults to NULL)
gNCREP - Number of burn-in iterations to use prior to convergence. (Defaults to 100000)
gNEREP - Number of iterations to keep for averaging after convergence has been reached. (Defaults to 100000)
gNSKIP - Number of iterations in between retaining draws for averaging. (Defaults to 1)
gINFOSKIP - Number of iterations in between printing/saving information about the iteration process. (Defaults to 250)
modelname - The model name, which is used for creating output files. (Defaults to paste("HBModel",round(runif(1)*10000000,0),sep=""))
gSIGDIG - The number of significant digits for reporting purposes. (Defaults to 10)
priorVariance - The amount of prior variance assumed. (Defaults to 2.0)
pvMatrix - A custom prior covariance matrix to be used in estimation. If specified in the control list, the custom matrix overrides the default prior covariance matrix used by RSGHB. It needs to be a matrix object of the correct size: length(gVarNamesNormal) x length(gVarNamesNormal).
degreesOfFreedom - Additional degrees of freedom for the prior covariance matrix (not including the number of parameters). (Defaults to 5)
rho - The initial proportionality fraction for the jumping distribution of the Metropolis-Hastings algorithm for the random parameters. This fraction is adjusted by the program after each iteration to attain an acceptance rate of about 0.3. (Defaults to 0.1)
rhoF - The proportionality fraction for the jumping distribution of the Metropolis-Hastings algorithm for the fixed parameters. (Defaults to 0.0001)
targetAcceptanceNormal - The target acceptance rate in the Metropolis-Hastings algorithm for the random parameters. (Defaults to 0.3)
targetAcceptanceFixed - The target acceptance rate in the Metropolis-Hastings algorithm for the fixed parameters. (Defaults to 0.3)
gFULLCV - A number indicating whether a full or independent covariance structure should be used for the random parameters. A value of 1 indicates a full structure and 0 an independent structure. (Defaults to 1)
gMINCOEF - A vector of minimums for the Johnson SB distributions. If Johnson SB is used, each random coefficient needs an element, but only the elements that correspond to a Johnson SB entry in gDIST are used. (Defaults to 0)
gMAXCOEF - Like gMINCOEF, but for the maximums of the Johnson SB distributions. (Defaults to 0)
gStoreDraws - A boolean value indicating whether to store the draws for the individual-level parameters. (Defaults to FALSE)
gSeed - The random seed. (Defaults to 0)
constraintsNorm - A list of monotonic constraints to be applied during estimation. Each constraint is a vector of the form c(param1number, inequality, param2number). For constraints relative to 0, use 0 instead of param2number. For the inequality, use 1 for < and 2 for >. For example, constraintsNorm <- list(c(5,1,0),c(6,1,5),c(7,1,6),c(8,1,7)) would constrain the 5th parameter < 0, the 6th parameter < the 5th parameter, the 7th parameter < the 6th parameter, etc. (Defaults to NULL)
nodiagnostics - If set to TRUE, the diagnostic report is not printed to the screen with a prompt to continue. This makes batch processing easier to implement. (Defaults to FALSE)
fixedA - Allows the analyst to fix the means of the underlying normal distributions of the random parameters to certain values instead of estimating them. This is important, for example, in an error components logit model or an integrated choice and latent variable model. The input is a vector of length equal to the number of random parameters. Use NA for parameters that should be estimated, e.g., fixedA = c(NA, NA, NA, NA, NA, NA, NA, 0). In this case, the mean of the underlying normal for the 8th random parameter would be fixed to 0.
fixedD - Allows the analyst to fix the variances of the underlying normal distributions of the random parameters to certain values instead of estimating them. This is important, for example, in an integrated choice and latent variable model. The input is a vector of length equal to the number of random parameters. Use NA for parameters that should be estimated, e.g., fixedD = c(NA, NA, NA, NA, NA, NA, NA, 1). In this case, the variance of the underlying normal for the 8th random parameter would be fixed to 1.

Output files

A number of output files will be generated.

A file - Contains the sample-level means of the underlying normals at each iteration.
B file, Bsd file - The B file contains the average across iterations of the individual-level draws for the underlying normals for the random parameters. The Bsd file provides the standard deviations of those individual draws.
C file, Csd file - The C file contains the average across iterations of the individual-level draws for the random parameters, including the appropriate transformations. The Csd file provides the standard deviations of those individual draws. These two files are equivalent to the conditional distributions from models estimated using Maximum Simulated Likelihood methods.
D file - Contains a row-based representation of the sample covariance for each iteration.
F file - Contains the set of fixed (non-random) parameters for each iteration after convergence.
Log file - Contains some statistics that can be used to understand whether model convergence has been reached.
PV Matrix - Contains the prior covariance matrix that was assumed during the estimation of the model.
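To make the control options more concrete, here is a hedged sketch of a control list that combines several of them: a custom prior covariance matrix, monotonic constraints and a fixed mean. The model name, parameter names and numeric values are hypothetical and chosen only for illustration. The sanity checks at the end simply restate the size requirements described above.

```r
n_random <- 3  # number of random parameters in this hypothetical model

control_sketch <- list(
  modelname       = "MixedLogit_sketch",        # hypothetical model name
  gVarNamesNormal = c("ASC", "Time", "Cost"),   # hypothetical random parameters
  gVarNamesFixed  = c("Scale"),                 # hypothetical fixed parameter
  gDIST           = c(1, 1, 1),                 # all normal
  svN             = c(0, 0, 0),                 # starting means, one per random parameter
  FC              = c(1),                       # starting value, one per fixed parameter
  gNCREP          = 20000,                      # burn-in iterations
  gNEREP          = 10000,
  gNSKIP          = 10,
  pvMatrix        = diag(n_random) * 2,         # custom prior covariance matrix
  constraintsNorm = list(c(2, 1, 0),            # 2nd parameter < 0
                         c(3, 1, 2)),           # 3rd parameter < 2nd parameter
  fixedA          = c(0, NA, NA),               # fix the mean of the 1st underlying normal to 0
  nodiagnostics   = TRUE,                       # skip the interactive prompt in batch runs
  gSeed           = 1987
)

# Sanity checks mirroring the size requirements in the descriptions above
stopifnot(length(control_sketch$gDIST)  == length(control_sketch$gVarNamesNormal))
stopifnot(length(control_sketch$svN)    == length(control_sketch$gVarNamesNormal))
stopifnot(length(control_sketch$FC)     == length(control_sketch$gVarNamesFixed))
stopifnot(length(control_sketch$fixedA) == length(control_sketch$gVarNamesNormal))
stopifnot(all(dim(control_sketch$pvMatrix) == n_random))
```

Running checks like these before estimation can catch mismatched vector lengths early, independently of the diagnostics doHB() performs itself.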

References

Train, K. (2009) Discrete Choice Methods with Simulation. Cambridge University Press.

Train, K. and Sonnier, G. (2005) Mixed Logit with Bounded Distributions of Correlated Partworths. In: Applications of Simulation Methods in Environmental and Resource Economics, edited by Anna Alberini and Riccardo Scarpa. http://elsa.berkeley.edu/~train/trainsonnier.pdf

Train, K. Original Gauss and Matlab code: http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html

Examples

data(choicedata)

tt1 <- choicedata$tt1
tt2 <- choicedata$tt2
toll2 <- choicedata$toll2

choice1 <- (choicedata$Choice==1)
choice2 <- (choicedata$Choice==2)

control <- list(
     modelname="MNL_WTPSpace",
     gVarNamesNormal=c("WTP","Price"),
     gDIST=c(1,1),
     svN=c(0,0),
     gNCREP=10000,
     gNEREP=10000,
     gNSKIP=1,
     gINFOSKIP=250
)

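# fc holds the fixed parameters (none in this model); b holds the random
# parameters, one column per name in gVarNamesNormal
likelihood <- function(fc,b)
{

     # random parameters
     cc    <- 1
     wtp1  <- b[,cc]; cc <- cc+1
     price <- b[,cc]; cc <- cc+1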
     
     # discrete choice model in WTP-space
     v1 <-                 price * wtp1 * tt1
     v2 <- price * toll2 + price * wtp1 * tt2
     
     p  <- (exp(v1)*choice1 + exp(v2)*choice2) / (exp(v1) + exp(v2))
     
     return(p)
}

# not run
# doHB(likelihood, choicedata, control)
