prodestLP: Estimate productivity - Levinsohn-Petrin method

Description

The prodestLP() function accepts at least 6 objects (id, time, output, free, state and proxy variables) and returns an S3 object of class prod with three elements: (i) a list of model-related objects, (ii) a list with the data used in the estimation and the estimated vectors of first-stage residuals, and (iii) a list with the estimated parameters and their bootstrapped standard errors.

Usage

prodestLP(Y, fX, sX, pX, idvar, timevar, R = 20, cX = NULL,
            opt = 'optim', theta0 = NULL, cluster = NULL, tol = 1e-100, exit = FALSE)

Arguments

Y

the vector of value added log output.

fX

the vector/matrix/dataframe of log free variables.

sX

the vector/matrix/dataframe of log state variables.

pX

the vector/matrix/dataframe of log proxy variables.

cX

the vector/matrix/dataframe of control variables. By default cX = NULL.

idvar

the vector/matrix/dataframe identifying individual panels.

timevar

the vector/matrix/dataframe identifying time.

R

the number of block bootstrap repetitions to be performed in the standard error estimation. By default R = 20.

opt

a string with the optimization algorithm to be used during the estimation. By default opt = 'optim'.

theta0

a vector with the second stage optimization starting points. By default theta0 = NULL and the optimization is run starting from the first stage estimated parameters + \(N(\mu=0,\sigma=0.01)\) noise.

cluster

an object of class "SOCKcluster" or "cluster". By default cluster = NULL.

tol

optimizer tolerance. By default tol = 1e-100.

exit

Indicator for attrition in the data, i.e., whether firms exit the market. By default exit = FALSE; if exit = TRUE, an indicator function for firms whose last appearance is before the last observation's date is generated and used in the second stage. The user can also supply an indicator variable/matrix/dataframe with the exit years.
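
As a rough illustration of the exit = TRUE behavior, the base R sketch below flags firms whose last appearance precedes the final period of the panel. The toy panel and the helper name make_exit_indicator are hypothetical, and the firm-level coding is an assumption; the indicator that prodestLP builds internally may be coded differently.

```r
# Hypothetical helper (not part of prodest): flag firms whose last observed
# period is earlier than the last period in the whole panel.
make_exit_indicator <- function(idvar, timevar) {
  last_seen <- ave(timevar, idvar, FUN = max)  # last period in which each firm appears
  as.numeric(last_seen < max(timevar))         # 1 = firm leaves before the panel ends
}

# Toy unbalanced panel: firm 2 disappears after period 2
panel <- data.frame(
  idvar   = c(1, 1, 1, 2, 2, 3, 3, 3),
  timevar = c(1, 2, 3, 1, 2, 1, 2, 3)
)
panel$exit <- make_exit_indicator(panel$idvar, panel$timevar)
print(panel)
```

Only the observations of firm 2 are flagged here, because firms 1 and 3 are still present in the last period.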

Value

The output of the function prodestLP is a member of the S3 class prod. More precisely, it is a list (of length 3) containing the following elements:

Model, a list containing:

method: a string describing the method ('LP').
boot.repetitions: the number of bootstrap repetitions used for standard errors' computation.
elapsed.time: time elapsed during the estimation.
theta0: numeric object with the optimization starting points - second stage.
opt: string with the optimization routine used - 'optim', 'solnp' or 'DEoptim'.
opt.outcome: optimization outcome.
FSbetas: first stage estimated parameters.

Data, a list containing:

Y: the vector of value added log output.
free: the vector/matrix/dataframe of log free variables.
state: the vector/matrix/dataframe of log state variables.
proxy: the vector/matrix/dataframe of log proxy variables.
control: the vector/matrix/dataframe of log control variables.
idvar: the vector/matrix/dataframe identifying individual panels.
timevar: the vector/matrix/dataframe identifying time.
FSresiduals: numeric object with the residuals of the first stage.

Estimates, a list containing:

pars: the vector of estimated coefficients.
std.errors: the vector of bootstrapped standard errors.
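
To give an idea of what block bootstrapped standard errors involve, here is a minimal base R sketch in which whole firms are resampled with replacement, so each firm's time series stays together. The helper block_resample, the toy panel and the statistic (a simple mean) are all hypothetical; this is not the resampling code used inside prodestLP.

```r
set.seed(1)

# Toy balanced panel: 4 firms observed over 3 periods (simulated values)
panel <- data.frame(
  idvar   = rep(1:4, each = 3),
  timevar = rep(1:3, times = 4),
  y       = rnorm(12)
)

# Hypothetical helper: draw firms with replacement and keep each drawn
# firm's full time series, so within-firm dependence is preserved.
block_resample <- function(data, id) {
  ids   <- unique(data[[id]])
  draws <- sample(ids, length(ids), replace = TRUE)
  blocks <- lapply(seq_along(draws), function(b) {
    block <- data[data[[id]] == draws[b], ]
    block$boot_id <- b   # new id, so a firm drawn twice stays distinct
    block
  })
  do.call(rbind, blocks)
}

# R bootstrap replications of a simple statistic (here the mean of y)
R <- 20
boot_stats <- replicate(R, mean(block_resample(panel, "idvar")$y))
boot_se <- sd(boot_stats)   # bootstrap standard error of the statistic
boot_se
```

In an actual application the statistic would be the full vector of estimated parameters, re-estimated on every resampled panel.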

Members of class prod have an omega method returning a numeric object with the estimated productivity, that is: \(\omega_{it} = y_{it} - (\alpha + w_{it}\beta + k_{it}\gamma)\). The FSres method returns a numeric object with the residuals of the first stage regression, while summary, show and coef methods are implemented and work as usual.
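
The productivity formula can be reproduced by hand. The sketch below uses simulated data and made-up parameter values (alpha, beta and gamma are hypothetical stand-ins for estimated coefficients) purely to show the arithmetic; it does not call the omega method.

```r
# Toy data: 5 observations, J = 2 free variables, K = 1 state variable
set.seed(42)
n <- 5
w <- cbind(w1 = rnorm(n), w2 = rnorm(n))  # log free variables
k <- cbind(k1 = rnorm(n))                 # log state variable
y <- rnorm(n)                             # log value added

# Hypothetical parameter values standing in for estimated coefficients
alpha <- 0.5
beta  <- c(0.3, 0.2)
gamma <- 0.4

# omega_it = y_it - (alpha + w_it beta + k_it gamma)
omega_hat <- as.numeric(y - (alpha + w %*% beta + k %*% gamma))
omega_hat
```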

Details

Consider a Cobb-Douglas production technology for firm \(i\) at time \(t\)

\(y_{it} = \alpha + w_{it}\beta + k_{it}\gamma + \omega_{it} + \epsilon_{it}\)

where \(y_{it}\) is the (log) output, \(w_{it}\) is a \(1 \times J\) vector of (log) free variables, \(k_{it}\) is a \(1 \times K\) vector of state variables and \(\epsilon_{it}\) is a normally distributed idiosyncratic error term. The unobserved technical efficiency parameter \(\omega_{it}\) evolves according to a first-order Markov process:

\(\omega_{it} = E(\omega_{it} | \omega_{it-1}) + u_{it} = g(\omega_{it-1}) + u_{it}\)

and \(u_{it}\) is a random shock component assumed to be uncorrelated with the technical efficiency, the state variables in \(k_{it}\) and the lagged free variables \(w_{it-1}\). The LP method relies on the following set of assumptions:

a) firms immediately adjust the level of inputs according to demand function \(m(\omega_{it}, k_{it})\) after the technical efficiency shock realizes;
b) \(m_{it}\) is strictly monotone in \(\omega_{it}\);
c) \(\omega_{it}\) is scalar unobservable in \(m_{it} = m(.)\) ;
d) the levels of \(k_{it}\) are decided at time \(t-1\); the level of the free variable, \(w_{it}\), is decided after the shock \(u_{it}\) realizes.
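
Assumptions b) and c) can be illustrated with a small simulation. In the sketch below, productivity follows an AR(1) process (one possible first-order Markov process) and the proxy demand is a made-up linear function that is strictly increasing in \(\omega_{it}\). The functional forms, the parameter values and the names m_fun and h_fun are assumptions chosen for illustration only.

```r
set.seed(123)
n_firms <- 50
n_periods <- 5
rho <- 0.7  # assumed persistence: g(omega) = rho * omega is one possible choice

# Simulate omega as a first-order Markov (AR(1)) process for each firm
omega <- matrix(0, n_firms, n_periods)
omega[, 1] <- rnorm(n_firms)
for (t in 2:n_periods) {
  omega[, t] <- rho * omega[, t - 1] + rnorm(n_firms, sd = 0.1)  # g(.) + u_it
}

# Log state variable (simulated)
k <- matrix(rnorm(n_firms * n_periods), n_firms, n_periods)

# Hypothetical proxy demand, strictly increasing in omega for any k
m_fun <- function(omega, k) 0.5 * k + 2 * omega
m <- m_fun(omega, k)

# Because m is strictly monotone in omega and omega is the only unobservable,
# the demand can be inverted: omega = h(m, k)
h_fun <- function(m, k) (m - 0.5 * k) / 2
omega_recovered <- h_fun(m, k)

max(abs(omega_recovered - omega))  # should be zero up to floating-point error
```

In practice the demand function is unknown, which is why its inverse is approximated non-parametrically rather than computed in closed form as done here.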

Assumptions a)-d) ensure the invertibility of \(m_{it}\) in \(\omega_{it}\) and lead to the partially identified model:

\(y_{it} = \alpha + w_{it}\beta + k_{it}\gamma + h(m_{it}, k_{it}) + \epsilon_{it} = \alpha + w_{it}\beta + \phi(m_{it}, k_{it}) + \epsilon_{it} \)

which is estimated by a non-parametric approach (first stage). Exploiting the Markovian nature of the productivity process, one can use assumption d) to set up the relevant moment conditions and estimate the production function parameters (second stage). These moment conditions are built on the residual \(\nu_{it}\) of:

\(y_{it} - w_{it}\hat{\beta} = \alpha + k_{it}\gamma + g(\omega_{it-1}, \chi_{it}) + \nu_{it} \)

where \(g(.)\) is typically left unspecified and approximated by an \(n^{th}\) order polynomial, and \(\chi_{it}\) is an indicator function for attrition in the market.
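
The two stages can be sketched on simulated data with base R alone. This is a deliberately simplified illustration, not the prodestLP implementation. It uses one free, one state and one proxy variable and no attrition term, and it swaps the moment-based second stage for a nonlinear least squares criterion on \(\nu_{it}\), minimized with optimize(). It also produces no standard errors. The data-generating process and all parameter values are assumptions.

```r
set.seed(2024)
n_firms <- 500
n_periods <- 6
alpha <- 1; beta <- 0.6; gamma <- 0.3; rho <- 0.7  # assumed "true" values

# Simulate AR(1) productivity and a state variable decided one period ahead
omega <- k <- matrix(0, n_firms, n_periods)
omega[, 1] <- rnorm(n_firms, sd = 0.5)
k[, 1]     <- rnorm(n_firms)
for (t in 2:n_periods) {
  omega[, t] <- rho * omega[, t - 1] + rnorm(n_firms, sd = 0.2)
  k[, t]     <- 0.8 * k[, t - 1] + 0.3 * omega[, t - 1] + rnorm(n_firms, sd = 0.3)
}
n_obs <- n_firms * n_periods
m <- 0.5 * k + 2 * omega                               # proxy, monotone in omega
w <- 0.3 * k + 0.5 * omega + rnorm(n_obs, sd = 0.5)    # free variable
y <- alpha + beta * w + gamma * k + omega + rnorm(n_obs, sd = 0.1)

# First stage: regress y on w and a 3rd order polynomial in (m, k)
d  <- data.frame(y = c(y), w = c(w), k = c(k), m = c(m))
fs <- lm(y ~ w + poly(m, k, degree = 3, raw = TRUE), data = d)
beta_hat <- unname(coef(fs)["w"])
phi_hat  <- matrix(fitted(fs) - beta_hat * d$w, n_firms, n_periods)  # estimate of phi(m, k)

# Second stage: choose gamma to minimize the sum of squared residuals nu_it
cur <- 2:n_periods          # current periods
lag <- 1:(n_periods - 1)    # matching lagged periods
lhs_base <- c((y - beta_hat * w)[, cur])
k_cur    <- c(k[, cur])
k_lag    <- c(k[, lag])
phi_lag  <- c(phi_hat[, lag])

ssr <- function(g) {
  omega_lag <- phi_lag - g * k_lag   # lagged productivity, up to the constant alpha
  lhs <- lhs_base - g * k_cur
  # g(.) approximated by a 3rd order polynomial in lagged productivity
  sum(resid(lm(lhs ~ poly(omega_lag, 3, raw = TRUE)))^2)
}
gamma_hat <- optimize(ssr, interval = c(0, 1))$minimum

c(beta_hat = beta_hat, gamma_hat = gamma_hat)
```

With this design the estimates should land near the assumed values of 0.6 and 0.3, since the simulated free variable carries noise that is independent of productivity, which keeps its coefficient identified in the first stage.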

References

Levinsohn, J. and Petrin, A. (2003). "Estimating production functions using inputs to control for unobservables." The Review of Economic Studies, 70(2), 317-341.

Examples


# NOT RUN {
    require(prodest)

    ## Chilean data on production.
    ## Publicly available at http://www.ine.cl/canales/chile_estadistico/estadisticas_
    ## economicas/industria/series_estadisticas/series_estadisticas_enia.php

    data(chilean)

    # we fit a model with two free variables (skilled and unskilled), one state
    # variable (capital) and one proxy variable (electricity)

    set.seed(154673)
    LP.fit <- prodestLP(chilean$Y, fX = cbind(chilean$fX1, chilean$fX2), chilean$sX,
                        chilean$pX, chilean$idvar, chilean$timevar)
    LP.fit.solnp <- prodestLP(chilean$Y, fX = cbind(chilean$fX1, chilean$fX2), chilean$sX,
                        chilean$pX, chilean$idvar, chilean$timevar, opt = 'solnp')

    
# }
# NOT RUN {
      # run the same model in parallel
      require(parallel)
      # detectCores() is portable; the NUMBER_OF_PROCESSORS variable is only set on Windows
      nCores <- max(1, detectCores() - 1, na.rm = TRUE)  # leave one core free
      cl <- makeCluster(getOption("cl.cores", nCores))
      set.seed(154673)
      LP.fit.par <- prodestLP(chilean$Y, fX = cbind(chilean$fX1, chilean$fX2),
                      chilean$sX, chilean$pX, chilean$idvar, chilean$timevar,
                    cluster = cl)
      stopCluster(cl)
    
# }
# NOT RUN {
    # show results
    summary(LP.fit)
    summary(LP.fit.solnp)

    # show results in .tex tabular format
     printProd(list(LP.fit, LP.fit.solnp))
  
# }
