
sfa (version 1.0.4)

data_gen_p: Generate Panel Data for Stochastic Frontier Analysis

Description

data_gen_p generates simulated panel data for estimating various panel stochastic frontier models, including the Generalized True Random Effects (GTRE), True Random Effects (TRE), Pooled Cross-Section (PCS), and True Fixed Effects (TFE) models. The function returns the data as a pdata.frame. Every model variant is generated in a single call, so users can pick the columns for the model they want to estimate.

Usage

data_gen_p(t, N, rand, sig_u, sig_v, sig_r, sig_h, cons, tau = 0.5, mu = 0, beta1, beta2)

Value

A pdata.frame object containing \(N \times t\) observations suitable for Stochastic Frontier Analysis (SFA).

Arguments

t

The number of time periods.

N

The number of individuals.

rand

A seed for the random number generator to ensure reproducibility.

sig_u

The standard deviation (\(\sigma_u\)) for the one-sided error component (\(u_{it}\)).

sig_v

The standard deviation (\(\sigma_v\)) for the two-sided error component (\(v_{it}\)).

sig_r

The standard deviation (\(\sigma_r\)) for the two-sided individual effect (\(r_i\)).

sig_h

The standard deviation (\(\sigma_h\)) for the one-sided individual effect (\(h_i\)).

cons

The constant term (\(\beta_0\)) for the frontier models.

tau

The dependence parameter (\(\tau\)) used in the TFE model formulation (y_tfe). Defaults to 0.5. See Chen, Schmidt, and Wang (2014, Journal of Econometrics).

mu

The mean parameter (\(\mu\)) used for the Truncated-Normal (TN) component of the y_fd model. Defaults to 0. See Wang and Ho (2010, Journal of Econometrics).

beta1

The coefficient for the x1 variable (\(\beta_1\)).

beta2

The coefficient for the x2 variable (\(\beta_2\)).

Author

David Bernstein

Details

The returned pdata.frame has \(N \times t\) observations and contains the following columns:

  • name Individual identifier.

  • year Time period identifier.

  • cons The constant term used in the data generation.

  • x1, x2 Explanatory variables generated from a log-uniform distribution.

  • x1_w, x2_w Explanatory variables with dependence parameter \(\tau\) and linkage with \(r_i\), used for the TFE model.

  • u, v, r, h The generated error and individual effect components.

  • y_gtre, y_tre, y_pcs, y_tfe Output variables for the Production Frontier models, including the constant.

  • y_gtre_nc, y_tre_nc, y_pcs_nc Output variables for the Production Frontier models, excluding the constant.

  • c_gtre, c_tre, c_pcs, c_tfe Output variables for the Cost Frontier models, including the constant.

  • c_gtre_nc, c_tre_nc, c_pcs_nc Output variables for the Cost Frontier models, excluding the constant.

  • y_fd Output variable for the first difference model (see Wang and Ho, 2010).

  • x_fd Explanatory variable for the y_fd model.

  • u_fd_star, z_fd, r_fd, u_fd Components used to generate y_fd.

  • u_gtre, z_gtre, y_gtre_z, y_tre_z Variables for models with heteroskedastic inefficiency (\(\sigma_{u,i} = \exp(0.9 + 0.6 Z_{i})\)).

The data is generated based on standard Stochastic Frontier Analysis (SFA) formulations, primarily for a **Production Frontier** where the one-sided error component \(u_{it}\) is subtracted:

  • y_gtre: GTRE model: \(y_{it} = \beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it} + r_i - h_i + v_{it} - u_{it}\)

  • y_tre: TRE model: \(y_{it} = \beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it} + r_i + v_{it} - u_{it}\)

  • y_pcs: PCS model: \(y_{it} = \beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it} + v_{it} - u_{it}\)

  • y_tfe: TFE model: \(y_{it} = \beta_1 x_{1,it}^w + \beta_2 x_{2,it}^w + r_i + v_{it} - u_{it}\)

  • y_gtre_z: GTRE with Heteroskedastic \(u_{it}\): \(\sigma_{u,i} = \exp(0.9 + 0.6 Z_i)\).
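
The heteroskedastic case can be illustrated with a short base R sketch. It is not the package's internal code: the distribution of \(Z_i\) (standard normal here), the seed, and the names n_t, id, sig_u_i, and u_z are assumptions made for illustration only. Only the variance function \(\sigma_{u,i} = \exp(0.9 + 0.6 Z_i)\) comes from the text above.

```r
# Illustrative sketch only; not the sfa package's implementation.
set.seed(42)
N   <- 100                          # number of individuals
n_t <- 10                           # number of time periods (the `t` argument)
id  <- rep(seq_len(N), each = n_t)  # individual index for each of the N * n_t rows

z       <- rnorm(N)                 # ASSUMPTION: Z_i drawn from a standard normal
sig_u_i <- exp(0.9 + 0.6 * z)       # individual-specific sigma_u, as in the formula above

# Half-normal inefficiency whose scale differs by individual
u_z <- abs(rnorm(N * n_t, mean = 0, sd = sig_u_i[id]))

# Spread of u_z for the individuals with the smallest and largest sigma_u
tapply(u_z, id, sd)[c(which.min(sig_u_i), which.max(sig_u_i))]
```

Because the scale enters through an exponential, \(\sigma_{u,i}\) is always positive regardless of the sign of \(Z_i\).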

For **Cost Frontier** models, the one-sided error component \(u_{it}\) is added (e.g., c_gtre).
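
The sign change between the two frontier types can be seen in a minimal base R sketch using the pooled (PCS) form. This is an illustration under stated assumptions, not the package's code: the regressor range, the seed, and the variable names are hypothetical.

```r
# Illustrative sketch only; not the sfa package's implementation.
set.seed(1)
n  <- 500
x1 <- exp(runif(n))                 # ASSUMPTION: log-uniform on exp(0)..exp(1)
x2 <- exp(runif(n))
v  <- rnorm(n, mean = 0, sd = 0.3)  # two-sided noise
u  <- abs(rnorm(n, mean = 0, sd = 1))  # one-sided inefficiency (half-normal)

frontier <- 0.5 + 0.5 * x1 + 0.5 * x2

y_pcs <- frontier + v - u   # production frontier: inefficiency lowers output
c_pcs <- frontier + v + u   # cost frontier: inefficiency raises cost

# The two outcomes differ by exactly 2 * u, which is never negative
summary(c_pcs - y_pcs)
```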

The error terms are generated as:

  • \(r_i \sim N(0, \sigma_r^2)\) (individual two-sided effect)

  • \(h_i \sim |N(0, \sigma_h^2)|\) (individual one-sided effect)

  • \(v_{it} \sim N(0, \sigma_v^2)\) (two-sided noise)

  • \(u_{it} \sim |N(0, \sigma_u^2)|\) (one-sided inefficiency)
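
The four error components and the production-frontier outcomes built from them can be sketched in base R as follows. This is a hedged illustration of the formulas above, not the package's internal code: the log-scale range of the regressors and the names n_t, id, and xb are assumptions, and the parameter values simply mirror the example at the end of this page.

```r
# Illustrative sketch only; not the sfa package's implementation.
set.seed(100)                       # plays the role of `rand`
N   <- 100                          # individuals
n_t <- 10                           # time periods (the `t` argument)
sig_u <- 1; sig_v <- 0.3; sig_r <- 0.2; sig_h <- 0.4
cons <- 0.5; beta1 <- 0.5; beta2 <- 0.5

id <- rep(seq_len(N), each = n_t)   # individual index for each of the N * n_t rows

# Individual-level effects: one draw per individual, repeated over time
r <- rnorm(N, mean = 0, sd = sig_r)[id]        # two-sided effect r_i
h <- abs(rnorm(N, mean = 0, sd = sig_h))[id]   # one-sided effect h_i (half-normal)

# Observation-level errors: one draw per individual-period
v <- rnorm(N * n_t, mean = 0, sd = sig_v)      # two-sided noise v_it
u <- abs(rnorm(N * n_t, mean = 0, sd = sig_u)) # one-sided inefficiency u_it

# Regressors: log-uniform draws; the (0, 1) range on the log scale is an ASSUMPTION
x1 <- exp(runif(N * n_t))
x2 <- exp(runif(N * n_t))

xb     <- cons + beta1 * x1 + beta2 * x2
y_gtre <- xb + r - h + v - u        # GTRE
y_tre  <- xb + r + v - u            # TRE
y_pcs  <- xb + v - u                # PCS

# Sanity check: a half-normal with scale sigma has mean sigma * sqrt(2 / pi)
c(sample = mean(u), theory = sig_u * sqrt(2 / pi))
```

Taking the absolute value of a zero-mean normal draw is what yields the half-normal distribution written as \(|N(0, \sigma^2)|\) above.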

The First-Difference estimation model (y_fd) uses a variation where \(r_{i,fd} \sim U(0,1)\) and \(u_{it,fd}\) is generated using a heteroskedastic truncated-normal structure, reflecting an alternative model type.
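
A truncated-normal inefficiency term of this kind can be drawn in base R by inverse-CDF sampling. The sketch below is only an illustration and should not be read as the package's implementation: the helper rtruncnorm_pos, the coefficient delta, the distribution of z_fd, and the exponential scaling function are all hypothetical choices, loosely following the scaling form \(u_{it} = h_{it} u_i^*\) of Wang and Ho (2010).

```r
# Illustrative sketch only; not the sfa package's implementation.

# Hypothetical helper: draw from N(mu, sigma^2) truncated to [0, Inf)
rtruncnorm_pos <- function(n, mu, sigma) {
  p_low <- pnorm(0, mean = mu, sd = sigma)  # probability mass below zero
  p     <- runif(n, min = p_low, max = 1)   # uniform over the retained tail
  qnorm(p, mean = mu, sd = sigma)           # map back through the normal quantile
}

set.seed(7)
N   <- 100
n_t <- 10
id  <- rep(seq_len(N), each = n_t)

r_fd   <- runif(N)[id]                               # r_i ~ U(0, 1), as stated above
u_star <- rtruncnorm_pos(N, mu = 0.5, sigma = 1)[id] # individual-level u_i^*

z_fd  <- rnorm(N * n_t)             # ASSUMPTION: exogenous driver of inefficiency
delta <- 0.5                        # ASSUMPTION: hypothetical scaling coefficient
u_fd  <- exp(delta * z_fd) * u_star # heteroskedastic, one-sided inefficiency

summary(u_fd)
```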

References

Chen, Y., Schmidt, P., & Wang, H. (2014). Consistent estimation of the fixed effects stochastic frontier model. Journal of Econometrics, 181(2), 65-76.

Filippini, M., & Greene, W. H. (2016). Persistent and transient productive inefficiency: a maximum simulated likelihood approach. Journal of Productivity Analysis, 45, 187-196.

Wang, H., & Ho, C. M. (2010). Estimating fixed-effect panel stochastic frontier models by model transformation. Journal of Econometrics, 157(2), 286-296.

See Also

data_gen_p, to see all the data generating processes

Examples

library(sfa)

# Generate a dataset
data_trial <- data_gen_p(t = 10, N = 100, rand = 100,
                         sig_u = 1, sig_v = 0.3,
                         sig_r = 0.2, sig_h = 0.4,
                         cons = 0.5, tau = 0.5,
                         mu = 0.5, beta1 = 0.5,
                         beta2 = 0.5)

# See the first few rows
head(data_trial)
