data_gen_p generates simulated panel data for estimating various panel stochastic frontier models, including the Generalized True Random Effects (GTRE), True Random Effects (TRE), Pooled Cross-Section (PCS), and True Fixed Effects (TFE) models. The function returns the data as a pdata.frame. Output variables for every variant are generated in a single call, so users can select the ones they need.
data_gen_p(t, N, rand, sig_u, sig_v, sig_r, sig_h, cons, tau = 0.5, mu = 0, beta1, beta2)
A pdata.frame object containing \(N \times t\) observations suitable for Stochastic Frontier Analysis (SFA).
t The number of time periods.
N The number of individuals.
rand A seed for the random number generator to ensure reproducibility.
sig_u The standard deviation (\(\sigma_u\)) for the one-sided error component (\(u_{it}\)).
sig_v The standard deviation (\(\sigma_v\)) for the two-sided error component (\(v_{it}\)).
sig_r The standard deviation (\(\sigma_r\)) for the two-sided individual effect (\(r_i\)).
sig_h The standard deviation (\(\sigma_h\)) for the one-sided individual effect (\(h_i\)).
cons The constant term (\(\beta_0\)) for the frontier models.
tau The dependence parameter (\(\tau\)) used for the y_tfe (TFE) model formulation; the default is 0.5. See Chen, Schmidt, and Wang (2014, Journal of Econometrics).
mu The mean parameter (\(\mu\)) used for the Truncated-Normal (TN) component of the y_fd model; the default is 0. See Wang and Ho (2010, Journal of Econometrics).
beta1 The coefficient for the x1 variable (\(\beta_1\)).
beta2 The coefficient for the x2 variable (\(\beta_2\)).
David Bernstein
A pdata.frame object with \(N \times t\) observations, containing the following columns:
name Individual identifier.
year Time period identifier.
cons The constant term used in the data generation.
x1, x2 Explanatory variables generated from a log-uniform distribution.
x1_w, x2_w Explanatory variables with dependence parameter \(\tau\) and linkage with \(r_i\), used for the TFE model.
u, v, r, h The generated error and individual effect components.
y_gtre, y_tre, y_pcs, y_tfe Output variables for the Production Frontier models, including the constant.
y_gtre_nc, y_tre_nc, y_pcs_nc Output variables for the Production Frontier models, excluding the constant.
c_gtre, c_tre, c_pcs, c_tfe Output variables for the Cost Frontier models, including the constant.
c_gtre_nc, c_tre_nc, c_pcs_nc Output variables for the Cost Frontier models, excluding the constant.
y_fd Output variable for the first difference model (see Wang and Ho, 2010).
x_fd Explanatory variable for the y_fd model.
u_fd_star, z_fd, r_fd, u_fd Components used to generate y_fd.
u_gtre, z_gtre, y_gtre_z, y_tre_z Variables for models with heteroskedastic inefficiency (\(\sigma_{u,i} = \exp(0.9 + 0.6 Z_{i})\)).
The data is generated based on standard Stochastic Frontier Analysis (SFA) formulations, primarily for a **Production Frontier** where the one-sided error component \(u_{it}\) is subtracted:
y_gtre: GTRE model: \(y_{it} = \beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it} + r_i - h_i + v_{it} - u_{it}\)
y_tre: TRE model: \(y_{it} = \beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it} + r_i + v_{it} - u_{it}\)
y_pcs: PCS model: \(y_{it} = \beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it} + v_{it} - u_{it}\)
y_tfe: TFE model: \(y_{it} = \beta_1 x_{1,it}^w + \beta_2 x_{2,it}^w + r_i + v_{it} - u_{it}\)
y_gtre_z: GTRE with Heteroskedastic \(u_{it}\): \(\sigma_{u,i} = \exp(0.9 + 0.6 Z_i)\).
For **Cost Frontier** models, the one-sided error component \(u_{it}\) is added (e.g., c_gtre).
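The heteroskedastic variant listed above can be illustrated with a short base-R sketch. It is not the package's source code: the distribution of \(Z_i\) (standard normal here), the sample sizes, and the names z, sigma_u_i, and u_z are assumptions made for illustration. Only the relation \(\sigma_{u,i} = \exp(0.9 + 0.6 Z_i)\) comes from the documentation.

```r
set.seed(1)
N <- 50; t <- 5                       # illustrative panel dimensions
id <- rep(seq_len(N), each = t)       # individual index for each of the N * t rows

z <- rnorm(N)                         # assumption: one standard-normal Z_i per individual
sigma_u_i <- exp(0.9 + 0.6 * z)       # sigma_{u,i} = exp(0.9 + 0.6 * Z_i), as documented

# Half-normal inefficiency whose scale varies by individual:
# |N(0, 1)| draws multiplied by the individual-specific standard deviation
u_z <- abs(rnorm(N * t)) * sigma_u_i[id]

# Average inefficiency per individual, to compare against sigma_u_i
u_mean_by_id <- tapply(u_z, id, mean)
```

Individuals with a larger \(Z_i\) receive a larger \(\sigma_{u,i}\) and therefore, on average, larger inefficiency draws; comparing u_mean_by_id with sigma_u_i makes this visible.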
The error terms are generated as:
\(r_i \sim N(0, \sigma_r^2)\) (individual two-sided effect)
\(h_i \sim |N(0, \sigma_h^2)|\) (individual one-sided effect)
\(v_{it} \sim N(0, \sigma_v^2)\) (two-sided noise)
\(u_{it} \sim |N(0, \sigma_u^2)|\) (one-sided inefficiency)
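The four components and the production-frontier equations above can be reproduced in a few lines of base R. This is a minimal sketch rather than the function's actual implementation: the regressor range (log-uniform on \([1, e]\)) and the row ordering are assumptions, and the parameter values are simply those used in the example call.

```r
set.seed(100)                        # plays the role of the rand argument
N <- 100; t <- 10                    # individuals and time periods
sig_u <- 1; sig_v <- 0.3; sig_r <- 0.2; sig_h <- 0.4
cons <- 0.5; beta1 <- 0.5; beta2 <- 0.5

id <- rep(seq_len(N), each = t)      # individual index for each of the N * t rows

# Individual effects: one draw per individual, repeated over its t periods
r <- rnorm(N, 0, sig_r)[id]          # two-sided, N(0, sig_r^2)
h <- abs(rnorm(N, 0, sig_h))[id]     # one-sided, |N(0, sig_h^2)|

# Observation-level errors: one draw per row
v <- rnorm(N * t, 0, sig_v)          # two-sided noise
u <- abs(rnorm(N * t, 0, sig_u))     # one-sided inefficiency

# Assumption: log-uniform regressors, i.e. log(x) is uniform on [0, 1]
x1 <- exp(runif(N * t))
x2 <- exp(runif(N * t))

xb <- cons + beta1 * x1 + beta2 * x2

# Production frontiers: inefficiency is subtracted
y_pcs  <- xb + v - u
y_tre  <- xb + r + v - u
y_gtre <- xb + r - h + v - u

# Cost frontier (TRE case): the same u is added instead
c_tre <- xb + r + v + u
```

In this sketch y_tre and y_gtre differ by exactly h, and c_tre and y_tre differ by 2 * u, which is a quick way to confirm the signs of the one-sided terms.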
The first-difference model (y_fd) uses a different specification: \(r_{i,fd} \sim U(0,1)\), and \(u_{it,fd}\) is generated using a heteroskedastic truncated-normal structure (see Wang and Ho, 2010).
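One way to draw a heteroskedastic truncated-normal inefficiency term is sketched below in base R. It assumes the multiplicative form of Wang and Ho (2010), in which an individual-level draw \(u_i^* \sim N^+(\mu, \sigma_u^2)\) is scaled by a positive function of an exogenous variable. The helper rtnorm_pos, the scaling function \(\exp(\delta z_{it})\), the value of delta, and the distribution of z are all hypothetical; the formulas used inside data_gen_p may differ.

```r
# Hypothetical helper: draw n values from N(mu, sigma^2) truncated to [0, Inf)
# using inverse-CDF sampling
rtnorm_pos <- function(n, mu = 0, sigma = 1) {
  p0 <- pnorm(0, mean = mu, sd = sigma)      # probability mass below zero
  p  <- p0 + runif(n) * (1 - p0)             # uniform draws on (p0, 1)
  pmax(qnorm(p, mean = mu, sd = sigma), 0)   # pmax guards against rounding below 0
}

set.seed(100)
N <- 100; t <- 10
id <- rep(seq_len(N), each = t)              # individual index for each row

r_fd   <- runif(N)[id]                       # individual effect, U(0, 1) as documented
u_star <- rtnorm_pos(N, mu = 0.5, sigma = 1)[id]  # one truncated-normal draw per individual
z      <- rnorm(N * t)                       # assumption: standard-normal driver
delta  <- 0.5                                # hypothetical scaling coefficient

# Assumed multiplicative structure: time-varying scale times individual draw
u_fd_sketch <- exp(delta * z) * u_star
```

Because the scale factor is strictly positive, u_fd_sketch stays non-negative while its dispersion changes with z across observations.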
Chen, Y., Schmidt, P., & Wang, H. (2014). Consistent estimation of the fixed effects stochastic frontier model. Journal of Econometrics, 181(2), 65-76.
Filippini, M., & Greene, W. H. (2016). Persistent and transient productive inefficiency: a maximum simulated likelihood approach. Journal of Productivity Analysis, 45, 187-196.
Wang, H., & Ho, C. W. (2010). Estimating fixed-effect panel stochastic frontier models by model transformation. Journal of Econometrics, 157(2), 286-296.
data_gen_p, to see all the data generating processes
library(sfa)
# Generate a dataset
data_trial <- data_gen_p(t = 10, N = 100, rand = 100,
                         sig_u = 1, sig_v = 0.3,
                         sig_r = 0.2, sig_h = 0.4,
                         cons = 0.5, tau = 0.5,
                         mu = 0.5, beta1 = 0.5,
                         beta2 = 0.5)
# See the first few rows
head(data_trial)