sim_DGP: Simulate a Panel With a Group Structure in the Slope Coefficients

Description

Construct a static or dynamic, exogenous or endogenous panel data set subject to a group structure in the slope coefficients with optional $AR(1)$ or $GARCH(1,1)$ innovations.

Usage

sim_DGP(
  N = 50,
  n_periods = 40,
  p = 2,
  n_groups = 3,
  group_proportions = NULL,
  error_spec = "iid",
  dynamic = FALSE,
  dyn_panel = lifecycle::deprecated(),
  q = NULL,
  alpha_0 = NULL
)

Value

A list holding

alpha: the $K \times p$ matrix of group-specific slope parameters. If dynamic = TRUE, the first column holds the $AR$ coefficient.
groups: a vector indicating the group memberships $(g_1, \dots, g_N)$, where $g_i = k$ if $i \in$ group $k$.
y: an $NT \times 1$ vector of the dependent variable, with $\bold{y}=(y_1, \dots, y_N)^\prime$, $y_i = (y_{i1}, \dots, y_{iT})^\prime$ and the scalar $y_{it}$.
X: an $NT \times p$ matrix of explanatory variables, with $\bold{X}=(x_1, \dots, x_N)^\prime$, $x_i = (x_{i1}, \dots, x_{iT})^\prime$ and the $p \times 1$ vector $x_{it}$.
Z: an $NT \times q$ matrix of instruments, where $q \geq p$, $\bold{Z}=(z_1, \dots, z_N)^\prime$, $z_i = (z_{i1}, \dots, z_{iT})^\prime$ and $z_{it}$ is a $q \times 1$ vector. If a panel with exogenous regressors is generated (q = NULL), $\bold{Z}$ equals NULL.
data: an $NT \times (p + 1)$ data.frame of the outcome and the explanatory variables.

Arguments

N

the number of cross-sectional units. Default is 50.

n_periods

the number of simulated time periods $T$. Default is 40.

p

the number of explanatory variables. Default is 2.

n_groups

the number of groups $K$. Default is 3.

group_proportions

a numeric vector of length n_groups indicating the size of each group as a fraction of $N$. If NULL, all groups are of size $N / K$. Default is NULL.

error_spec

options include

"iid": for $iid$ errors.

"AR": for an $AR(1)$ error process with an autoregressive coefficient of 0.5.

"GARCH": for a $GARCH(1,1)$ error process with a 0.05 constant, a 0.05 ARCH and a 0.9 GARCH coefficient.

Default is "iid".
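The three error specifications can be illustrated in base R. The following is a minimal sketch, not the package's implementation: the helper name sim_errors, the standard normal innovations and the burn-in period are assumptions made for illustration, while the coefficient values are the ones listed for error_spec.

```r
# Hypothetical helper (not part of the package): one error series of length n_periods
sim_errors <- function(n_periods, error_spec = c("iid", "AR", "GARCH"), burn_in = 100) {
  error_spec <- match.arg(error_spec)
  n <- n_periods + burn_in
  z <- rnorm(n)  # assumption: standard normal innovations
  u <- numeric(n)
  if (error_spec == "iid") {
    u <- z
  } else if (error_spec == "AR") {
    # AR(1): u_t = 0.5 * u_{t-1} + z_t
    u[1] <- z[1]
    for (t in 2:n) u[t] <- 0.5 * u[t - 1] + z[t]
  } else {
    # GARCH(1,1): sigma2_t = 0.05 + 0.05 * u_{t-1}^2 + 0.9 * sigma2_{t-1}
    sigma2 <- numeric(n)
    sigma2[1] <- 0.05 / (1 - 0.05 - 0.9)  # start at the unconditional variance
    u[1] <- sqrt(sigma2[1]) * z[1]
    for (t in 2:n) {
      sigma2[t] <- 0.05 + 0.05 * u[t - 1]^2 + 0.9 * sigma2[t - 1]
      u[t] <- sqrt(sigma2[t]) * z[t]
    }
  }
  tail(u, n_periods)  # drop the burn-in periods
}

set.seed(1)
u_iid <- sim_errors(40, "iid")
u_ar <- sim_errors(40, "AR")
u_garch <- sim_errors(40, "GARCH")
```

Since the ARCH and GARCH coefficients sum to 0.95, the GARCH process in this sketch is covariance stationary with an unconditional variance of one.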

dynamic

Logical. If TRUE, the panel includes one stationary autoregressive lag of $y_{it}$ as an explanatory variable (see sec. Details for more information on the $AR$ coefficient). Default is FALSE.

dyn_panel

deprecated and replaced by dynamic.

q

the number of exogenous instruments when a panel with endogenous regressors is to be simulated. If a panel data set with exogenous regressors is to be generated, pass NULL. Default is NULL.

alpha_0

a $K \times p$ matrix of group-specific coefficients. If dynamic = TRUE, the first column represents the stationary $AR$ coefficient. If NULL, the coefficients are drawn randomly (see sec. Details). Default is NULL.

Author

Paul Haimerl

Details

The scalar dependent variable $y_{it}$ is generated according to the following grouped panel data model $$y_{it} = \gamma_i + \beta_i^\prime x_{it} + u_{it}, \quad i = \{1, \dots, N\}, \quad t = \{1, \dots, T\}.$$ $\gamma_i$ represents individual fixed effects and $x_{it}$ a $p \times 1$ vector of regressors. The individual slope coefficient vectors $\beta_i$ are subject to a group structure $$\beta_i = \sum_{k = 1}^K \alpha_k \bold{1} \{i \in G_k\},$$ with $\cup_{k = 1}^K G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k = 1, \dots, K$. The total number of groups $K$ is determined by n_groups.

If a panel data set with exogenous regressors is generated (set q = NULL), the explanatory variables are simulated according to $$x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i,e_{it,j} \sim i.i.d. N(0, 1), \quad j = \{1, \dots, p\},$$ where $e_{it,j}$ denotes a series of innovations. $\gamma_i$ and $e_i$ are independent of each other.

In case alpha_0 = NULL, the group-level slope parameters $\alpha_{k}$ are drawn from $U[-2, 2]$.
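The static, exogenous case described so far can be written out in a few lines of base R. This is a rough sketch under stated assumptions, not the code used by sim_DGP: the object names are illustrative, the errors are $iid$ standard normal, and groups are made as equal in size as possible when $N / K$ is not an integer.

```r
set.seed(1)
N <- 50; n_periods <- 40; p <- 2; K <- 3

# Group memberships g_1, ..., g_N with (nearly) equal group sizes
groups <- sort(rep_len(seq_len(K), N))

alpha <- matrix(runif(K * p, -2, 2), nrow = K)  # K x p group-level slopes from U[-2, 2]
gamma <- rnorm(N)                               # individual fixed effects

id <- rep(seq_len(N), each = n_periods)         # unit index of each stacked row

# x_{it,j} = 0.2 * gamma_i + e_{it,j}; the vector is recycled over the p columns
X <- 0.2 * gamma[id] + matrix(rnorm(N * n_periods * p), ncol = p)

beta <- alpha[groups[id], , drop = FALSE]       # NT x p: each row carries its group's slopes
u <- rnorm(N * n_periods)                       # assumption: iid N(0, 1) errors
y <- gamma[id] + rowSums(beta * X) + u          # y_it = gamma_i + beta_i' x_it + u_it
```

The rows are stacked unit by unit, matching the $NT \times 1$ and $NT \times p$ layout of y and X in the returned list.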

If a dynamic panel is specified (dynamic = TRUE), the $AR$ coefficients $\beta^{\text{AR}}_i$ are drawn from a uniform distribution with support $(-1, 1)$ and $x_{it,j} = e_{it,j}$. Moreover, the individual fixed effects enter the dependent variable via $(1 - \beta^{\text{AR}}_i) \gamma_i$ to account for the autoregressive dependency. We refer to Mehrabani (2023, sec. 6) for details.
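One way to read the dynamic case is the recursion $y_{it} = (1 - \beta^{\text{AR}}_i) \gamma_i + \beta^{\text{AR}}_i y_{i,t-1} + \beta_i x_{it} + u_{it}$ with a single exogenous regressor. The base R sketch below follows that reading. It is an illustration only: the zero starting value, the burn-in length, the $iid$ errors and all object names are assumptions, and the $AR$ coefficient is taken to be group-specific, in line with the first column of alpha.

```r
set.seed(1)
N <- 50; n_periods <- 40; K <- 3
burn_in <- 50                            # assumption: discard start-up periods

groups <- sort(rep_len(seq_len(K), N))
beta_ar <- runif(K, -1, 1)[groups]       # AR coefficient of each unit, inherited from its group
beta_x <- runif(K, -2, 2)[groups]        # slope on the single exogenous regressor
gamma <- rnorm(N)                        # individual fixed effects

n <- n_periods + burn_in
y <- matrix(0, nrow = N, ncol = n)       # assumption: the recursion starts at zero
x <- matrix(rnorm(N * n), nrow = N)      # x_it = e_it in the dynamic case
u <- matrix(rnorm(N * n), nrow = N)      # assumption: iid N(0, 1) errors
for (t in 2:n) {
  y[, t] <- (1 - beta_ar) * gamma + beta_ar * y[, t - 1] + beta_x * x[, t] + u[, t]
}
y <- y[, (burn_in + 1):n]                # keep the last n_periods columns (N x n_periods)
```

Scaling the fixed effect by $(1 - \beta^{\text{AR}}_i)$ means that, in this recursion, the stationary mean of $y_{it}$ equals $\gamma_i$ whenever $x_{it}$ and $u_{it}$ have mean zero.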

When specifying an endogenous panel (set q to $q \geq p$), the $e_{it,j}$ correlate with the cross-sectional innovations $u_{it}$ by a magnitude of 0.5 to produce endogenous regressors ($\text{E}(u|X) \neq 0$). However, the endogenous regressors can be accounted for by exploiting the $q$ instruments in $\bold{Z}$, for which $\text{E}(u|Z) = 0$ holds. The instruments and the first stage coefficients are generated in the same fashion as $\bold{X}$ and $\bold{\alpha}$ when q = NULL.
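The endogeneity mechanism can be sketched in base R as follows. This is a simplified illustration rather than the package's code: "a magnitude of 0.5" is interpreted here as a correlation of 0.5, fixed effects are left out of the instruments for brevity, and the names e, Pi and the sample size are assumptions.

```r
set.seed(1)
n <- 20000; p <- 2; q <- 3                   # n stacked observations, q >= p

u <- rnorm(n)                                # structural errors

# Regressor innovations with a correlation of 0.5 to u (u is recycled over the p columns)
e <- 0.5 * u + sqrt(1 - 0.5^2) * matrix(rnorm(n * p), ncol = p)

Z <- matrix(rnorm(n * q), ncol = q)          # instruments, drawn independently of u
Pi <- matrix(runif(q * p, -2, 2), nrow = q)  # hypothetical first-stage coefficients
X <- Z %*% Pi + e                            # endogenous regressors

cor_eu <- cor(e, u)                          # p x 1 correlations, each near 0.5
cor_zu <- cor(Z, u)                          # q x 1 correlations, each near zero
```

Because X inherits its dependence on u only through e, the instruments in Z remain uncorrelated with the errors and can be used to identify the slope coefficients.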

The function nests, among others, the DGPs employed in the simulation study of Mehrabani (2023, sec. 6).

References

Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.

Examples


# Simulate DGP 1 from Mehrabani (2023, sec. 6)
set.seed(1)
alpha_0_DGP1 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
DGP1 <- sim_DGP(
  N = 50, n_periods = 20, p = 2, n_groups = 3,
  group_proportions = c(.4, .3, .3), alpha_0 = alpha_0_DGP1
)
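
The elements of the returned list can then be inspected as described in the Value section. The snippet below repeats the call so that it runs on its own, and it assumes that sim_DGP is loaded from the PAGFL package.

```r
library(PAGFL)  # assumption: sim_DGP is provided by the PAGFL package

set.seed(1)
alpha_0_DGP1 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
DGP1 <- sim_DGP(
  N = 50, n_periods = 20, p = 2, n_groups = 3,
  group_proportions = c(.4, .3, .3), alpha_0 = alpha_0_DGP1
)

DGP1$alpha          # K x p matrix of group-specific slopes
table(DGP1$groups)  # number of units per group
dim(DGP1$X)         # NT x p matrix of explanatory variables
head(DGP1$data)     # outcome and explanatory variables as a data.frame
```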
