sim_tv_DGP: Simulate a Time-varying Panel With a Group Structure in the Slope Coefficients

Description

Construct a time-varying panel data set subject to a group structure in the slope coefficients with optional $AR(1)$ innovations.

Usage

sim_tv_DGP(
  N = 50,
  n_periods = 40,
  intercept = TRUE,
  p = 1,
  n_groups = 3,
  d = 3,
  dynamic = FALSE,
  group_proportions = NULL,
  error_spec = "iid",
  locations = NULL,
  scales = NULL,
  polynomial_coef = NULL,
  sd_error = 1
)

Value

A list holding

alpha: a $T \times p \times K$ array of group-specific time-varying parameters
beta: a $T \times p \times N$ array of individual time-varying parameters
groups: a vector indicating the group memberships $(g_1, \dots, g_N)$, where $g_i = k$ if $i \in$ group $k$.
y: an $NT \times 1$ vector of the dependent variable, with $\bold{y}=(\bold{y}_1^\prime, \dots, \bold{y}_N^\prime)^\prime$, $\bold{y}_i = (y_{i1}, \dots, y_{iT})^\prime$ and the scalar $y_{it}$.
X: an $NT \times p$ matrix of explanatory variables, with $\bold{X}=(\bold{X}_1^\prime, \dots, \bold{X}_N^\prime)^\prime$, $\bold{X}_i = (\bold{x}_{i1}, \dots, \bold{x}_{iT})^\prime$ and the $p \times 1$ vector $\bold{x}_{it}$.
data: an $NT \times (p + 1)$ data.frame of the outcome and the explanatory variables.

Arguments

N

the number of cross-sectional units. Default is 50.

n_periods

the number of simulated time periods $T$. Default is 40.

intercept

logical. If TRUE, a time-varying intercept is generated. Default is TRUE.

p

the number of simulated explanatory variables. Default is 1.

n_groups

the number of groups $K$. Default is 3.

d

the polynomial degree used to construct the time-varying coefficients. Default is 3.

dynamic

logical. If TRUE, the panel includes one stationary autoregressive lag of $y_{it}$ as a regressor. Default is FALSE.

group_proportions

a numeric vector of length n_groups indicating the size of each group as a fraction of $N$. If NULL, all groups are of size $N / K$. Default is NULL.
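As a rough illustration of how such proportions could translate into a membership vector, here is a minimal base-R sketch. The helper `assign_groups()` is hypothetical and not part of the package; in particular, the rounding rule (floor, then hand leftover units to the first groups) is an assumption, since the documentation does not say how non-integer group sizes are resolved.

```r
# Hypothetical helper, not the package's implementation.
assign_groups <- function(N, n_groups, group_proportions = NULL) {
  if (is.null(group_proportions)) {
    # Equal shares when no proportions are supplied
    group_proportions <- rep(1 / n_groups, n_groups)
  }
  stopifnot(
    length(group_proportions) == n_groups,
    abs(sum(group_proportions) - 1) < 1e-8
  )
  # Assumed rounding rule: floor each size (small tolerance guards against
  # floating-point error), then give leftover units to the first groups
  sizes <- floor(N * group_proportions + 1e-8)
  leftover <- N - sum(sizes)
  if (leftover > 0) {
    sizes[seq_len(leftover)] <- sizes[seq_len(leftover)] + 1
  }
  # Membership vector (g_1, ..., g_N)
  rep(seq_len(n_groups), times = sizes)
}

groups <- assign_groups(N = 50, n_groups = 3, group_proportions = c(0.5, 0.3, 0.2))
table(groups)
```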

error_spec

options include

"iid": for $iid$ errors.

"AR": for an $AR(1)$ error process with an autoregressive coefficient of 0.5.

Default is "iid".
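The difference between the two error specifications can be sketched in base R for a single cross-sectional unit. This is only an illustration: the burn-in length and the choice to give the AR(1) innovations a standard deviation of `sd_error` are assumptions, and the package may initialise or scale the process differently.

```r
set.seed(1)
n_periods <- 40
sd_error <- 1
rho <- 0.5  # autoregressive coefficient stated in the documentation

# "iid": independent normal draws
u_iid <- rnorm(n_periods, sd = sd_error)

# "AR": u_t = rho * u_{t-1} + innovation_t
# Assumption: 100 burn-in periods are discarded to reduce the influence
# of the starting value
burn_in <- 100
innov <- rnorm(n_periods + burn_in, sd = sd_error)
u_all <- as.numeric(stats::filter(innov, filter = rho, method = "recursive"))
u_ar <- u_all[-seq_len(burn_in)]
```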

locations

a $p \times K$ matrix of location parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the location parameters are drawn randomly. Default is NULL.

scales

a $p \times K$ matrix of scale parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the scale parameters are drawn randomly. Default is NULL.

polynomial_coef

a $p \times d \times K$ array of coefficients for the polynomials used to construct the time-varying coefficients. If left empty, the polynomial coefficients are drawn randomly. Default is NULL.

sd_error

standard deviation of the cross-sectional errors. Default is 1.

Author

Paul Haimerl

Details

The scalar dependent variable $y_{it}$ is generated according to the time-varying panel data model $$y_{it} = \gamma_i + \bold{\beta}^\prime_{i} (t/T) \bold{x}_{it} + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$ where $\gamma_i$ is an individual fixed effect and $\bold{x}_{it}$ is a $p \times 1$ vector of explanatory variables. The $p \times 1$ coefficient vector $\bold{\beta}_i (t/T)$ follows the group pattern $$\bold{\beta}_i \left( \frac{t}{T} \right) = \sum_{k = 1}^K \bold{\alpha}_k \left( \frac{t}{T} \right) \bold{1} \{i \in G_k \},$$ with $\cup_{k = 1}^K G_k = \{1, \dots, N\}$ and $G_k \cap G_j = \emptyset$ for any $k \neq j$, $k,j = 1, \dots, K$. The total number of groups $K$ is determined by n_groups.
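The group pattern amounts to copying each group's coefficient path to its members. A minimal sketch, assuming the array layouts described under Value (`alpha` as $T \times p \times K$, `beta` as $T \times p \times N$) and using arbitrary placeholder values for `alpha`:

```r
set.seed(1)
n_periods <- 4
p <- 2
K <- 3

# Placeholder group-specific coefficients (T x p x K); values are arbitrary
alpha <- array(rnorm(n_periods * p * K), dim = c(n_periods, p, K))

# Group memberships g_i for N = 5 units
groups <- c(1, 1, 2, 3, 3)

# beta_i(t/T) = alpha_k(t/T) whenever i is in G_k:
# index the third dimension of alpha by the membership vector
beta <- alpha[, , groups, drop = FALSE]
dim(beta)  # T x p x N
```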

The predictors are simulated as: $$x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i,e_{it,j} \sim i.i.d. N(0, 1), \quad j = 1, \dots, p,$$ where $e_{it,j}$ denotes a series of innovations. $\gamma_i$ and $e_{it,j}$ are independent of each other.
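The predictor equation can be reproduced in a few lines of base R. This is a sketch of the formula only, not the package's code; the unit-major stacking (all periods of unit 1, then unit 2, and so on) follows the layout of `X` described under Value, and the variable names are chosen for illustration.

```r
set.seed(1)
N <- 20
n_periods <- 50
p <- 2

# Individual fixed effects gamma_i ~ N(0, 1)
gamma <- rnorm(N)
# Repeat each gamma_i for all T periods of unit i (unit-major stacking)
gamma_long <- rep(gamma, each = n_periods)

# Innovations e_{it,j} ~ N(0, 1), one column per regressor j
e <- matrix(rnorm(N * n_periods * p), ncol = p)

# x_{it,j} = 0.2 * gamma_i + e_{it,j}; the vector is recycled down each column
X <- 0.2 * gamma_long + e
```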

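With the default settings (error_spec = "iid" and sd_error = 1), the errors $u_{it}$ are $iid$ standard normal. Setting error_spec = "AR" instead yields an $AR(1)$ error process with an autoregressive coefficient of 0.5.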

In case locations = NULL, the location parameters are drawn from $Unif[0.3, 0.9]$. In case scales = NULL, the scale parameters are drawn from $Unif[0.01, 0.09]$. In case polynomial_coef = NULL, the polynomial coefficients are drawn from $Unif[-20, 20]$ and normalized so that all coefficients of one polynomial sum up to 1. The final coefficient function follows as $\bold{\alpha}_k (t/T) = 3 F(t/T, location, scale) + \sum_{j=1}^d a_j (t/T)^j$, where $F(\cdot, location, scale)$ denotes a cumulative logistic distribution function and $a_j$ reflects a polynomial coefficient.
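For a single coefficient of a single group, the construction above can be sketched as follows. The code mirrors the stated formula using base R's `plogis()` for the logistic CDF; it is an illustration under the documented defaults, not the package's internal routine, and the variable names are chosen here for readability.

```r
set.seed(1)
n_periods <- 40
d <- 3
grid <- seq_len(n_periods) / n_periods  # t/T for t = 1, ..., T

# Random draws as described for the NULL defaults
location <- runif(1, 0.3, 0.9)
scale <- runif(1, 0.01, 0.09)
a <- runif(d, -20, 20)
a <- a / sum(a)  # normalize so the polynomial coefficients sum to 1

# 3 * F(t/T, location, scale) with F the logistic CDF
logistic_part <- 3 * plogis(grid, location = location, scale = scale)
# sum_{j=1}^d a_j (t/T)^j
poly_part <- as.numeric(outer(grid, seq_len(d), "^") %*% a)

alpha_k <- logistic_part + poly_part
```

Because the polynomial coefficients sum to 1, the polynomial term equals 1 at $t/T = 1$, so the end point of the path is governed by the logistic term plus one.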

Examples

# Simulate a time-varying panel subject to a time trend and a group structure
set.seed(1)
sim <- sim_tv_DGP(N = 20, n_periods = 50, p = 1)
y <- sim$y
X <- sim$X
