
PAGFL (version 1.1.3)

sim_tv_DGP: Simulate a Time-varying Panel With a Group Structure in the Slope Coefficients

Description

Construct a time-varying panel data set subject to a group structure in the slope coefficients with optional \(AR(1)\) innovations.

Usage

sim_tv_DGP(
  N = 50,
  n_periods = 40,
  intercept = TRUE,
  p = 1,
  n_groups = 3,
  d = 3,
  dynamic = FALSE,
  group_proportions = NULL,
  error_spec = "iid",
  locations = NULL,
  scales = NULL,
  polynomial_coef = NULL,
  sd_error = 1
)

Value

A list holding

alpha

a \(T \times p \times K\) array of group-specific time-varying parameters

beta

a \(T \times p \times N\) array of individual time-varying parameters

groups

a vector indicating the group memberships \((g_1, \dots, g_N)\), where \(g_i = k\) if \(i \in\) group \(k\).

y

a \(NT \times 1\) vector of the dependent variable, with \(\bold{y}=(y_1, \dots, y_N)^\prime\), \(y_i = (y_{i1}, \dots, y_{iT})^\prime\) and the scalar \(y_{it}\).

X

a \(NT \times p\) matrix of explanatory variables, with \(\bold{X}=(x_1, \dots, x_N)^\prime\), \(x_i = (x_{i1}, \dots, x_{iT})^\prime\) and the \(p \times 1\) vector \(x_{it}\).

data

a \(NT \times (p + 1)\) data.frame of the outcome and the explanatory variables.

Arguments

N

the number of cross-sectional units. Default is 50.

n_periods

the number of simulated time periods \(T\). Default is 40.

intercept

logical. If TRUE, a time-varying intercept is generated. Default is TRUE.

p

the number of simulated explanatory variables. Default is 1.

n_groups

the number of groups \(K\). Default is 3.

d

the polynomial degree used to construct the time-varying coefficients. Default is 3.

dynamic

logical. If TRUE, the panel includes one stationary autoregressive lag of \(y_{it}\) as a regressor. Default is FALSE.

group_proportions

a numeric vector of length n_groups indicating the size of each group as a fraction of \(N\). If NULL, all groups are of size \(N / K\). Default is NULL.

error_spec

options include

"iid"

for \(iid\) errors.

"AR"

for an \(AR(1)\) error process with an autoregressive coefficient of 0.5.

Default is "iid".

locations

a \(p \times K\) matrix of location parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the location parameters are drawn randomly. Default is NULL.

scales

a \(p \times K\) matrix of scale parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the scale parameters are drawn randomly. Default is NULL.

polynomial_coef

a \(p \times d \times K\) array of coefficients for the polynomials used to construct the time-varying coefficients. If left empty, the polynomial coefficients are drawn randomly. Default is NULL.

sd_error

standard deviation of the cross-sectional errors. Default is 1.

Author

Paul Haimerl

Details

The scalar dependent variable \(y_{it}\) is generated according to the following time-varying grouped panel data model $$y_{it} = \gamma_i + \beta^\prime_{it} x_{it} + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$ where \(\gamma_i\) is an individual fixed effect and \(x_{it}\) is a \(p \times 1\) vector of explanatory variables. The coefficient vector \(\beta_i = \{\beta_{i1}^\prime, \dots, \beta_{iT}^\prime \}^\prime\) is subject to the group pattern $$\beta_i \left( \frac{t}{T} \right) = \sum_{k = 1}^K \alpha_k \left( \frac{t}{T} \right) \bold{1} \{i \in G_k \},$$ with \(\cup_{k = 1}^K G_k = \{1, \dots, N\}\), \(G_k \cap G_j = \emptyset\) and \(\sup_{v \in [0,1]} \left( \| \alpha_k(v) - \alpha_j(v) \| \right) \neq 0\) for any \(k \neq j\), \(k = 1, \dots, K\). The total number of groups \(K\) is determined by n_groups.
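
The group pattern can be illustrated with a small base R sketch. It is not the package's internal code: the array `alpha` is filled with arbitrary random numbers and the membership vector `groups` is a hypothetical assignment, both chosen only to show how each unit inherits the coefficient path of its group.

```r
# Illustrative sketch only; not taken from the PAGFL source.
set.seed(1)
N <- 6          # cross-sectional units
n_periods <- 5  # time periods T
p <- 1          # explanatory variables
K <- 3          # groups

# Placeholder group-specific coefficient paths (T x p x K)
alpha <- array(rnorm(n_periods * p * K), dim = c(n_periods, p, K))

# Hypothetical group memberships g_1, ..., g_N
groups <- rep(seq_len(K), length.out = N)

# Each unit i receives the path of its group: beta_i = alpha_{g_i} (T x p x N)
beta <- alpha[, , groups, drop = FALSE]
dim(beta)
```

Indexing the third dimension of `alpha` by `groups` mirrors the indicator sum in the formula above, since exactly one indicator is nonzero for each unit.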

The predictors are simulated as: $$x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i,e_{it,j} \sim i.i.d. N(0, 1), \quad j = \{1, \dots, p\},$$ where \(e_{it,j}\) denotes a series of innovations. \(\gamma_i\) and \(e_i\) are independent of each other.
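
A minimal base R sketch of the predictor equation follows. It assumes the observations are stacked unit by unit, as in the description of `X` under Value; the variable names are illustrative and the package may implement this step differently.

```r
# Illustrative sketch only; not taken from the PAGFL source.
set.seed(1)
N <- 20
n_periods <- 50
p <- 2

# Individual fixed effects gamma_i ~ N(0, 1)
gamma <- rnorm(N)

# Repeat each gamma_i for all T periods (stacking unit by unit)
gamma_long <- rep(gamma, each = n_periods)

# Innovations e_{it,j} ~ N(0, 1), one column per regressor
e <- matrix(rnorm(N * n_periods * p), ncol = p)

# x_{it,j} = 0.2 * gamma_i + e_{it,j}; the vector recycles down each column
X <- 0.2 * gamma_long + e
dim(X)
```

Because the fixed effect enters every regressor, the predictors are correlated with \(\gamma_i\) by construction.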

With the default settings (error_spec = "iid" and sd_error = 1), the errors \(u_{it}\) follow an \(iid\) standard normal distribution.
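
For error_spec = "AR", the argument description states an \(AR(1)\) error process with an autoregressive coefficient of 0.5. The sketch below shows one way to generate such a series for a single unit in base R. The burn-in length `burn_in` and the use of `stats::filter()` are assumptions made for illustration, not a description of the package's internals.

```r
# Illustrative sketch only; not taken from the PAGFL source.
set.seed(1)
n_periods <- 40
sd_error <- 1
burn_in <- 100  # hypothetical burn-in to reduce dependence on the start value

# Gaussian innovations
innov <- rnorm(n_periods + burn_in, sd = sd_error)

# Recursive filter: u_t = 0.5 * u_{t-1} + innov_t
u_full <- as.numeric(stats::filter(innov, filter = 0.5, method = "recursive"))

# Discard the burn-in observations
u <- u_full[-seq_len(burn_in)]
length(u)
```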

If locations = NULL, the location parameters are drawn from \(U[0.3, 0.9]\). If scales = NULL, the scale parameters are drawn from \(U[0.01, 0.09]\). If polynomial_coef = NULL, the polynomial coefficients are drawn from \(U[-20, 20]\) and normalized so that the coefficients of each polynomial sum to 1. The final coefficient function is \(\alpha_k (t/T) = 3 F(t/T, location, scale) + \sum_{j=1}^d a_j (t/T)^j\), where \(F(\cdot, location, scale)\) denotes the cumulative distribution function of a logistic distribution and \(a_j\) is the \(j\)-th polynomial coefficient.
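
The construction of one coefficient function can be sketched in base R as follows. This is a hedged illustration for a single group and a single regressor: the normalization is assumed to be a division by the sum of the drawn coefficients, and the package may differ in its details.

```r
# Illustrative sketch only; not taken from the PAGFL source.
set.seed(1)
n_periods <- 40
d <- 3
v <- seq_len(n_periods) / n_periods  # rescaled time t/T

# Random draws as described for the NULL defaults
location <- runif(1, 0.3, 0.9)
scale <- runif(1, 0.01, 0.09)
a <- runif(d, -20, 20)
a <- a / sum(a)  # assumed normalization so that the coefficients sum to 1

# Polynomial part: sum_j a_j * (t/T)^j
poly_part <- as.vector(outer(v, seq_len(d), `^`) %*% a)

# Logistic CDF part plus polynomial part
alpha_k <- 3 * plogis(v, location = location, scale = scale) + poly_part
length(alpha_k)
```

Under this normalization the polynomial part equals 1 at \(t = T\), so the logistic term drives the smooth transition around the location parameter.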

Examples

library(PAGFL)

# Simulate a time-varying panel subject to a time trend and a group structure
set.seed(1)
sim <- sim_tv_DGP(N = 20, n_periods = 50, p = 1)
y <- sim$y
X <- sim$X
