PAGFL (version 1.1.4)

sim_tv_DGP: Simulate a Time-varying Panel With a Group Structure in the Slope Coefficients

Description

Construct a time-varying panel data set subject to a group structure in the slope coefficients with optional \(AR(1)\) innovations.

Usage

sim_tv_DGP(
  N = 50,
  n_periods = 40,
  intercept = TRUE,
  p = 1,
  n_groups = 3,
  d = 3,
  dynamic = FALSE,
  group_proportions = NULL,
  error_spec = "iid",
  locations = NULL,
  scales = NULL,
  polynomial_coef = NULL,
  sd_error = 1
)

Value

A list holding

alpha

a \(T \times p \times K\) array of group-specific time-varying parameters

beta

a \(T \times p \times N\) array of individual time-varying parameters

groups

a vector indicating the group memberships \((g_1, \dots, g_N)\), where \(g_i = k\) if \(i \in\) group \(k\).

y

a \(NT \times 1\) vector of the dependent variable, with \(\bold{y}=(\bold{y}_1^\prime, \dots, \bold{y}_N^\prime)^\prime\), \(\bold{y}_i = (y_{i1}, \dots, y_{iT})^\prime\) and the scalar \(y_{it}\).

X

a \(NT \times p\) matrix of explanatory variables, with \(\bold{X}=(\bold{X}_1^\prime, \dots, \bold{X}_N^\prime)^\prime\), \(\bold{X}_i = (\bold{x}_{i1}, \dots, \bold{x}_{iT})^\prime\) and the \(p \times 1\) vector \(\bold{x}_{it}\).

data

a \(NT \times (p + 1)\) data.frame of the outcome and the explanatory variables.
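
Because y and X stack the units one after another, row \((i - 1)T + t\) belongs to unit \(i\) in period \(t\). The following base-R sketch illustrates that layout. It uses a made-up stand-in vector (y_stacked) rather than actual sim_tv_DGP output, and stack_index is a hypothetical helper introduced only for illustration.

```r
# Hypothetical stand-in for sim$y: N = 3 units, T = 4 periods, stacked unit by unit
N <- 3
n_periods <- 4
y_stacked <- as.numeric(seq_len(N * n_periods))

# matrix() fills column-wise, so column i holds the T observations of unit i
y_mat <- matrix(y_stacked, nrow = n_periods, ncol = N)

# Row index of unit i at period t in the stacked vector (illustrative helper)
stack_index <- function(i, t, n_periods) (i - 1) * n_periods + t

# Unit 3, period 2: both ways of indexing pick the same observation
same_obs <- y_mat[2, 3] == y_stacked[stack_index(3, 2, n_periods)]
```

The same reshaping idea applies column by column to X.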

Arguments

N

the number of cross-sectional units. Default is 50.

n_periods

the number of simulated time periods \(T\). Default is 40.

intercept

logical. If TRUE, a time-varying intercept is generated. Default is TRUE.

p

the number of simulated explanatory variables. Default is 1.

n_groups

the number of groups \(K\). Default is 3.

d

the polynomial degree used to construct the time-varying coefficients. Default is 3.

dynamic

logical. If TRUE, the panel includes one stationary autoregressive lag of \(y_{it}\) as a regressor. Default is FALSE.

group_proportions

a numeric vector of length n_groups indicating the size of each group as a fraction of \(N\). If NULL, all groups are of size \(N / K\). Default is NULL.

error_spec

options include

"iid"

for \(iid\) errors.

"AR"

for an \(AR(1)\) error process with an autoregressive coefficient of 0.5.

Default is "iid".

locations

a \(p \times K\) matrix of location parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the location parameters are drawn randomly. Default is NULL.

scales

a \(p \times K\) matrix of scale parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the scale parameters are drawn randomly. Default is NULL.

polynomial_coef

a \(p \times d \times K\) array of coefficients for the polynomials used to construct the time-varying coefficients. If left empty, the polynomial coefficients are drawn randomly. Default is NULL.

sd_error

standard deviation of the cross-sectional errors. Default is 1.

Author

Paul Haimerl

Details

The scalar dependent variable \(y_{it}\) is generated according to the time-varying panel data model $$y_{it} = \gamma_i + \bold{\beta}^\prime_{i} (t/T) \bold{x}_{it} + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$ where \(\gamma_i\) is an individual fixed effect and \(\bold{x}_{it}\) is a \(p \times 1\) vector of explanatory variables. The \(p \times 1\) coefficient vector \(\bold{\beta}_i (t/T)\) follows the group pattern $$\bold{\beta}_i \left( \frac{t}{T} \right) = \sum_{k = 1}^K \bold{\alpha}_k \left( \frac{t}{T} \right) \bold{1} \{i \in G_k \},$$ with \(\cup_{k = 1}^K G_k = \{1, \dots, N\}\) and \(G_k \cap G_j = \emptyset\) for any \(k \neq j\), \(k,j = 1, \dots, K\). The total number of groups \(K\) is determined by n_groups.
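
The group pattern means that each unit simply inherits the coefficient path of its group. The base-R sketch below illustrates this for \(p = 1\). The coefficient paths in alpha, the membership vector, and all other inputs are made-up illustrative values, not what sim_tv_DGP draws internally.

```r
set.seed(1)
N <- 6
n_periods <- 5
K <- 2

# Hypothetical group-specific coefficient paths alpha_k(t/T), one column per group
tau <- seq_len(n_periods) / n_periods
alpha <- cbind(tau, 1 - tau^2)            # T x K

# Group memberships g_1, ..., g_N
groups <- c(1, 1, 2, 2, 2, 1)

# beta_i(t/T) = alpha_{g_i}(t/T): select each unit's group column
beta <- alpha[, groups, drop = FALSE]     # T x N

# Fixed effects, one regressor, and errors (all illustrative draws)
gamma <- rnorm(N)
x <- matrix(rnorm(n_periods * N), n_periods, N)
u <- matrix(rnorm(n_periods * N), n_periods, N)

# y_it = gamma_i + beta_i(t/T) * x_it + u_it, stored as a T x N matrix
y <- matrix(gamma, n_periods, N, byrow = TRUE) + beta * x + u
```

Units 1, 2 and 6 share the first column of alpha, while units 3 to 5 share the second.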

The predictors are simulated as: $$x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i,e_{it,j} \sim i.i.d. N(0, 1), \quad j = 1, \dots, p,$$ where \(e_{it,j}\) denotes a series of innovations. \(\gamma_i\) and \(e_{it,j}\) are independent of each other.
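
The predictor equation can be reproduced in a few lines of base R. This is a minimal sketch of the formula above, not the package's internal code. The array layout and the variable names are assumptions chosen for illustration.

```r
set.seed(1)
N <- 4
n_periods <- 6
p <- 2

# Individual fixed effects and innovations, both standard normal
gamma <- rnorm(N)
e <- array(rnorm(n_periods * p * N), dim = c(n_periods, p, N))

# x_{it,j} = 0.2 * gamma_i + e_{it,j}
x <- e
for (i in seq_len(N)) {
  x[, , i] <- 0.2 * gamma[i] + e[, , i]
}

# Stack the T x p blocks of all units into an NT x p matrix
X <- do.call(rbind, lapply(seq_len(N), function(i) x[, , i]))
```

Since the fixed effect enters every regressor of unit \(i\), the predictors are correlated with \(\gamma_i\) by construction.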

Under the default settings (error_spec = "iid" and sd_error = 1), the errors \(u_{it}\) follow an \(iid\) standard normal distribution.
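
For error_spec = "AR", the errors follow an \(AR(1)\) process with an autoregressive coefficient of 0.5. The sketch below shows one way such a series could be built for a single unit. How sd_error enters and how the process is initialized are assumptions made here for illustration, and the package may handle both differently.

```r
set.seed(1)
n_periods <- 200
sd_error <- 1
rho <- 0.5   # autoregressive coefficient documented for error_spec = "AR"

# Assumption: sd_error scales the innovations of the AR(1) process
eps <- rnorm(n_periods, sd = sd_error)

# Assumption: no burn-in, the series starts at the first innovation
u <- numeric(n_periods)
u[1] <- eps[1]
for (t in 2:n_periods) {
  u[t] <- rho * u[t - 1] + eps[t]
}
```

With iid errors the loop is unnecessary and u equals eps.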

In case locations = NULL, the location parameters are drawn from \(Unif[0.3, 0.9]\). In case scales = NULL, the scale parameters are drawn from \(Unif[0.01, 0.09]\). In case polynomial_coef = NULL, the polynomial coefficients are drawn from \(Unif[-20, 20]\) and normalized so that all coefficients of one polynomial sum up to 1. The final coefficient function follows as \(\bold{\alpha}_k (t/T) = 3 F(t/T, location, scale) + \sum_{j=1}^d a_j (t/T)^j\), where \(F(\cdot, location, scale)\) denotes a cumulative logistic distribution function and \(a_j\) reflects a polynomial coefficient.
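
The coefficient function can be traced out with base R, where plogis provides the cumulative logistic distribution function. This is a minimal sketch for one group and one regressor. Normalizing by dividing through the sum of the drawn coefficients is an assumption about how "sum up to 1" is achieved, and the variable names are illustrative.

```r
set.seed(1)
n_periods <- 40
d <- 3
tau <- seq_len(n_periods) / n_periods    # the grid t/T

# Random draws, following the ranges stated above
location <- runif(1, 0.3, 0.9)
scale <- runif(1, 0.01, 0.09)
a <- runif(d, -20, 20)

# Assumption: normalize by the sum so that the coefficients add up to 1
a <- a / sum(a)

# Polynomial part: sum_{j = 1}^d a_j * (t/T)^j
poly_part <- as.numeric(outer(tau, seq_len(d), "^") %*% a)

# alpha_k(t/T) = 3 * F(t/T, location, scale) + polynomial part
alpha_k <- 3 * plogis(tau, location = location, scale = scale) + poly_part
```

The logistic term produces a smooth step around the location parameter, and a smaller scale makes that step sharper.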

Examples

library(PAGFL)

# Simulate a time-varying panel subject to a time trend and a group structure
set.seed(1)
sim <- sim_tv_DGP(N = 20, n_periods = 50, p = 1)
y <- sim$y
X <- sim$X
