Construct a time-varying panel data set subject to a group structure in the slope coefficients with optional \(AR(1)\) innovations.
sim_tv_DGP(
N = 50,
n_periods = 40,
intercept = TRUE,
p = 1,
n_groups = 3,
d = 3,
dynamic = FALSE,
group_proportions = NULL,
error_spec = "iid",
locations = NULL,
scales = NULL,
polynomial_coef = NULL,
sd_error = 1
)
A list holding
alpha: a \(T \times p \times K\) array of group-specific time-varying parameters.
beta: a \(T \times p \times N\) array of individual time-varying parameters.
groups: a vector indicating the group memberships \((g_1, \dots, g_N)\), where \(g_i = k\) if \(i \in\) group \(k\).
y: a \(NT \times 1\) vector of the dependent variable, with \(\bold{y}=( \bold{y}_1, \dots, \bold{y}_N)^\prime\), \(\bold{y}_i = (y_{i1}, \dots, y_{iT})^\prime\) and the scalar \(y_{it}\).
X: a \(NT \times p\) matrix of explanatory variables, with \(\bold{X}=(\bold{X}_1^\prime, \dots, \bold{X}_N^\prime)^\prime\), \(\bold{X}_i = (\bold{x}_{i1}, \dots, \bold{x}_{iT})^\prime\) and the \(p \times 1\) vector \(\bold{x}_{it}\).
data: a \(NT \times (p + 1)\) data.frame of the outcome and the explanatory variables.
N: the number of cross-sectional units \(N\). Default is 50.
n_periods: the number of simulated time periods \(T\). Default is 40.
intercept: logical. If TRUE, a time-varying intercept is generated. Default is TRUE.
p: the number of simulated explanatory variables. Default is 1.
n_groups: the number of groups \(K\). Default is 3.
d: the polynomial degree used to construct the time-varying coefficients. Default is 3.
dynamic: logical. If TRUE, the panel includes one stationary autoregressive lag of \(y_{it}\) as a regressor. Default is FALSE.
group_proportions: a numeric vector of length n_groups indicating the size of each group as a fraction of \(N\). If NULL, all groups are of size \(N / K\). Default is NULL.
error_spec: the error specification. Options include
"iid" for \(iid\) errors.
"AR" for an \(AR(1)\) error process with an autoregressive coefficient of 0.5.
Default is "iid".
locations: a \(p \times K\) matrix of location parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the location parameters are drawn randomly. Default is NULL.
scales: a \(p \times K\) matrix of scale parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the scale parameters are drawn randomly. Default is NULL.
polynomial_coef: a \(p \times d \times K\) array of coefficients for the polynomials used to construct the time-varying coefficients. If left empty, the polynomial coefficients are drawn randomly. Default is NULL.
sd_error: standard deviation of the cross-sectional errors. Default is 1.
Paul Haimerl
The scalar dependent variable \(y_{it}\) is generated according to the time-varying panel data model
$$y_{it} = \gamma_i + \bold{\beta}^\prime_{i} (t/T) \bold{x}_{it} + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$
where \(\gamma_i\) is an individual fixed effect and \(\bold{x}_{it}\) is a \(p \times 1\) vector of explanatory variables.
The \(p \times 1\) coefficient vector \(\bold{\beta}_i (t/T)\) follows the group pattern
$$\bold{\beta}_i \left( \frac{t}{T} \right) = \sum_{k = 1}^K \bold{\alpha}_k \left( \frac{t}{T} \right) \bold{1} \{i \in G_k \},$$
with \(\cup_{k = 1}^K G_k = \{1, \dots, N\}\) and \(G_k \cap G_j = \emptyset\) for any \(k \neq j\), \(k,j = 1, \dots, K\). The total number of groups \(K\) is determined by n_groups.
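The group pattern can be illustrated with a short base R sketch. The names `groups`, `alpha`, and `beta` mirror the elements of the returned list, but the sizes and the group-specific curves below are arbitrary placeholders chosen for illustration, not the ones the function generates.

```r
# Illustrative sizes (hypothetical values)
N <- 6; n_periods <- 5; p <- 1; K <- 3

# Group memberships g_1, ..., g_N with equally sized groups
groups <- rep(seq_len(K), each = N / K)

# Placeholder T x p x K array of group-specific coefficient curves
alpha <- array(0, dim = c(n_periods, p, K))
for (k in seq_len(K)) alpha[, , k] <- k * seq_len(n_periods) / n_periods

# beta_i(t/T) equals alpha_k(t/T) for the group k that contains unit i
beta <- alpha[, , groups, drop = FALSE]
dim(beta)  # T x p x N
```

Indexing the third dimension of `alpha` by the membership vector copies each group's curve to all of its members, which is exactly the indicator sum in the formula above.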
The predictors are simulated as
$$x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i, e_{it,j} \sim i.i.d. N(0, 1), \quad j = 1, \dots, p,$$
where \(e_{it,j}\) denotes a series of innovations. \(\gamma_i\) and \(e_{it,j}\) are independent of each other.
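A minimal base R sketch of this predictor design follows. It is not the package's internal code; the variable names and the unit-by-unit stacking order are assumptions chosen to match the \(NT \times p\) layout of X described above.

```r
set.seed(1)
N <- 20; n_periods <- 50; p <- 2

# Individual fixed effects gamma_i ~ N(0, 1)
gamma <- rnorm(N)

# Innovations e_{it,j} ~ N(0, 1), stacked unit by unit (NT x p)
e <- matrix(rnorm(N * n_periods * p), nrow = N * n_periods, ncol = p)

# x_{it,j} = 0.2 * gamma_i + e_{it,j}; each gamma_i is repeated for all T periods
X <- 0.2 * rep(gamma, each = n_periods) + e
dim(X)  # NT x p
```

Because the same \(\gamma_i\) enters every regressor of unit \(i\), the predictors are correlated with the fixed effect by construction.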
With the default settings (error_spec = "iid" and sd_error = 1), the errors \(u_{it}\) follow an \(iid\) standard normal distribution.
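For the "AR" option of error_spec, the documentation states only that the errors follow an \(AR(1)\) process with a coefficient of 0.5. The sketch below shows one way to generate such errors for a single unit. The standard normal innovations and the burn-in length are assumptions made for illustration, not confirmed details of the function.

```r
set.seed(1)
n_periods <- 40
burn_in <- 100   # assumed burn-in length, chosen for illustration
rho <- 0.5       # autoregressive coefficient stated in the documentation

# Innovations; their distribution is an assumption here
eps <- rnorm(n_periods + burn_in)

# u_t = rho * u_{t-1} + eps_t, computed with a recursive filter
u_full <- stats::filter(eps, filter = rho, method = "recursive")

# Drop the burn-in so the retained errors are close to stationary
u <- as.numeric(u_full[(burn_in + 1):(burn_in + n_periods)])
```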
In case locations = NULL, the location parameters are drawn from \(Unif[0.3, 0.9]\).
In case scales = NULL, the scale parameters are drawn from \(Unif[0.01, 0.09]\).
In case polynomial_coef = NULL, the polynomial coefficients are drawn from \(Unif[-20, 20]\) and normalized so that all coefficients of one polynomial sum up to 1.
The final coefficient function follows as \(\bold{\alpha}_k (t/T) = 3 \cdot F(t/T, location, scale) + \sum_{j=1}^d a_j (t/T)^j\), where \(F(\cdot, location, scale)\) denotes a cumulative logistic distribution function and \(a_j\) reflects a polynomial coefficient.
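The construction of a single coefficient curve can be sketched in base R as follows, for one regressor and one group. The variable names are hypothetical, and dividing by the sum of the coefficients is an assumed reading of the normalization described above.

```r
set.seed(1)
n_periods <- 40; d <- 3
tau <- seq_len(n_periods) / n_periods   # rescaled time t/T

# Random draws following the stated defaults
location <- runif(1, 0.3, 0.9)
scale <- runif(1, 0.01, 0.09)
a <- runif(d, -20, 20)
a <- a / sum(a)   # assumed normalization: coefficients sum to 1

# Logistic CDF component plus polynomial trend
logistic_part <- 3 * plogis(tau, location = location, scale = scale)
poly_part <- as.numeric(outer(tau, seq_len(d), "^") %*% a)
alpha_k <- logistic_part + poly_part
```

Under this normalization the polynomial part equals 1 at \(t = T\), so the logistic term drives the shape of the transition while the polynomial adds a smooth trend.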
# Simulate a time-varying panel subject to a time trend and a group structure
set.seed(1)
sim <- sim_tv_DGP(N = 20, n_periods = 50, p = 1)
y <- sim$y
X <- sim$X