Construct a time-varying panel data set subject to a group structure in the slope coefficients with optional \(AR(1)\) innovations.
sim_tv_DGP(
  N = 50,
  n_periods = 40,
  intercept = TRUE,
  p = 1,
  n_groups = 3,
  d = 3,
  dynamic = FALSE,
  group_proportions = NULL,
  error_spec = "iid",
  locations = NULL,
  scales = NULL,
  polynomial_coef = NULL,
  sd_error = 1
)
A list holding
alpha
a \(T \times p \times K\) array of group-specific time-varying parameters
beta
a \(T \times p \times N\) array of individual time-varying parameters
groups
a vector indicating the group memberships \((g_1, \dots, g_N)\), where \(g_i = k\) if \(i \in\) group \(k\).
y
a \(NT \times 1\) vector of the dependent variable, with \(\bold{y}=(y_1, \dots, y_N)^\prime\), \(y_i = (y_{i1}, \dots, y_{iT})^\prime\) and the scalar \(y_{it}\).
X
a \(NT \times p\) matrix of explanatory variables, with \(\bold{X}=(x_1, \dots, x_N)^\prime\), \(x_i = (x_{i1}, \dots, x_{iT})^\prime\) and the \(p \times 1\) vector \(x_{it}\).
data
a \(NT \times (p + 1)\) data.frame of the outcome and the explanatory variables.
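The stacking convention described for y and X (all \(T\) observations of unit 1 first, then unit 2, and so on) can be illustrated with a small base R sketch. The vector below is a hypothetical stand-in for the returned y, not actual output of sim_tv_DGP, and the helper names y_mat, unit and period are introduced here for illustration only.

```r
# Hypothetical stand-in for the stacked outcome vector: unit i's T
# observations are stored contiguously (unit-major order), as described above.
N <- 3
n_periods <- 4
y <- as.numeric(seq_len(N * n_periods))  # placeholder values 1, ..., NT

# Reshape into a T x N matrix: column i holds (y_i1, ..., y_iT)
y_mat <- matrix(y, nrow = n_periods, ncol = N)

# Index vectors that match the stacked order
unit <- rep(seq_len(N), each = n_periods)
period <- rep(seq_len(n_periods), times = N)
```

With these index vectors, y[unit == i] recovers the series of unit i, which is the same as column i of y_mat.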
N
the number of cross-sectional units. Default is 50.
n_periods
the number of simulated time periods \(T\). Default is 40.
intercept
logical. If TRUE, a time-varying intercept is generated. Default is TRUE.
p
the number of simulated explanatory variables. Default is 1.
n_groups
the number of groups \(K\). Default is 3.
d
the polynomial degree used to construct the time-varying coefficients. Default is 3.
dynamic
logical. If TRUE, the panel includes one stationary autoregressive lag of \(y_{it}\) as a regressor. Default is FALSE.
group_proportions
a numeric vector of length n_groups indicating the size of each group as a fraction of \(N\). If NULL, all groups are of size \(N / K\). Default is NULL.
error_spec
the error specification. Options include "iid" for \(iid\) errors and "AR" for an \(AR(1)\) error process with an autoregressive coefficient of 0.5. Default is "iid".
locations
a \(p \times K\) matrix of location parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the location parameters are drawn randomly. Default is NULL.
scales
a \(p \times K\) matrix of scale parameters of a logistic distribution function used to construct the time-varying coefficients. If left empty, the scale parameters are drawn randomly. Default is NULL.
polynomial_coef
a \(p \times d \times K\) array of coefficients for the polynomials used to construct the time-varying coefficients. If left empty, the polynomial coefficients are drawn randomly. Default is NULL.
sd_error
standard deviation of the cross-sectional errors. Default is 1.
Paul Haimerl
The scalar dependent variable \(y_{it}\) is generated according to the following time-varying grouped panel data model
$$y_{it} = \gamma_i + \beta^\prime_{it} x_{it} + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$
where \(\gamma_i\) is an individual fixed effect and \(x_{it}\) is a \(p \times 1\) vector of explanatory variables.
The coefficient vector \(\beta_i = \{\beta_{i1}^\prime, \dots, \beta_{iT}^\prime \}^\prime\) is subject to the group pattern
$$\beta_i \left( \frac{t}{T} \right) = \sum_{k = 1}^K \alpha_k \left( \frac{t}{T} \right) \bold{1} \{i \in G_k \},$$
with \(\cup_{k = 1}^K G_k = \{1, \dots, N\}\), \(G_k \cap G_j = \emptyset\) and \(\sup_{v \in [0,1]} \left( \| \alpha_k(v) - \alpha_j(v) \| \right) \neq 0\) for any \(k \neq j\), \(k = 1, \dots, K\). The total number of groups \(K\) is determined by n_groups.
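The group pattern above can be sketched in base R for a single regressor. The coefficient paths in alpha are hypothetical placeholders, and the equal-sized random group assignment is an assumption made for illustration; neither is taken from the package internals.

```r
set.seed(1)
N <- 6
n_periods <- 5
K <- 3
v <- seq_len(n_periods) / n_periods  # rescaled time t/T

# Hypothetical group-specific coefficient paths, one column per group (T x K)
alpha <- cbind(v, 1 - v, rep(0.5, n_periods))

# Assumed assignment: equal-sized groups in random order
groups <- sample(rep(seq_len(K), length.out = N))

# beta_i(t/T) = alpha_k(t/T) whenever unit i belongs to group k (T x N)
beta <- alpha[, groups, drop = FALSE]
```

Every unit inherits the full coefficient path of its group, so units in the same group have identical columns in beta.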
The predictors are simulated as: $$x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i,e_{it,j} \sim i.i.d. N(0, 1), \quad j = \{1, \dots, p\},$$ where \(e_{it,j}\) denotes a series of innovations. \(\gamma_i\) and \(e_i\) are independent of each other.
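A minimal base R sketch of this predictor equation, assuming the unit-major stacking used for X elsewhere on this page. The variable names (gamma_long, e) are illustrative and not taken from the package source.

```r
set.seed(1)
N <- 4
n_periods <- 5
p <- 2

# Individual fixed effects gamma_i ~ N(0, 1), repeated for each period of a unit
gamma <- rnorm(N)
gamma_long <- rep(gamma, each = n_periods)

# Innovations e_{it,j} ~ N(0, 1), one column per regressor (NT x p)
e <- matrix(rnorm(N * n_periods * p), ncol = p)

# x_{it,j} = 0.2 * gamma_i + e_{it,j}; the length-NT vector is recycled over columns
X <- 0.2 * gamma_long + e
```

Because every regressor loads on \(\gamma_i\), the predictors are correlated with the individual fixed effect.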
By default, the errors \(u_{it}\) follow an \(iid\) standard normal distribution.
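For error_spec = "AR", the errors follow an \(AR(1)\) process with an autoregressive coefficient of 0.5. The sketch below shows one way such a series could be generated for a single unit. The initialization without burn-in and the use of sd_error as the innovation standard deviation are assumptions; the package may handle both differently.

```r
set.seed(1)
n_periods <- 40
rho <- 0.5      # autoregressive coefficient stated for error_spec = "AR"
sd_error <- 1   # assumed here to scale the innovations

# Gaussian innovations
innov <- rnorm(n_periods, sd = sd_error)

# Recursion u_t = rho * u_{t-1} + innov_t, started at u_1 = innov_1 (assumption)
u <- numeric(n_periods)
u[1] <- innov[1]
for (t in 2:n_periods) {
  u[t] <- rho * u[t - 1] + innov[t]
}
```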
In case locations = NULL, the location parameters are drawn from \(U[0.3, 0.9]\).
In case scales = NULL, the scale parameters are drawn from \(U[0.01, 0.09]\).
In case polynomial_coef = NULL, the polynomial coefficients are drawn from \(U[-20, 20]\) and normalized so that all coefficients of one polynomial sum up to 1.
The final coefficient function follows as \(\alpha_k (t/T) = 3 * F(t/T, location, scale) + \sum_{j=1}^d a_j (t/T)^j\), where \(F(\cdot, location, scale)\) denotes a cumulative logistic distribution function and \(a_j\) reflects a polynomial coefficient.
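The construction of one coefficient function can be sketched in base R, using plogis for the cumulative logistic distribution function. The normalization by dividing through the sum of the raw draws is an assumption about how "sum up to 1" is achieved, and the variable names are illustrative.

```r
set.seed(1)
n_periods <- 40
d <- 3
v <- seq_len(n_periods) / n_periods  # rescaled time t/T

# Random draws as described for the NULL defaults
location <- runif(1, 0.3, 0.9)
scale <- runif(1, 0.01, 0.09)
a <- runif(d, -20, 20)
a <- a / sum(a)  # assumed normalization: coefficients sum to 1

# Logistic component 3 * F(t/T, location, scale)
logistic_part <- 3 * plogis(v, location = location, scale = scale)

# Polynomial component sum_{j=1}^d a_j (t/T)^j
poly_part <- as.numeric(outer(v, seq_len(d), "^") %*% a)

alpha_k <- logistic_part + poly_part
```

Since the polynomial coefficients sum to 1, the polynomial component equals 1 at \(t = T\), while the logistic component produces a smooth step of height 3 centered at the location parameter.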
# Simulate a time-varying panel subject to a time trend and a group structure
set.seed(1)
sim <- sim_tv_DGP(N = 20, n_periods = 50, p = 1)
y <- sim$y
X <- sim$X