Generates a random panel data set for simulation studies of the fused extended two-way fixed
effects (FETWFE) estimator. The function creates a balanced panel with \(N\) units over \(T\)
time periods, assigns treatment status across \(R\) treated cohorts (with equal marginal
probabilities for treatment and non-treatment), and constructs a design matrix along with the
corresponding outcome. When gen_ints = TRUE, the full design matrix is returned (including
interactions between covariates and fixed effects and treatment indicators). When
gen_ints = FALSE, the design matrix is generated in a simpler format (with no interactions),
as expected by fetwfe(). Moreover, the covariates are generated according to the
specified distribution: by default, covariates are drawn from a normal distribution;
if distribution = "uniform", they are drawn uniformly from \([-\sqrt{3}, \sqrt{3}]\).
When \(d = 0\) (i.e. no covariates), no covariate-related columns or interactions are generated.
See the simulation studies section of Faletto (2025) for details.
simulateDataCore(
N,
T,
R,
d,
sig_eps_sq,
sig_eps_c_sq,
beta,
seed = NULL,
gen_ints = FALSE,
distribution = "gaussian",
guarantee_rank_condition = FALSE
)
An object of class "FETWFE_simulated", which is a list containing:
A dataframe containing generated data that can be passed to fetwfe().
The design matrix. When gen_ints = TRUE, \(X\) has \(p\) columns with interactions; when gen_ints = FALSE, \(X\) has no interactions.
A numeric vector of length \(N \times T\) containing the generated responses.
A character vector containing the names of the generated features (if \(d > 0\)), or simply an empty vector (if \(d = 0\)).
The name of the time variable in pdata.
The name of the unit variable in pdata.
The name of the treatment variable in pdata.
The name of the response variable in pdata.
The coefficient vector \(\beta\) used for data generation.
A vector of indices indicating the first treatment effect for each treated cohort.
The number of never-treated units.
A vector of counts (of length \(R+1\)) indicating how many units fall into the never-treated group and each of the \(R\) treated cohorts.
Independent cohort assignments (for auxiliary purposes).
The number of columns in the design matrix \(X\).
Number of units.
Number of time periods.
Number of treated cohorts.
Number of covariates.
The idiosyncratic noise variance.
The unit-level noise variance.
N: Integer. Number of units in the panel.
T: Integer. Number of time periods.
R: Integer. Number of treated cohorts (with treatment starting in periods 2 to T).
d: Integer. Number of time-invariant covariates.
sig_eps_sq: Numeric. Variance of the idiosyncratic (observation-level) noise.
sig_eps_c_sq: Numeric. Variance of the unit-level random effects.
beta: Numeric vector. Coefficient vector for data generation. Its required length depends on the value of gen_ints:
If gen_ints = TRUE and d > 0, the expected length is \(p = R + (T-1) + d + dR + d(T-1) + num\_treats + num\_treats \times d\), where \(num\_treats = T \times R - \frac{R(R+1)}{2}\).
If gen_ints = TRUE and d = 0, the expected length is \(p = R + (T-1) + num\_treats\).
If gen_ints = FALSE, the expected length is \(p = R + (T-1) + d + num\_treats\).
seed: (Optional) Integer. Seed for reproducibility.
gen_ints: Logical. If TRUE, generate the full design matrix with interactions; if FALSE (the default), generate a design matrix without any interaction terms.
distribution: Character. Distribution from which covariates are generated. Defaults to "gaussian". If set to "uniform", covariates are drawn uniformly from \([-\sqrt{3}, \sqrt{3}]\).
guarantee_rank_condition: (Optional) Logical. If TRUE, the returned data set is guaranteed to have at least d + 1 units per cohort, which is necessary for the final design matrix to have full column rank. Default is FALSE, in which case no such condition is enforced.
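The d + 1 units-per-cohort condition can be checked by hand on any vector of cohort counts. The sketch below is illustrative only: has_enough_units() is a hypothetical helper, and the assignment step assumes that each unit falls into the never-treated group or one of the R treated cohorts with equal probability, which is one reading of the description above and may differ from what the package does internally.

```r
# Hypothetical check (not part of fetwfe): does every group contain
# at least d + 1 units?
has_enough_units <- function(cohort_counts, d) {
  all(cohort_counts >= d + 1)
}

set.seed(42)
R <- 3
d <- 2
N <- 12

# Assumption: group 0 is never treated, groups 1..R are the treated cohorts,
# and all R + 1 groups are equally likely.
cohort <- sample(0:R, size = N, replace = TRUE)

# factor() with explicit levels keeps empty groups in the table
counts <- table(factor(cohort, levels = 0:R))
counts

# FALSE here would mean some group has fewer than d + 1 units
has_enough_units(counts, d)
```

With a small N relative to R and d, random assignment can easily leave a cohort too small, which is the situation guarantee_rank_condition = TRUE is described as ruling out.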
When gen_ints = TRUE, the function constructs the design matrix by first generating
base fixed effects and a long-format covariate matrix (via generateBaseEffects()), then
appending interactions between the covariates and cohort/time fixed effects (via
generateFEInts()), and finally treatment indicator columns and treatment-covariate
interactions (via genTreatVarsSim() and genTreatInts()). When gen_ints = FALSE, the design
matrix consists only of the base fixed effects, covariates, and treatment indicators.
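To make the gen_ints = FALSE layout concrete, here is a toy construction in base R. It is a sketch under stated assumptions, not the package's internal code: it assumes the never-treated group and the first period serve as baselines, that cohort r starts treatment in period r + 1, and that there is one treatment indicator per (cohort, post-treatment period) pair. The column ordering is also an assumption. The point is only that the column count matches \(R + (T-1) + d + num\_treats\).

```r
# Toy sketch (not the package's internal code) of a design matrix
# without interactions.
T <- 4
R <- 2
d <- 1

# One unit per group for brevity; cohort 0 is never treated.
units <- data.frame(unit = 1:3, cohort = 0:2, x1 = c(0.3, -1.2, 0.8))

# merge() with no shared column names yields a cross join: a balanced panel
panel <- merge(units, data.frame(time = 1:T))
panel <- panel[order(panel$unit, panel$time), ]

# Cohort fixed effects (never-treated group as baseline): R columns
cohort_fe <- sapply(1:R, function(r) as.numeric(panel$cohort == r))

# Time fixed effects (period 1 as baseline): T - 1 columns
time_fe <- sapply(2:T, function(t) as.numeric(panel$time == t))

# Treatment indicators: one column per (cohort, post-treatment period) pair
treat <- do.call(cbind, lapply(1:R, function(r) {
  sapply((r + 1):T, function(t) as.numeric(panel$cohort == r & panel$time == t))
}))

X <- cbind(cohort_fe, time_fe, x1 = panel$x1, treat)

num_treats <- T * R - R * (R + 1) / 2
ncol(X) == R + (T - 1) + d + num_treats
```

Cohort 1 contributes T - 1 = 3 treatment columns and cohort 2 contributes T - 2 = 2, so num_treats is the sum of T - r over the treated cohorts, which agrees with the closed form \(T \times R - \frac{R(R+1)}{2}\).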
The argument distribution controls the generation of covariates. For "gaussian",
covariates are drawn from rnorm; for "uniform", they are drawn from runif on the
interval \([-\sqrt{3}, \sqrt{3}]\).
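A uniform distribution on \([a, b]\) has variance \((b - a)^2 / 12\), so the interval \([-\sqrt{3}, \sqrt{3}]\) gives mean 0 and variance 1. The snippet below checks this numerically. It assumes the Gaussian covariates are standard normal, which the text above does not state explicitly, and the sample size and seed are arbitrary choices for illustration.

```r
set.seed(1)
n <- 1e5

# Assumption: the "gaussian" option corresponds to standard normal draws
x_gauss <- rnorm(n)

# Uniform draws on the interval named in the documentation
x_unif <- runif(n, min = -sqrt(3), max = sqrt(3))

# Sample moments of both covariate types; each should be close to (0, 1)
rbind(
  gaussian = c(mean = mean(x_gauss), var = var(x_gauss)),
  uniform  = c(mean = mean(x_unif), var = var(x_unif))
)

# Theoretical variance of U(-sqrt(3), sqrt(3)): (b - a)^2 / 12
theoretical_var <- (2 * sqrt(3))^2 / 12
theoretical_var
```

Under that assumption, switching distribution changes the shape of the covariates but not their first two moments.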
When \(d = 0\) (i.e. no covariates), the function omits any covariate-related columns and their interactions.
Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985