fetwfe (version 1.5.0)

simulateDataCore: Generate Random Panel Data for FETWFE Simulations

Description

Generates a random panel data set for simulation studies of the fused extended two-way fixed effects (FETWFE) estimator. The function creates a balanced panel with \(N\) units over \(T\) time periods, assigns treatment status across \(R\) treated cohorts (with equal marginal probabilities for treatment and non-treatment), and constructs a design matrix along with the corresponding outcome. When gen_ints = TRUE the full design matrix is returned (including interactions between covariates and fixed effects and treatment indicators). When gen_ints = FALSE the design matrix is generated in a simpler format (with no interactions) as expected by fetwfe(). Moreover, the covariates are generated according to the specified distribution: by default, covariates are drawn from a normal distribution; if distribution = "uniform", they are drawn uniformly from \([-\sqrt{3}, \sqrt{3}]\).

When \(d = 0\) (i.e. no covariates), no covariate-related columns or interactions are generated.

See the simulation studies section of Faletto (2025) for details.

Usage

simulateDataCore(
  N,
  T,
  R,
  d,
  sig_eps_sq,
  sig_eps_c_sq,
  beta,
  seed = NULL,
  gen_ints = FALSE,
  distribution = "gaussian",
  guarantee_rank_condition = FALSE
)

Value

An object of class "FETWFE_simulated", which is a list containing:

pdata

A dataframe containing generated data that can be passed to fetwfe().

X

The design matrix. When gen_ints = TRUE, \(X\) has \(p\) columns with interactions; when gen_ints = FALSE, \(X\) has no interactions.

y

A numeric vector of length \(N \times T\) containing the generated responses.

covs

A character vector containing the names of the generated features (if \(d > 0\)), or an empty vector (if \(d = 0\)).

time_var

The name of the time variable in pdata.

unit_var

The name of the unit variable in pdata.

treatment

The name of the treatment variable in pdata.

response

The name of the response variable in pdata.

coefs

The coefficient vector \(\beta\) used for data generation.

first_inds

A vector of indices indicating the first treatment effect for each treated cohort.

N_UNTREATED

The number of never-treated units.

assignments

A vector of counts (of length \(R+1\)) indicating how many units fall into the never-treated group and each of the \(R\) treated cohorts.

indep_counts

Independent cohort assignments (for auxiliary purposes).

p

The number of columns in the design matrix \(X\).

N

Number of units.

T

Number of time periods.

R

Number of treated cohorts.

d

Number of covariates.

sig_eps_sq

The idiosyncratic noise variance.

sig_eps_c_sq

The unit-level noise variance.

Arguments

N

Integer. Number of units in the panel.

T

Integer. Number of time periods.

R

Integer. Number of treated cohorts (with treatment starting in periods 2 to T).

d

Integer. Number of time-invariant covariates.

sig_eps_sq

Numeric. Variance of the idiosyncratic (observation-level) noise.

sig_eps_c_sq

Numeric. Variance of the unit-level random effects.

beta

Numeric vector. Coefficient vector for data generation. Its required length depends on the value of gen_ints:

  • If gen_ints = TRUE and d > 0, the expected length is \(p = R + (T-1) + d + dR + d(T-1) + num\_treats + num\_treats \times d\), where \(num\_treats = T \times R - \frac{R(R+1)}{2}\).

  • If gen_ints = TRUE and d = 0, the expected length is \(p = R + (T-1) + num\_treats\).

  • If gen_ints = FALSE, the expected length is \(p = R + (T-1) + d + num\_treats\).
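The three cases above can be collected into one small helper. This is a minimal sketch in base R, not part of fetwfe: the name expected_beta_length is hypothetical, and the check R <= T - 1 is inferred from the statement that treatment starts in periods 2 to T. The count \(num\_treats\) equals \(\sum_{r=1}^{R} (T - r)\), which is consistent with the \(r\)-th cohort contributing \(T - r\) treated periods.

```r
# Hypothetical helper (not exported by fetwfe): length of beta that the
# documentation says simulateDataCore() expects.
expected_beta_length <- function(T, R, d, gen_ints = FALSE) {
  # Assumed constraints, inferred from the argument descriptions.
  stopifnot(R >= 1, R <= T - 1, d >= 0)

  # Number of treatment-effect columns: T * R - R * (R + 1) / 2.
  num_treats <- T * R - R * (R + 1) / 2

  # Cohort effects, time effects, and treatment indicators are always present.
  base <- R + (T - 1) + num_treats

  if (!gen_ints) {
    # No interactions: add only the d covariate columns.
    return(base + d)
  }

  # Full design: covariates, covariate-by-cohort, covariate-by-time,
  # and covariate-by-treatment interactions. All vanish when d = 0.
  base + d + d * R + d * (T - 1) + num_treats * d
}

expected_beta_length(T = 5, R = 3, d = 2, gen_ints = TRUE)   # 50
expected_beta_length(T = 5, R = 3, d = 0, gen_ints = TRUE)   # 16
expected_beta_length(T = 5, R = 3, d = 2, gen_ints = FALSE)  # 18
```

Note that the gen_ints = TRUE formula for d > 0 reduces to the d = 0 formula when d is set to zero, so a single expression covers both cases.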

seed

(Optional) Integer. Seed for reproducibility.

gen_ints

Logical. If TRUE, generate the full design matrix with interactions; if FALSE (the default), generate a design matrix without any interaction terms.

distribution

Character. Distribution to generate covariates. Defaults to "gaussian". If set to "uniform", covariates are drawn uniformly from \([-\sqrt{3}, \sqrt{3}]\).

guarantee_rank_condition

(Optional) Logical. If TRUE, the returned data set is guaranteed to have at least d + 1 units per cohort, which is necessary for the final design matrix to have full column rank. Defaults to FALSE, in which case no such condition is enforced.
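To illustrate what the d + 1 condition means, the sketch below draws cohort assignments and checks the resulting counts. It is an illustration under stated assumptions, not the package's internal procedure: the sampling step reads "equal marginal probabilities" as each of the R + 1 groups being equally likely, the helper meets_rank_condition is hypothetical, and applying the check to the never-treated group as well as the treated cohorts is also an assumption.

```r
set.seed(7)
N <- 60  # units
R <- 3   # treated cohorts
d <- 2   # covariates

# Assumed assignment step: 0 = never treated, 1..R = treated cohorts,
# each group drawn with equal probability.
cohort <- sample(0:R, size = N, replace = TRUE)

# Counts of length R + 1, in the same layout as the documented
# `assignments` element (never-treated first, then each cohort).
counts <- tabulate(cohort + 1L, nbins = R + 1L)
names(counts) <- c("never_treated", paste0("cohort_", seq_len(R)))

# Hypothetical check: every group needs at least d + 1 units.
meets_rank_condition <- function(counts, d) all(counts >= d + 1)

counts
meets_rank_condition(counts, d)
```

With small N relative to R and d, a random draw can leave a group below the threshold, which is the situation guarantee_rank_condition = TRUE is documented to rule out.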

Details

When gen_ints = TRUE, the function constructs the design matrix by first generating base fixed effects and a long-format covariate matrix (via generateBaseEffects()), then appending interactions between the covariates and cohort/time fixed effects (via generateFEInts()) and finally treatment indicator columns and treatment-covariate interactions (via genTreatVarsSim() and genTreatInts()). When gen_ints = FALSE, the design matrix consists only of the base fixed effects, covariates, and treatment indicators.

The argument distribution controls the generation of covariates. For "gaussian", covariates are drawn from rnorm; for "uniform", they are drawn from runif on the interval \([-\sqrt{3}, \sqrt{3}]\).
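The interval \([-\sqrt{3}, \sqrt{3}]\) gives the uniform draws mean 0 and variance \((2\sqrt{3})^2 / 12 = 1\). Assuming rnorm is called with its defaults (mean 0, standard deviation 1), which the source does not state explicitly, both options then produce covariates on the same scale. The following base R sketch checks this numerically:

```r
set.seed(42)
n_draws <- 1e5
half_width <- sqrt(3)

# Analytic variance of Uniform(a, b) is (b - a)^2 / 12.
analytic_var <- (2 * half_width)^2 / 12

# Draws matching the two documented options (rnorm defaults are assumed).
x_unif <- runif(n_draws, min = -half_width, max = half_width)
x_norm <- rnorm(n_draws)

# Sample variances should both be close to 1.
c(analytic = analytic_var, uniform = var(x_unif), gaussian = var(x_norm))
```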

When \(d = 0\) (i.e. no covariates), the function omits any covariate-related columns and their interactions.
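A call might look like the sketch below. It assumes the fetwfe package is installed and exports simulateDataCore() with the signature shown under Usage; the dimensions, noise variances, and randomly drawn coefficients are arbitrary illustrative values, not recommendations from the package author. The call is guarded so that the snippet still runs when the package is unavailable.

```r
# Arbitrary illustrative dimensions.
n_units   <- 120
n_periods <- 5
n_cohorts <- 3
n_covs    <- 2

# Required length of beta when gen_ints = FALSE, per the Arguments section.
num_treats <- n_periods * n_cohorts - n_cohorts * (n_cohorts + 1) / 2
p <- n_cohorts + (n_periods - 1) + n_covs + num_treats

set.seed(1)
beta <- rnorm(p)  # placeholder coefficients, for illustration only

if (requireNamespace("fetwfe", quietly = TRUE)) {
  sim <- fetwfe::simulateDataCore(
    N = n_units,
    T = n_periods,
    R = n_cohorts,
    d = n_covs,
    sig_eps_sq = 1,
    sig_eps_c_sq = 1,
    beta = beta,
    seed = 123,
    gen_ints = FALSE
  )

  # Elements documented under Value.
  str(sim$pdata)
  length(sim$y)  # documented as N * T
}
```

Setting gen_ints = FALSE keeps the output in the format that, according to the description, fetwfe() expects.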

References

Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985