genCoefsCore: Generate Coefficient Vector for Data Generation

Description

This function generates a coefficient vector beta along with a sparse auxiliary vector theta for simulation studies of the fused extended two-way fixed effects estimator. The returned beta is formatted to align with the design matrix created by genRandomData(), and is a valid input for the beta argument of that function. The vector theta is sparse, with nonzero entries occurring with probability density and scaled by eff_size. See the simulation studies section of Faletto (2025) for details.

Usage

genCoefsCore(R, T, d, density, eff_size, seed = NULL)

Value

A list with two elements:

beta: A numeric vector representing the full coefficient vector after the inverse fusion transform.
theta: A numeric vector representing the coefficient vector in the transformed feature space. theta is a sparse vector, which aligns with an assumption that deviations from the restrictions encoded in the FETWFE model are sparse. beta is derived from theta.

Arguments

R: Integer. The number of treated cohorts (treatment is assumed to start in periods 2 to R + 1).
T: Integer. The total number of time periods.
d: Integer. The number of time-invariant covariates. If d > 0, additional terms corresponding to covariate main effects and interactions are included in beta.
density: Numeric in (0,1). The probability that any given entry in the initial sparse coefficient vector theta is nonzero.
eff_size: Numeric. The magnitude used to scale nonzero entries in theta. Each nonzero entry is set to eff_size or -eff_size (with a 60 percent chance for a positive value).
seed: (Optional) Integer. Seed for reproducibility.

Details

The length of beta is given by $$p = R + (T - 1) + d + dR + d(T - 1) + \mathit{num\_treats} + (\mathit{num\_treats} \times d)$$, where the number of treatment parameters is defined as $$\mathit{num\_treats} = T \times R - \frac{R(R+1)}{2}$$.
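As a quick check of the formula above, the following sketch evaluates num_treats and p for given R, T, and d. The helper name expectedBetaLength is hypothetical and is not part of the package; it only restates the arithmetic shown in this section.

```r
# Hypothetical helper (not exported by the package): evaluates the
# length formula stated in the Details section.
expectedBetaLength <- function(R, T, d) {
  # Number of treatment-effect parameters
  num_treats <- T * R - R * (R + 1) / 2
  # Cohort effects + time effects + covariate main effects
  # + covariate interactions + treatment effects and their interactions
  p <- R + (T - 1) + d + d * R + d * (T - 1) + num_treats + num_treats * d
  c(num_treats = num_treats, p = p)
}

# With R = 3, T = 6, d = 2: num_treats = 18 - 6 = 12 and
# p = 3 + 5 + 2 + 6 + 10 + 12 + 24 = 62
expectedBetaLength(R = 3, T = 6, d = 2)
```

With d = 0 the covariate terms vanish and the formula reduces to p = R + (T - 1) + num_treats.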

The function operates in two steps:

It first creates a sparse vector theta of length $p$, with nonzero entries occurring with probability density. Nonzero entries are set to eff_size or -eff_size (with a 60% chance of being positive).
The full coefficient vector beta is then computed by applying an inverse fusion transform to theta using internal routines (e.g., genBackwardsInvFusionTransformMat() and genInvTwoWayFusionTransformMat()).
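The first step can be illustrated with a minimal sketch, assuming independent Bernoulli draws for the sparsity pattern and independent sign draws. The function sketchSparseTheta is hypothetical and is not the package's internal code, so its output will generally not match genCoefsCore() for the same seed. The second step relies on the internal transform routines and is not reproduced here.

```r
# Hypothetical sketch of step 1 only (not the package's implementation).
sketchSparseTheta <- function(p, density, eff_size, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Each entry is nonzero with probability `density`
  nonzero <- stats::rbinom(p, size = 1, prob = density) == 1
  # Each sign is positive with probability 0.6, negative otherwise
  signs <- ifelse(stats::runif(p) < 0.6, 1, -1)
  theta <- numeric(p)
  theta[nonzero] <- eff_size * signs[nonzero]
  theta
}

theta_sketch <- sketchSparseTheta(p = 62, density = 0.1, eff_size = 1.5,
                                  seed = 789)
length(theta_sketch)       # 62, matching p for R = 3, T = 6, d = 2
table(sign(theta_sketch))  # counts of negative, zero, and positive entries
```

Every entry of the result is 0, eff_size, or -eff_size, which is the property the description of theta relies on.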

References

Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985

Examples

if (FALSE) {
  # Set parameters for the coefficient generation
  R <- 3         # Number of treated cohorts
  T <- 6         # Total number of time periods
  d <- 2         # Number of covariates
  density <- 0.1 # Probability that an entry in the initial vector is nonzero
  eff_size <- 1.5  # Scaling factor for nonzero coefficients
  seed <- 789    # Seed for reproducibility

  # Generate coefficients using genCoefsCore()
  coefs_core <- genCoefsCore(R = R, T = T, d = d, density = density,
                             eff_size = eff_size, seed = seed)
  beta <- coefs_core$beta
  theta <- coefs_core$theta

  # For diagnostic purposes, compute the expected length of beta.
  # The length p is defined internally as:
  #   p = R + (T - 1) + d + d*R + d*(T - 1) + num_treats + num_treats*d,
  # where num_treats = T * R - (R*(R+1))/2.
  num_treats <- T * R - (R * (R + 1)) / 2
  p_expected <- R + (T - 1) + d + d * R + d * (T - 1) + num_treats + num_treats * d

  cat("Length of beta:", length(beta), "\nExpected length:", p_expected, "\n")
}
