fetwfeWithSimulatedData: Run FETWFE on Simulated Data

Description

This function runs the fused extended two-way fixed effects estimator (fetwfe()) on simulated data. It is a thin wrapper around fetwfe(): it accepts an object of class "FETWFE_simulated" (produced by simulateData()) and unpacks the components needed to call fetwfe(). The outputs therefore match those of fetwfe(), and the remaining arguments match their counterparts in fetwfe().

Usage

fetwfeWithSimulatedData(
  simulated_obj,
  lambda.max = NA,
  lambda.min = NA,
  nlambda = 100,
  q = 0.5,
  verbose = FALSE,
  alpha = 0.05,
  add_ridge = FALSE
)

Value

An object of class fetwfe containing the following elements:

att_hat

The estimated overall average treatment effect for a randomly selected treated unit.

att_se

If q < 1, a standard error for the ATT. If indep_counts was provided, this standard error is asymptotically exact; if not, it is asymptotically conservative. If q >= 1, this will be NA.

catt_hats

A named vector containing the estimated average treatment effects for each cohort.

catt_ses

If q < 1, a named vector containing the (asymptotically exact, non-conservative) standard errors for the estimated average treatment effects within each cohort.

cohort_probs

A vector of the estimated probabilities of being in each cohort conditional on being treated, which was used in calculating att_hat. If indep_counts was provided, cohort_probs was calculated from that; otherwise, it was calculated from the counts of units in each treated cohort in pdata.
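To make the relationship between cohort_probs, catt_hats, and att_hat concrete, here is a minimal base-R sketch. It assumes that att_hat is the cohort_probs-weighted average of catt_hats, which is what the description above suggests. The cohort names, counts, and effect sizes are made up for illustration and are not output from fetwfe().

```r
# Hypothetical cohort-level effect estimates (illustrative values only)
catt_hats <- c(cohort_2 = 1.5, cohort_3 = 2.0, cohort_4 = 0.5)

# Hypothetical counts of treated units in each cohort
cohort_counts <- c(cohort_2 = 20, cohort_3 = 10, cohort_4 = 10)

# Estimated probability of each cohort, conditional on being treated
cohort_probs <- cohort_counts / sum(cohort_counts)

# Assumed aggregation: probability-weighted average of the cohort effects
att_hat <- sum(cohort_probs * catt_hats)
print(att_hat)
```

Larger cohorts pull the overall estimate toward their own effect, which is why supplying independent counts can change att_hat even when catt_hats is unchanged.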

catt_df

A dataframe displaying the cohort names, average treatment effects, standard errors, and 1 - alpha confidence interval bounds.
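As an illustration of how such a table could be assembled, the sketch below builds 1 - alpha intervals from point estimates and standard errors. It assumes normal-based (Wald) intervals, which this page does not state explicitly. The column names and all numbers are hypothetical and need not match the actual catt_df.

```r
alpha <- 0.05

# Hypothetical estimates and standard errors (not real fetwfe() output)
catt_hats <- c(cohort_2 = 1.5, cohort_3 = 2.0)
catt_ses <- c(cohort_2 = 0.4, cohort_3 = 0.5)

# Two-sided normal critical value for a (1 - alpha) interval (assumed form)
z <- qnorm(1 - alpha / 2)

# Column names here are illustrative, not the package's
catt_df_sketch <- data.frame(
  cohort = names(catt_hats),
  estimate = unname(catt_hats),
  se = unname(catt_ses),
  ci_low = unname(catt_hats - z * catt_ses),
  ci_high = unname(catt_hats + z * catt_ses)
)
print(catt_df_sketch)
```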

beta_hat

The full vector of estimated coefficients.

treat_inds

The indices of beta_hat corresponding to the treatment effects for each cohort at each time.

treat_int_inds

The indices of beta_hat corresponding to the interactions between the treatment effects for each cohort at each time and the covariates.

sig_eps_sq

Either the provided sig_eps_sq or the estimated one, if a value wasn't provided.

sig_eps_c_sq

Either the provided sig_eps_c_sq or the estimated one, if a value wasn't provided.

lambda.max

Either the provided lambda.max or the one that was used, if a value wasn't provided. (This is returned to help with getting a reasonable range of lambda values for grid search.)

lambda.max_model_size

The size of the selected model corresponding to lambda.max (for q <= 1, this will be the smallest model size). As noted in the description of the lambda.max argument, for q <= 1 ideally this value is close to 0.

lambda.min

Either the provided lambda.min or the one that was used, if a value wasn't provided.

lambda.min_model_size

The size of the selected model corresponding to lambda.min (for q <= 1, this will be the largest model size). As noted in the description of the lambda.max argument, for q <= 1 ideally this value is close to p.

lambda_star

The value of lambda chosen by BIC. If this value is close to lambda.min or lambda.max, that could suggest that the range of lambda values should be expanded.

lambda_star_model_size

The size of the model that was selected. If this value is close to lambda.max_model_size or lambda.min_model_size, that could suggest that the range of lambda values should be expanded.
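The check described above can be sketched as a small helper. The function lambda_range_flag() and its tol argument are hypothetical and not part of the fetwfe package. The sketch only reads the three model-size elements documented here, and the inputs are mock lists rather than real results.

```r
# Hypothetical helper, not part of the fetwfe package.
# `res` is any list with the three *_model_size elements documented above;
# `tol` is an assumed tolerance (in number of features) for "close to".
lambda_range_flag <- function(res, tol = 2) {
  star <- res$lambda_star_model_size
  if (abs(star - res$lambda.max_model_size) <= tol) {
    return("near lambda.max")
  }
  if (abs(star - res$lambda.min_model_size) <= tol) {
    return("near lambda.min")
  }
  "interior"
}

# Mock results with made-up sizes (not real fetwfe() output)
res_edge <- list(
  lambda.max_model_size = 0,
  lambda.min_model_size = 150,
  lambda_star_model_size = 149
)
res_ok <- list(
  lambda.max_model_size = 0,
  lambda.min_model_size = 150,
  lambda_star_model_size = 40
)

flag_edge <- lambda_range_flag(res_edge)
flag_ok <- lambda_range_flag(res_ok)
print(c(flag_edge, flag_ok))
```

A flag other than "interior" is only a prompt to rerun with a wider lambda range, not evidence that the fit is wrong.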

N

The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period).

T

The number of time periods in the final data set.

R

The number of treated cohorts in the final data set.

d

The number of covariates in the final data set (after any covariates may have been removed because they contained missing values or took the same value for every unit).

p

The final number of columns in the full set of covariates used to estimate the model.

alpha

The alpha level used for confidence intervals.

internal

A list containing internal outputs that are typically not needed for interpretation:

X_ints: The design matrix created containing all interactions, time and cohort dummies, etc.

y: The vector of responses, containing nrow(X_ints) entries.

X_final: The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit.

y_final: The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit.

calc_ses: Logical indicating whether standard errors were calculated.
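The left-multiplication by the square root inverse of each unit's covariance matrix can be illustrated in base R. This sketch makes several assumptions that are not stated on this page: each unit's covariance is sig_eps_sq times the identity plus sig_eps_c_sq times a matrix of ones (a random-effects structure), and rows are ordered unit by unit with T consecutive periods per unit. It shows only the whitening step, not the change in coordinates, and all data are simulated placeholders.

```r
T_periods <- 4      # time periods per unit (illustrative)
N_units <- 3        # number of units (illustrative)
sig_eps_sq <- 5     # idiosyncratic noise variance (assumed role)
sig_eps_c_sq <- 5   # unit-level random effect variance (assumed role)

# Assumed within-unit covariance: sig_eps_sq * I + sig_eps_c_sq * 1 1'
Omega <- sig_eps_sq * diag(T_periods) +
  sig_eps_c_sq * matrix(1, T_periods, T_periods)

# Symmetric inverse square root via the eigendecomposition
eig <- eigen(Omega, symmetric = TRUE)
Omega_inv_sqrt <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)

# Placeholder design matrix and response, stacked unit by unit
set.seed(1)
X <- matrix(rnorm(N_units * T_periods * 2), ncol = 2)
y <- rnorm(N_units * T_periods)

# Apply the same whitening block to every unit
W <- kronecker(diag(N_units), Omega_inv_sqrt)
X_final_sketch <- W %*% X
y_final_sketch <- as.numeric(W %*% y)

# After whitening, the within-unit covariance becomes the identity
print(round(Omega_inv_sqrt %*% Omega %*% Omega_inv_sqrt, 10))
```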

The object has methods for print(), summary(), and coef(). By default, print() and summary() only show the essential outputs. To see internal details, use print(x, show_internal = TRUE) or summary(x, show_internal = TRUE). The coef() method returns the vector of estimated coefficients (beta_hat).

Arguments

simulated_obj: An object of class "FETWFE_simulated" containing the simulated panel data and design matrix.
lambda.max: (Optional.) Numeric. A single model is selected by BIC from a grid search over the penalty parameter lambda; lambda.max is the largest lambda in the grid. If no lambda.max is provided, one will be selected automatically. For q <= 1, the model will be sparse, and ideally all of the following hold at once: the smallest model (the one corresponding to lambda.max) selects close to 0 features; the largest model (the one corresponding to lambda.min) selects close to p features; nlambda is large enough that models are considered at every feasible model size; and nlambda is small enough that the computation remains feasible. You may want to adjust lambda.max, lambda.min, and nlambda manually to achieve these goals, particularly if the selected model size is very close to that of the model corresponding to lambda.max or lambda.min, which could indicate that the range of lambda values was too narrow. The outputs lambda.max_model_size, lambda.min_model_size, and lambda_star_model_size help assess this. Default is NA.
lambda.min: (Optional.) Numeric. The smallest lambda penalty parameter that will be considered. See the description of lambda.max for details. Default is NA.
nlambda: (Optional.) Integer. The total number of lambda penalty parameters that will be considered. See the description of lambda.max for details. Default is 100.
q: (Optional.) Numeric; determines which L_q penalty is used for the fusion regularization. For 0 < q < 1, standard errors and confidence intervals are available. q = 1 is the lasso and q = 2 is ridge regression; for q >= 1, no standard errors are returned. See Faletto (2025) for details. Default is 0.5.
verbose: Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE.
alpha: Numeric; the function will calculate (1 - alpha) confidence intervals for the cohort average treatment effects, which are returned in catt_df. Default is 0.05.
add_ridge: (Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE.

Examples

if (FALSE) {
  # Generate coefficients
  coefs <- genCoefs(R = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)

  # Simulate data using the coefficients
  sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5)

  # Run the estimator on the simulated data with default settings
  result <- fetwfeWithSimulatedData(sim_data)
}