fixes
Overview
Note
By default, the fixes package assumes time is a regularly spaced numeric variable (e.g., year = 1995, 1996, …).
However, if your time variable is irregular or non-numeric (e.g., Date type), you can enable time_transform = TRUE to automatically convert it to a sequential index within each unit.
You can also specify unit-specific treatment timing by setting staggered = TRUE.
The fixes package is designed for running event-study analyses and
plotting their results. Event studies are commonly used to check the
parallel trends assumption in two-way fixed effects (TWFE)
difference-in-differences (DID) analysis.
The package includes two main functions:
- run_es(): Accepts a data frame, generates lead and lag variables, and performs event study analysis. The function returns the results as a tidy data frame. Supports options for fixed effects, covariates, clustered standard errors, and staggered treatment timing.
- plot_es(): Creates plots using ggplot2 based on the data frame generated by run_es(). Users can choose between geom_ribbon() and geom_errorbar() to visualize the results.
Installation
You can install the package like so:
# install.packages("pak")
pak::pak("fixes")
or
install.packages("fixes")
To install the development version, install it from the GitHub repository:
pak::pak("yo5uke/fixes")
How to use
First, load the library.
library(fixes)
Data frame
The run_es() function is designed to work with panel data.
The data frame must include the following variables:
- A unit identifier (e.g., individual, firm, region)
- A treatment indicator variable (0/1 or TRUE/FALSE)
- A time variable (numeric or Date)
- An outcome variable (continuous)
In addition, if you use staggered = TRUE, you must provide a variable
that indicates unit-specific treatment timing (e.g., the year
treatment started for each unit).
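The sketch below shows what such a data frame might look like. It uses base R only, and everything in it (the data frame panel, its column names, and the treatment years) is made up for illustration. It only demonstrates the shape of the data, with one constant treatment year per unit.

```r
# Hypothetical panel: 3 units observed yearly from 2001 to 2005
panel <- expand.grid(year = 2001:2005, id = 1:3)

# Unit-specific treatment start year (made-up values), one per unit
start_year <- c(2003, 2004, 2005)
panel$year_treated <- start_year[panel$id]

# Treatment indicator (all units are treated in this toy example)
# and a simulated continuous outcome
panel$treated <- 1
set.seed(42)
panel$y <- rnorm(nrow(panel))

# Sanity check: the timing variable must not change within a unit
n_timings <- tapply(panel$year_treated, panel$id,
                    function(x) length(unique(x)))
print(n_timings)
str(panel)
```

The toy data contains no never-treated units, because this document does not say how their treatment timing should be coded.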
To get started, you can use example data from the fixest package:
- fixest::base_did: A simulated panel dataset for a basic difference-in-differences design.
- fixest::base_stagg: A built-in dataset designed for analyzing staggered adoption of treatment.
These datasets already contain the necessary structure and can be used
directly with run_es().
# Load example data
df1 <- fixest::base_did # Basic DID example
df2 <- fixest::base_stagg # Staggered treatment example
y | x1 | id | period | post | treat |
---|---|---|---|---|---|
2.8753063 | 0.5365377 | 1 | 1 | 0 | 1 |
1.8606527 | -3.0431894 | 1 | 2 | 0 | 1 |
0.0941652 | 5.5768439 | 1 | 3 | 0 | 1 |
3.7814749 | -2.8300587 | 1 | 4 | 0 | 1 |
-2.5581996 | -5.0443544 | 1 | 5 | 0 | 1 |
1.7287324 | -0.6363849 | 1 | 6 | 1 | 1 |
|  | id | year | year_treated | time_to_treatment | treated | treatment_effect_true | x1 | y |
---|---|---|---|---|---|---|---|---|
2 | 90 | 1 | 2 | -1 | 1 | 0 | -1.0947021 | 0.0172297 |
3 | 89 | 1 | 3 | -2 | 1 | 0 | -3.7100676 | -4.5808453 |
4 | 88 | 1 | 4 | -3 | 1 | 0 | 2.5274402 | 2.7381717 |
5 | 87 | 1 | 5 | -4 | 1 | 0 | -0.7204263 | -0.6510307 |
6 | 86 | 1 | 6 | -5 | 1 | 0 | -3.6711678 | -5.3338166 |
7 | 85 | 1 | 7 | -6 | 1 | 0 | -0.3152137 | 0.4956263 |
run_es()
run_es() takes the 16 arguments listed below, including required
variables and optional specifications like fixed effects, clustering,
covariates, staggered treatment timing, and weights.
Argument | Description |
---|---|
data | Data frame to be used. |
outcome | Outcome variable. Can be specified as a raw variable or a transformation (e.g., log(y)). Provide it unquoted. |
treatment | Dummy variable indicating the treated units. Provide it unquoted. Accepts both 0/1 and TRUE/FALSE. |
time | Time variable. Provide it unquoted. |
staggered | Logical. If TRUE, allows for unit-specific treatment timing (staggered adoption). Default is FALSE. |
timing | The time at which the treatment occurs. If staggered = FALSE, this should be a scalar (e.g., 2005). If staggered = TRUE, provide a variable (column) indicating the treatment time for each unit. |
lead_range | Number of pre-treatment periods to include (e.g., 3 = lead3, lead2, lead1). Default is NULL, which automatically uses the maximum available lead range. Set to a number to restrict the range manually. |
lag_range | Number of post-treatment periods to include (e.g., 2 = lag0 (the treatment period), lag1, lag2). Default is NULL, which automatically uses the maximum available lag range. Set to a number to restrict the range manually. |
covariates | Additional covariates to include in the regression. Must be a one-sided formula (e.g., ~ x1 + x2). |
fe | Fixed effects to control for unobserved heterogeneity. Must be a one-sided formula (e.g., ~ id + year). |
cluster | Specifies clustering for standard errors. Can be a character vector (e.g., c("id", "year")) or a formula (e.g., ~ id + year, ~ id^year). |
weights | Optional weights to be used in the regression. Provide as a one-sided formula (e.g., ~ weight). |
baseline | Relative time value to be used as the reference category. The corresponding dummy is excluded from the regression. Must be within the specified lead/lag range. |
interval | Time interval between observations (e.g., 1 for yearly data, 5 for 5-year intervals). |
time_transform | Logical. If TRUE, converts the time variable into a sequential index (1, 2, 3, …) within each unit. Useful when time is irregular, such as with Date values or unbalanced panels (e.g., missing years or monthly observations). Default is FALSE. |
unit | Required if time_transform = TRUE. Specifies the panel unit identifier (e.g., firm_id). |
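To make the roles of timing, interval, lead_range, lag_range, and baseline more concrete, here is a conceptual base R sketch of how relative time and the lead/lag labels relate to one another. It is not the package's internal code. The input values are hypothetical, and how run_es() treats periods outside the window is not covered here.

```r
# Hypothetical inputs mirroring the arguments described in the table
time       <- 1:10  # observed periods
timing     <- 6     # treatment period (non-staggered case)
interval   <- 1     # spacing between consecutive periods
lead_range <- 3
lag_range  <- 2
baseline   <- -1

# Relative time: negative = before treatment, 0 = treatment period
rel_time <- (time - timing) / interval

# Periods that fall inside the requested lead/lag window
in_window <- rel_time >= -lead_range & rel_time <= lag_range

# Label each period as leadK or lagK, following the naming in the table
label <- ifelse(rel_time < 0,
                paste0("lead", abs(rel_time)),
                paste0("lag", rel_time))

window <- data.frame(time, rel_time, label,
                     stringsAsFactors = FALSE)[in_window, ]

# The baseline period is the omitted reference category
window$is_baseline <- window$rel_time == baseline
print(window)
```

With these values the window runs from lead3 to lag2, and lead1 (relative time -1) is the reference period.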
Example: Without Covariates
event_study <- run_es(
  data = df1,
  outcome = y,
  treatment = treat,
  time = period,
  timing = 6,
  lead_range = 5,
  lag_range = 4,
  fe = ~ id + period,
  cluster = ~ id,
  baseline = -1,
  interval = 1
)
Note: The fe argument must be specified as a one-sided formula
(e.g., ~ firm_id + year).
The cluster argument can be specified either as a one-sided formula
(e.g., ~ state_id) or as a character vector (e.g., c("firm_id", "year")).
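The formula form and the character-vector form name the same variables. As a small base R illustration (the variable names are placeholders, and this says nothing about how run_es() parses its arguments internally), all.vars() extracts the names from a one-sided formula:

```r
# Two ways of naming the same clustering variables (placeholder names)
cluster_formula <- ~ firm_id + year
cluster_names   <- c("firm_id", "year")

# all.vars() returns the variable names that appear in a formula
print(all.vars(cluster_formula))

# Check that both forms refer to the same variables
print(identical(all.vars(cluster_formula), cluster_names))
```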
The run_es()
function returns a tidy data frame that includes
estimated event-study coefficients, confidence intervals, relative
timing values, and an indicator for the omitted baseline period.
Estimation is performed using fast and flexible fixed effects
regression.
Example: With Covariates
If your dataset includes additional covariates, you can include them in
the regression by specifying a one-sided formula using the covariates
argument, as shown below.
event_study <- run_es(
  data = df1,
  outcome = y,
  treatment = treat,
  time = period,
  timing = 6,
  lead_range = 5,
  lag_range = 4,
  covariates = ~ x1, # x1 is the only covariate available in fixest::base_did
  fe = ~ id + period,
  cluster = ~ id,
  baseline = -1,
  interval = 1
)
# Example using Date-type time variable and time_transform
df_alt <- df1 |>
  dplyr::mutate(
    year = rep(2001:2010, times = 108), # 108 units × 10 periods
    date = as.Date(paste0(year, "-01-01"))
  )
event_study_alt <- run_es(
  data = df_alt,
  outcome = y,
  treatment = treat,
  time = date,
  timing = 6, # 6th time point within each unit (2006-01-01, i.e., period 6)
  lead_range = 3,
  lag_range = 3,
  fe = ~ id + period,
  cluster = ~ id,
  baseline = -1,
  time_transform = TRUE,
  unit = id
)
Note:
When time_transform = TRUE, the timing argument must be specified using the transformed index (e.g., timing = 19 for the 19th time point within each unit).
Support for specifying the original time values (e.g., a specific Date) directly as timing is planned for a future update.
Currently, time_transform = TRUE cannot be combined with staggered = TRUE. This combination is not yet supported, but may be implemented in a future release.
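Until original time values are accepted, the index to pass as timing can be looked up by hand. The base R sketch below is an illustration under assumptions: the toy data frame panel and the treatment date are hypothetical, and the ranking only mimics the idea of a within-unit sequential index. It does not reproduce the package's internal code.

```r
# Hypothetical panel: 2 units with irregularly spaced observation dates
panel <- data.frame(
  id   = rep(c("a", "b"), each = 4),
  date = as.Date(c("2020-01-10", "2020-02-03", "2020-04-21", "2020-09-30",
                   "2020-01-10", "2020-02-03", "2020-04-21", "2020-09-30"))
)

# Sequential index (1, 2, 3, ...) of each date within its unit
panel$time_index <- ave(
  as.numeric(panel$date),
  panel$id,
  FUN = function(x) match(x, sort(unique(x)))
)

# Index of the (assumed) treatment date, i.e. the value to use for timing
treatment_date <- as.Date("2020-04-21")
timing_index <- unique(panel$time_index[panel$date == treatment_date])
print(timing_index)
```

If timing_index contains more than one value, the same date maps to different indices across units, and a single scalar timing would not describe every unit.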
You can use this result to create custom plots, or take advantage of the
built-in plot_es()
function to visualize the estimates and confidence
intervals with minimal code.
plot_es()
The plot_es() function creates a plot based on ggplot2.
plot_es() has 12 arguments.
Argument | Description |
---|---|
data | Data frame created by run_es() |
type | The type of confidence interval visualization: "ribbon" (default) or "errorbar" |
vline_val | The x-intercept for the vertical reference line (default: 0) |
vline_color | Color for the vertical reference line (default: "#000") |
hline_val | The y-intercept for the horizontal reference line (default: 0) |
hline_color | Color for the horizontal reference line (default: "#000") |
linewidth | The width of the lines for the plot (default: 1) |
pointsize | The size of the points for the estimates (default: 2) |
alpha | The transparency level for ribbons (default: 0.2) |
barwidth | The width of the error bars (default: 0.2) |
color | The color for the lines and points (default: "#B25D91FF") |
fill | The fill color for ribbons (default: "#B25D91FF") |
If you don’t care about the details, you can just pass the data frame
created with run_es()
and the plot will be complete.
plot_es(event_study)
plot_es(event_study, type = "errorbar")
plot_es(event_study, type = "errorbar", vline_val = -.5)
Because the plot is built with ggplot2, you can customize it further by
adding ggplot2 layers, scales, and titles.
plot_es(event_study, type = "errorbar") +
  ggplot2::scale_x_continuous(breaks = seq(-5, 5, by = 1)) +
  ggplot2::ggtitle("Result of Event Study")
Planned Features
- Support for staggered = TRUE with time_transform = TRUE
  - Enable automatic alignment of treatment dates with transformed time indices, allowing analysis with irregular time variables (e.g., Date) in staggered adoption settings.
- Allow timing to accept original time values (e.g., specific Dates)
  - Instead of manually calculating the time index (e.g., timing = 19), users will be able to specify a Date or other original time value directly. This will simplify the workflow when time_transform = TRUE.
Debugging
If you find an issue, please report it on the GitHub Issues page.