fixes

Overview

Note
By default, the fixes package assumes time is a regularly spaced numeric variable (e.g., year = 1995, 1996, …).
However, if your time variable is irregular or non-numeric (e.g., a Date), set time_transform = TRUE (together with the unit argument) to convert it automatically into a sequential index within each unit.
You can also specify unit-specific treatment timing by setting staggered = TRUE.

The fixes package is designed for estimating and plotting event studies, a method commonly used to assess the parallel trends assumption in two-way fixed effects (TWFE) difference-in-differences (DID) analysis.

The package includes two main functions:

run_es(): Takes a data frame, generates lead and lag variables, estimates the event study, and returns the results as a tidy data frame. It supports fixed effects, covariates, clustered standard errors, weights, and staggered treatment timing.
plot_es(): Plots the data frame returned by run_es() with ggplot2. Confidence intervals are shown either as a ribbon (geom_ribbon()) or as error bars (geom_errorbar()).

Installation

You can install the released version from CRAN with pak or with base R:

# install.packages("pak")
pak::pak("fixes")

# Or, using base R
install.packages("fixes")

To install the development version, use the GitHub repository:

pak::pak("yo5uke/fixes")

How to use

First, load the package.

library(fixes)

Data frame

The run_es() function is designed to work with panel data.
The data frame must include the following variables:

A unit identifier (e.g., individual, firm, region)
A treatment indicator variable (0/1 or TRUE/FALSE)
A time variable (numeric or Date)
An outcome variable (continuous)
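Before calling run_es(), it can help to confirm that a panel has these columns and types. The sketch below uses a hypothetical helper, check_es_data(), which is not part of fixes. It takes column names as strings (run_es() takes them unquoted), and it checks only the requirements listed above.

```r
# Hypothetical helper (not part of fixes): checks that a panel data frame has
# the columns run_es() needs. Column names are passed as strings here.
check_es_data <- function(data, unit, time, treatment, outcome) {
  required <- c(unit, time, treatment, outcome)
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Treatment must be 0/1 or TRUE/FALSE
  trt <- data[[treatment]]
  if (!(is.logical(trt) || all(trt %in% c(0, 1)))) {
    stop("Treatment must be coded 0/1 or TRUE/FALSE")
  }
  # Time must be numeric or Date
  tm <- data[[time]]
  if (!(is.numeric(tm) || inherits(tm, "Date"))) {
    stop("Time must be numeric or Date")
  }
  # Outcome must be numeric (continuous)
  if (!is.numeric(data[[outcome]])) {
    stop("Outcome must be numeric")
  }
  invisible(TRUE)
}

# Toy panel: 2 units observed over 3 periods
panel <- data.frame(
  id     = rep(1:2, each = 3),
  period = rep(1:3, times = 2),
  treat  = rep(c(1, 0), each = 3),
  y      = c(1.2, 0.8, 2.5, 1.0, 1.1, 0.9)
)

check_es_data(panel, unit = "id", time = "period", treatment = "treat", outcome = "y")
```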

If you use staggered = TRUE, the data must also include a variable indicating unit-specific treatment timing (e.g., the year in which each unit was first treated).
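With staggered adoption, event time is measured relative to each unit's own treatment start. fixest::base_stagg already stores this as time_to_treatment (year minus year_treated). Below is a minimal base-R sketch of the same calculation on a toy panel. It assumes never-treated units are marked with NA; how fixes itself handles never-treated units is not covered here.

```r
# Toy staggered panel: unit 1 treated from year 3, unit 2 from year 4,
# unit 3 never treated (year_treated = NA is an assumption of this sketch)
stagg <- data.frame(
  id           = rep(1:3, each = 5),
  year         = rep(1:5, times = 3),
  year_treated = rep(c(3, 4, NA), each = 5)
)

# Relative time: 0 in the first treated period, negative before, positive after
stagg$rel_time <- stagg$year - stagg$year_treated

stagg[stagg$id == 1, ]
```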

To get started, you can use example data from widely used packages:

fixest::base_did: A simulated panel dataset for a basic difference-in-differences design (108 units observed over 10 periods, with treatment starting in period 6).
fixest::base_stagg: A built-in dataset designed for analyzing staggered adoption of treatment.

These datasets already contain the necessary structure and can be used directly with run_es().

# Load example data
df1 <- fixest::base_did      # Basic DID example
df2 <- fixest::base_stagg    # Staggered treatment example

y	x1	id	period	post	treat
2.8753063	0.5365377	1	1	0	1
1.8606527	-3.0431894	1	2	0	1
0.0941652	5.5768439	1	3	0	1
3.7814749	-2.8300587	1	4	0	1
-2.5581996	-5.0443544	1	5	0	1
1.7287324	-0.6363849	1	6	1	1

	id	year	year_treated	time_to_treatment	treated	x1	y
2	90	1	2	-1	1	-1.0947021	0.0172297
3	89	1	3	-2	1	-3.7100676	-4.5808453
4	88	1	4	-3	1	2.5274402	2.7381717
5	87	1	5	-4	1	-0.7204263	-0.6510307
6	86	1	6	-5	1	-3.6711678	-5.3338166
7	85	1	7	-6	1	-0.3152137	0.4956263

`run_es()`

run_es() takes the 16 arguments listed below. These include the required variables and optional specifications such as fixed effects, clustering, covariates, staggered treatment timing, and weights.

Argument	Description
`data`	Data frame to be used.
`outcome`	Outcome variable. Can be specified as a raw variable or a transformation (e.g., `log(y)`). Provide it unquoted.
`treatment`	Dummy variable indicating the treated units. Provide it unquoted. Accepts both `0/1` and `TRUE/FALSE`.
`time`	Time variable. Provide it unquoted.
`staggered`	Logical. If `TRUE`, allows for unit-specific treatment timing (staggered adoption). Default is `FALSE`.
`timing`	The time at which the treatment occurs. If `staggered = FALSE`, this should be a scalar (e.g., `2005`). If `staggered = TRUE`, provide a variable (column) indicating the treatment time for each unit.
`lead_range`	Number of pre-treatment periods to include (e.g., 3 = `lead3`, `lead2`, `lead1`). Default is `NULL`, which automatically uses the maximum available lead range. Set to a number to restrict the range manually.
`lag_range`	Number of post-treatment periods to include (e.g., 2 = `lag0` (the treatment period), `lag1`, `lag2`). Default is `NULL`, which automatically uses the maximum available lag range. Set to a number to restrict the range manually.
`covariates`	Additional covariates to include in the regression. Must be a one-sided formula (e.g., `~ x1 + x2`).
`fe`	Fixed effects to control for unobserved heterogeneity. Must be a one-sided formula (e.g., `~ id + year`).
`cluster`	Specifies clustering for standard errors. Can be a character vector (e.g., `c("id", "year")`) or a formula (e.g., `~ id + year`, `~ id^year`).
`weights`	Optional weights to be used in the regression. Provide as a one-sided formula (e.g., `~ weight`).
`baseline`	Relative time value to be used as the reference category. The corresponding dummy is excluded from the regression. Must be within the specified lead/lag range.
`interval`	Time interval between observations (e.g., `1` for yearly data, `5` for 5-year intervals).
`time_transform`	Logical. If `TRUE`, converts the `time` variable into a sequential index (1, 2, 3, …) within each unit. Useful when time is irregular, such as with `Date` values or unbalanced panels (e.g., missing years or monthly observations). Default is `FALSE`.
`unit`	Required if `time_transform = TRUE`. Specifies the panel unit identifier (e.g., `firm_id`).
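The sketch below makes lead_range, lag_range, baseline, and interval concrete. It lists the relative-time dummies that would enter the regression, using the lead/lag naming from the table. es_terms() is a hypothetical helper. The sketch also assumes that interval simply rescales time differences. It illustrates the argument semantics and does not reproduce the package's internals.

```r
# Hypothetical helper: dummy names implied by the lead/lag ranges,
# with the baseline period dropped from the regression.
es_terms <- function(lead_range, lag_range, baseline = -1) {
  rel <- seq(-lead_range, lag_range)
  if (!baseline %in% rel) stop("baseline must lie within the lead/lag range")
  rel <- rel[rel != baseline]
  ifelse(rel < 0, paste0("lead", abs(rel)), paste0("lag", rel))
}

es_terms(lead_range = 5, lag_range = 4, baseline = -1)
# returns c("lead5", "lead4", "lead3", "lead2", "lag0", "lag1", "lag2", "lag3", "lag4")

# With interval = 5 (5-year data), an observation 10 years after the
# treatment year is 10 / 5 = 2 periods out, i.e. it falls in lag2.
(2015 - 2005) / 5
```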

Example: Without Covariates

event_study <- run_es(
  data       = df1, 
  outcome    = y, 
  treatment  = treat, 
  time       = period, 
  timing     = 6, 
  lead_range = 5, 
  lag_range  = 4, 
  fe         = ~ id + period, 
  cluster    = ~ id, 
  baseline   = -1, 
  interval   = 1
)

Note: The fe argument must be specified as a one-sided formula (e.g., ~ firm_id + year).
The cluster argument can be specified either as a one-sided formula (e.g., ~ state_id) or as a character vector (e.g., c("firm_id", "year")).
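Both forms of cluster name the same variables. As a hypothetical illustration (cluster_vars() is not part of fixes), base R's all.vars() reduces a one-sided formula to its variable names. Note that all.vars() also reduces an interaction such as ~ id^year to c("id", "year"), so this sketch does not preserve interacted clustering.

```r
# Hypothetical helper: return clustering variable names from either form.
cluster_vars <- function(cluster) {
  if (inherits(cluster, "formula")) {
    all.vars(cluster)   # ~ id + year -> c("id", "year")
  } else if (is.character(cluster)) {
    cluster             # already a character vector
  } else {
    stop("cluster must be a one-sided formula or a character vector")
  }
}

cluster_vars(~ id + year)
cluster_vars(c("id", "year"))
```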

The run_es() function returns a tidy data frame that includes estimated event-study coefficients, confidence intervals, relative timing values, and an indicator for the omitted baseline period.
Estimation is performed using fast and flexible fixed effects regression.
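To build intuition for what the returned estimates represent, the sketch below fits a two-way fixed effects event study by hand with base R's lm(). It uses unit and period dummies in place of absorbed fixed effects. The simulated data, column names and 95% normal-approximation intervals are all assumptions for illustration. This is not how run_es() is implemented, and the standard errors are conventional rather than clustered.

```r
set.seed(42)

# Simulated panel: 40 units x 10 periods, units 1-20 treated from period 6
n_units <- 40
n_periods <- 10
timing <- 6
sim <- expand.grid(period = 1:n_periods, id = 1:n_units)
sim$treat <- as.integer(sim$id <= n_units / 2)
sim$rel <- sim$period - timing                 # relative time, -5 .. 4
true_effect <- ifelse(sim$rel >= 0, 2, 0)      # constant effect after treatment
sim$y <- 0.1 * sim$id + 0.3 * sim$period +
  sim$treat * true_effect + rnorm(nrow(sim))

# Treated-unit dummies for each relative time except the baseline (-1)
rel_values <- setdiff(-5:4, -1)
dummies <- ifelse(rel_values < 0, paste0("lead", -rel_values), paste0("lag", rel_values))
for (i in seq_along(rel_values)) {
  sim[[dummies[i]]] <- as.integer(sim$treat == 1 & sim$rel == rel_values[i])
}

# Two-way fixed effects via factor dummies for unit and period
fml <- reformulate(c(dummies, "factor(id)", "factor(period)"), response = "y")
fit <- lm(fml, data = sim)

# Tidy table of event-study coefficients with 95% normal-approximation intervals
est <- coef(summary(fit))[dummies, , drop = FALSE]
es <- data.frame(
  term      = dummies,
  rel_time  = rel_values,
  estimate  = est[, "Estimate"],
  conf_low  = est[, "Estimate"] - qnorm(0.975) * est[, "Std. Error"],
  conf_high = est[, "Estimate"] + qnorm(0.975) * est[, "Std. Error"],
  row.names = NULL
)
es
```

Dropping the lead1 dummy is what identifies the model. The ten relative-time dummies together sum to the treatment indicator, which the unit fixed effects already absorb.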

Example: With Covariates

If your dataset includes additional covariates, you can include them in the regression by specifying a one-sided formula using the covariates argument, as shown below.

event_study <- run_es(
  data       = df1, 
  outcome    = y, 
  treatment  = treat, 
  time       = period, 
  timing     = 6, 
  lead_range = 5, 
  lag_range  = 4, 
  covariates = ~ x1,  # df1 (fixest::base_did) includes one covariate, x1
  fe         = ~ id + period, 
  cluster    = ~ id, 
  baseline   = -1, 
  interval   = 1
)

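Example: Date-Type Time Variable

If the time variable is a Date, set time_transform = TRUE and pass the unit identifier to unit. run_es() then converts the dates into a sequential index within each unit.

# Convert the yearly period in df1 into Date values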
df_alt <- df1 |>
  dplyr::mutate(
    year = 2000 + period,  # periods 1-10 become 2001-2010
    date = as.Date(paste0(year, "-01-01"))
  )

event_study_alt <- run_es(
  data           = df_alt,
  outcome        = y,
  treatment      = treat,
  time           = date,
  timing         = 6,   # Treatment starts at the 6th time point (2006-01-01) in each unit
  lead_range     = 3,
  lag_range      = 3,
  fe             = ~ id + period,
  cluster        = ~ id,
  baseline       = -1,
  time_transform = TRUE,
  unit           = id
)

Note:
When time_transform = TRUE, the timing argument must be specified using the transformed index (e.g., timing = 6 for the sixth time point within each unit).
Support for specifying the original time values (e.g., a specific Date) directly as timing is planned for a future update.
Currently, time_transform = TRUE cannot be combined with staggered = TRUE. This combination is not yet supported, but may be implemented in a future release.
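Until timing accepts original time values, you can compute the transformed index for a given date by hand. The sketch below mimics the conversion described for time_transform = TRUE, ranking dates 1, 2, 3, … within each unit with base R's ave(). The conversion rule is an assumption, not the package's code, and the data are a toy example.

```r
# Toy panel with yearly Date values; unit 2 has no observation for 2003
panel <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2),
  date = as.Date(c("2001-01-01", "2002-01-01", "2003-01-01", "2004-01-01",
                   "2001-01-01", "2002-01-01", "2004-01-01"))
)

# Sequential index (1, 2, 3, ...) of each date within its unit
panel$time_index <- ave(as.numeric(panel$date), panel$id,
                        FUN = function(d) match(d, sort(unique(d))))

# Index of the treatment date within each unit
treat_date <- as.Date("2004-01-01")
panel$time_index[panel$date == treat_date]
```

Under this assumption, the same date maps to index 4 in unit 1 but index 3 in unit 2. A single scalar timing therefore points to one calendar date only when every unit has the same set of time points.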

You can use the data frame returned by run_es() to create custom plots, or use the built-in plot_es() function to visualize the estimates and confidence intervals with minimal code.

`plot_es()`

The plot_es() function creates a plot based on ggplot2.

plot_es() has 12 arguments.

Argument	Description
`data`	Data frame created by `run_es()`.
`type`	The type of confidence interval visualization: `"ribbon"` (default) or `"errorbar"`.
`vline_val`	The x-intercept for the vertical reference line (default: `0`).
`vline_color`	Color for the vertical reference line (default: `"#000"`).
`hline_val`	The y-intercept for the horizontal reference line (default: `0`).
`hline_color`	Color for the horizontal reference line (default: `"#000"`).
`linewidth`	The width of the lines in the plot (default: `1`).
`pointsize`	The size of the points for the estimates (default: `2`).
`alpha`	The transparency level for ribbons (default: `0.2`).
`barwidth`	The width of the error bars (default: `0.2`).
`color`	The color for the lines and points (default: `"#B25D91FF"`).
`fill`	The fill color for ribbons (default: `"#B25D91FF"`).

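If the defaults suit you, simply pass the data frame returned by run_es():

plot_es(event_study)

To display the confidence intervals as error bars instead of a ribbon, set type = "errorbar":

plot_es(event_study, type = "errorbar")

Setting vline_val = -.5 moves the vertical reference line so that it falls between the baseline period (-1) and the treatment period (0):

plot_es(event_study, type = "errorbar", vline_val = -.5)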

Because plot_es() returns a ggplot2 object, you can customize it further by adding scales, titles, or themes with +:

plot_es(event_study, type = "errorbar") + 
  ggplot2::scale_x_continuous(breaks = seq(-5, 5, by = 1)) + 
  ggplot2::ggtitle("Result of Event Study")

Planned Features

Support for staggered = TRUE with time_transform = TRUE
- Enable automatic alignment of treatment dates with transformed time indices, allowing analysis with irregular time variables (e.g., Date) in staggered adoption settings.
Allow timing to accept original time values (e.g., specific Dates)
- Instead of manually calculating the time index (e.g., timing = 19), users will be able to specify a Date or other original time value directly. This will simplify the workflow when time_transform = TRUE.

Debugging

If you find an issue, please report it on the GitHub Issues page.

Functions in fixes (0.4.1)