Learn R Programming

cdid: The Chained Difference-in-Differences

The cdid package extends the popular did library by Brantly Callaway to improve efficiency and handle unbalanced panel data in staggered treatment designs. It implements the methodology introduced in:

Bellego, Benatia, and Dortet-Bernadet (2024), "The Chained Difference-in-Differences", Journal of Econometrics. https://doi.org/10.1016/j.jeconom.2024.105783

Features

  • Enhanced Efficiency: Offers greater precision for balanced panels (smaller standard errors)
  • Handles Missing Data: Offers greater precision for unbalanced panels, especially when errors are serially correlated.
  • Reduced Bias: Less prone to bias when missingness is tied to unobservable heterogeneity.
  • Flexible Control Groups: Supports control groups comprising either "nevertreated" or "notyettreated" units, like in the did library.
  • Customizable Weighting: Implements aggregation with two weighting matrix options:
    • Identity: Best for smaller datasets with less overidentifying restrictions (many missings like in rotating panel surveys)
    • Two-step: Best for larger datasets with many overidentifying restrictions (few missings).

Future developments: generalized attrition model (MAR & sequential MAR), doubly-robust estimator (so far only ipw is implemented), and improved computational efficiency. Hopefully, it will be directly integrated within the did library.

Installation

The cdid package can be installed from CRAN using:

install.packages("cdid")

Alternatively, for the development version:

remotes::install_github("joelcuerrier/cdid", ref = "main", build_vignettes = TRUE, force = TRUE)

Getting Started

Example Usage with Simulated Data

library(did) #for comparison
library(cdid)

set.seed(123)

# Generate a balanced dataset with unit fixed-effects
# The true values of the coefficients are based on time-to-treatment. The treatment
# effect is zero before the treatment, 1.75 one period after, 1.5 two period after,
# 1.25 three period after, 1 four period after, 0.75 five period after, 0.5 six  
# period after, etc.

data0 <- fonction_simu_attrition(
  N = 500, T = 8,
  theta2_alpha_Gg = 0.5, lambda1_alpha_St = 0,
  sigma_alpha = 2, sigma_epsilon = 0.1, tprob = 0.5
)

# Ensure all observations are included for a balanced panel
data0$S <- 1

# Run the original `did` library estimation
did.results = did:: att_gt(
  yname="Y",
  tname="date",
  idname = "id",
  gname = "date_G",
  xformla = ~X,
  data = data0,
  weightsname = NULL,
  allow_unbalanced_panel = FALSE,
  panel = TRUE,
  control_group = "notyettreated",
  alp = 0.05,
  bstrap = TRUE,
  cband = TRUE,
  biters = 1000,
  clustervars = NULL,
  est_method = "ipw",
  base_period = "varying",
  print_details = FALSE,
  pl = FALSE,
  cores = 1
)

# Run `cdid` with 2-step weighting matrix
result_2step = att_gt_cdid(yname="Y", tname="date",
                         idname="id",
                         gname="date_G",
                         xformla=~X,
                         data=data0,
                         control_group="notyettreated",
                         alp=0.05,
                         bstrap=TRUE,
                         biters=1000,
                         clustervars=NULL,
                         cband=TRUE,
                         est_method="2-step",
                         base_period="varying",
                         print_details=FALSE,
                         pl=FALSE,
                         cores=1)

# Run `cdid` with identity weighting matrix
result_id = att_gt_cdid(yname="Y", tname="date",
                        idname="id",
                        gname="date_G",
                        xformla=~X,
                        data=data0,
                        control_group="notyettreated",
                        alp=0.05,
                        bstrap=TRUE,
                        biters=1000,
                        clustervars=NULL,
                        cband=TRUE,
                        est_method="Identity",
                        base_period="varying",
                        print_details=FALSE,
                        pl=FALSE,
                        cores=1)

# Print results
print(did.results)
print(result_2step)
print(result_id)

Aggregating Results

After computing the group-time ATT estimates, aggregate results can be obtained using functions from did library

agg.es.did <- aggte(MP = did.results, type = 'dynamic')
agg.es.2step <- aggte(MP = result_2step, type = 'dynamic')
agg.es.id <- aggte(MP = result_id, type = 'dynamic')

# Print aggregate results
print(agg.es.did)
print(agg.es.2step)
print(agg.es.id)

Working with Unbalanced Panels

The cdid library excels with unbalanced panels. Here's an example:

# Generate a dataset with missing observations based on sampling indicator S
data0 <- fonction_simu_attrition(
  N = 500, T = 8,
  theta2_alpha_Gg = 0.5, lambda1_alpha_St = 0,
  sigma_alpha = 2, sigma_epsilon = 0.1, tprob = 0.5
)

# Keep only non-missing (S==1)
data0 <- data0[data0$S == 1, ]

# Run estimations as before, but specify panel = FALSE for did::att_gt()
did.results = did:: att_gt(
  yname="Y",
  tname="date",
  idname = "id",
  gname = "date_G",
  xformla = ~X,
  data = data0,
  weightsname = NULL,
  allow_unbalanced_panel = FALSE,
  panel = FALSE,
  control_group = "notyettreated",
  alp = 0.05,
  bstrap = TRUE,
  cband = TRUE,
  biters = 1000,
  clustervars = NULL,
  est_method = "ipw",
  base_period = "varying",
  print_details = FALSE,
  pl = FALSE,
  cores = 1
)

#For cdid, there is no difference
result_2step = att_gt_cdid(yname="Y", tname="date",
                         idname="id",
                         gname="date_G",
                         xformla=~X,
                         data=data0,
                         control_group="notyettreated",
                         alp=0.05,
                         bstrap=TRUE,
                         biters=1000,
                         clustervars=NULL,
                         cband=TRUE,
                         est_method="2-step",
                         base_period="varying",
                         print_details=FALSE,
                         pl=FALSE,
                         cores=1)

result_id = att_gt_cdid(yname="Y", tname="date",
                        idname="id",
                        gname="date_G",
                        xformla=~X,
                        data=data0,
                        control_group="notyettreated",
                        alp=0.05,
                        bstrap=TRUE,
                        biters=1000,
                        clustervars=NULL,
                        cband=TRUE,
                        est_method="Identity",
                        base_period="varying",
                        print_details=FALSE,
                        pl=FALSE,
                        cores=1)

Documentation

Complete documentation and detailed examples are available through the package's help pages:

?cdid
browseVignettes("cdid")

and a dedicated webpage: https://www.davidbenatia.com/projects/cdid-library/.

References

Bellego, C., Benatia, D., and Dortet-Bernadet, V. (2024). The Chained Difference-in-Differences. Journal of Econometrics. https://doi.org/10.1016/j.jeconom.2024.105783

Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. Journal of Econometrics. https://doi.org/10.1016/j.jeconom.2020.12.001

License

This package is licensed under the GPL-2 license.

Copy Link

Version

Install

install.packages('cdid')

Monthly Downloads

568

Version

0.1.0

License

GPL-2

Issues

Pull Requests

Stars

Forks

Maintainer

David Benatia

Last Published

January 8th, 2025

Functions in cdid (0.1.0)

fonction_simu_attrition

Simulate Unbalanced Panel Data
gmm_compute_delta_att

GMM_compute_delta_att
pre_process_cdid

Process cdid Function Arguments
MP

MP
att_gt_cdid

att_gt_cdid
print.MP

print.MP
gg

gg
DIDparams

DIDparams
gmm_convert_result

GMM_convert_result
gmm_convert_delta_to_att

GMM_convert_delta_to_att
summary.MP

summary.MP
process_attgt_gmm

Process Results