Fits a linear staggered difference-in-differences model, following the Abraham and Sun (2018) approach. It facilitates optional weighting and user-specified variance-covariance function.
sdid(
formula,
df,
weights = NULL,
cohort_var = NULL,
cohort_ref = NULL,
cohort_time_refs = NULL,
time_var = NULL,
time_ref = NULL,
intervention_var,
.vcov = stats::vcov,
...
)Returns an object of class sdid, which is a list containing the
following components:
mdl
: The lm object returned from the call to stats::lm() in sdid()
formula
: A list object containing both the original formula specified in the call to sdid() and the generated formula, with all cohort-time interactions, passed to stats::lm() to fit the model
vcov : The variance-covariance matrix used to estimate standard errors
tsi : The time-since-intervention dataset used to enumerate time periods relative to the intervention period for each cohort
obs_cnt
: Counts of observations within each cohort-time interaction
cohort
: A list object containing details about cohorts. var contains the name of the column in df that identifies cohorts; ref contains the value of the cohort column that functions as the referent for main effects; and time_refs contains the referent time values within each cohort for each set of cohort-time interactions.
time
: A list object containing var, which is the name of the column in df identified by the sdid() argument time_var, and ref, the referent value of time_var for main effects.
intervention_var
: Name of the column in df that contains the time period during which each cohort implemented the intervention of interest
covariates
: A character vector containing the terms in formula other than those corresponding to cohorts and time periods
An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.
A data frame containing the variables in the model.
An optional vector of weights to be passed to stats::lm() to
be used in the fitting process. Should be NULL or a numeric vector.
Name of the variable in df that contains cohort
assignments. If NULL, this is assumed to be the first column named in the
right hand side of formula.
Value of cohort_var that serves as the referent for main
effects for cohorts. If NULL, this is assumed to the be the first value in
the set of values for cohort_var.
A list, whose elements are named to match levels of
cohort_var, specifying the value of time_var that serves as the referent
for each time interaction with values of cohort_var. See 'Details.'
Name of the variable in df that contains time periods. If
NULL, this is assumed to be the second column named in the right hand side of
formula.
Value of time_var that serves as the referent for main
effects for time periods. If NULL, this is assumed to the be the first value
in the set of values for time_var.
Name of the cohort-level variable in df that
specifies which values in time_var correspond to the first
post-intervention time period for each cohort.
Function to be used to estimate the variance-covariance matrix. Defaults to stats::vcov.
Additional arguments to be passed to .vcov.
Fitting a staggered difference-in-differences model requires deliberate attention to two specific independent variables:
The intervention cohort column assigns a cohort name to all individuals or groups having the the intervention during the same time period. For example, if the longitudinal data is at the year level, ranging from 2010 to 2020, and it contains 15 counties, 3 of whom implemented the intervention of interest in 2015, those 3 counties would be assigned to the same cohort. Similarly, if 2 more counties implemented the intervention in 2016, those 2 counties would be assigned to the next cohort.
The time period column assigns each observation to a time period at the most granular level of the longitudinal data. In the example described above, these values would correspond to the years 2010, ..., 2020.
To specify a model, a formula is passed following the format response ~ cohort_var + time_var + covariates. This, however, is not the formula use to fit the model; sdid() expands this formula to include main effects and every possible interaction between cohort_var and time_var, excluding referents for identification:
Referents for main effects are either the first levels cohort_var and time_var or the referents specified in cohort_ref and time_ref.
Referents for cohort-time interactions are either the factor level of time_var that immediately precedes the value of intervention_var within each cohort or the referents specified in cohort_time_refs.
sdid() also accommodates aggregated data through the weights argument.
Abraham S, Sun L. Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects. MIT; 2018.
# Fit a staggered difference-in-differences model
sdid_hosp <- sdid(hospitalized ~ cohort + yr + age + sex + comorb,
df = hosp,
intervention_var = "intervention_yr")
summary(sdid_hosp)
Run the code above in your browser using DataLab