maxEquivTest: Equivalence Test for Pre-trends based on the Maximum Absolute Placebo Coefficient

Description

This function performs an equivalence test for pre-trends based on the maximum absolute placebo coefficient from Dette & Schumann (2024). The test can be performed using the intersection-union approach (IU), a bootstrap procedure for spherical errors (Boot), or a wild bootstrap procedure (Wild).

Usage

maxEquivTest(
  Y,
  ID,
  G,
  period,
  X = NULL,
  data = NULL,
  equiv_threshold = NULL,
  pretreatment_period = NULL,
  base_period = NULL,
  type = c("IU", "Boot", "Wild"),
  vcov = NULL,
  cluster = NULL,
  alpha = 0.05,
  B = 1000
)

Value

If type = "IU", an object of class "maxEquivTestIU" with

placebo_coefficients: a numeric vector of the estimated placebo coefficients,
abs_placebo_coefficients: a numeric vector with the absolute values of estimated placebo coefficients,
placebo_coefficients_se: a numeric vector with the standard errors of the placebo coefficients,
significance_level: the chosen significance level of the test,
base_period: the base period used in the testing procedure,
placebo_names: the names corresponding to the placebo coefficients,
num_individuals: the number of cross-sectional individuals in the panel used for testing,
num_periods: the number of periods in the panel used for testing (if the panel is unbalanced, num_periods indicates the range of time periods across all individuals),
num_observations: the total number of observations in the panel used for testing,
is_panel_balanced: a logical value indicating whether the panel is balanced,
equiv_threshold_specified: a logical value indicating whether an equivalence threshold was specified.
if equiv_threshold_specified = TRUE, then additionally
- IU_critical_values: a numeric vector with the individual critical values for each of the placebo coefficients,
- reject_null_hypothesis: a logical value indicating whether the null hypothesis of negligible pre-trend differences can be rejected at the specified significance level alpha,
- equiv_threshold: the equivalence threshold employed.
if equiv_threshold_specified = FALSE, then additionally
- minimum_equiv_thresholds: a numeric vector including for each placebo coefficient the minimum equivalence threshold for which the null hypothesis of negligible pre-trend differences can be rejected for the corresponding placebo coefficient individually,
- minimum_equiv_threshold: a numeric scalar giving the minimum equivalence threshold for which the null hypothesis of negligible pre-trend differences can be rejected for all placebo coefficients individually.
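
The relationship between the per-coefficient quantities and the overall IU decision can be illustrated with a small base-R sketch. All numbers below are hypothetical, and the decision rule (reject for a coefficient when its absolute value lies below its critical value) is an assumption about how the critical values are used, not something stated on this page:

```r
# Hypothetical values standing in for abs_placebo_coefficients and
# IU_critical_values from a fitted test object (not real output).
abs_coefs <- c(0.12, 0.30, 0.05)
crit_vals <- c(0.45, 0.40, 0.50)

# Assumed rule: an individual test rejects when |coefficient| < critical value.
reject_each <- abs_coefs < crit_vals

# Intersection-union principle: reject overall only if every individual
# test rejects.
reject_iu <- all(reject_each)

# Hypothetical per-coefficient minimum thresholds (minimum_equiv_thresholds).
min_thresholds <- c(0.21, 0.47, 0.13)

# The threshold that works for all coefficients at once is the largest
# of the individual minimum thresholds.
min_threshold_overall <- max(min_thresholds)
```

Under these illustrative numbers every individual test rejects, so the overall test rejects, and the overall minimum threshold equals the largest individual one.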

If type = "Boot" or type = "Wild", an object of class "maxEquivTestBoot" with

placebo_coefficients: a numeric vector of the estimated placebo coefficients,
abs_placebo_coefficients: a numeric vector with the absolute values of estimated placebo coefficients,
max_abs_coefficient: the maximum absolute estimated placebo coefficient,
B: the number of bootstrap samples used to find the critical value,
significance_level: the chosen significance level of the test alpha,
base_period: the base period used in the testing procedure,
placebo_names: the names corresponding to the placebo coefficients,
equiv_threshold_specified: a logical value indicating whether an equivalence threshold was specified,
num_individuals: the number of cross-sectional individuals in the panel used for testing,
num_periods: the number of pre-treatment periods in the panel used for testing (if the panel is unbalanced, num_periods represents the range in the number of time periods covered by different individuals),
num_observations: the total number of observations in the panel used for testing,
is_panel_balanced: a logical value indicating whether the panel is balanced.
if equiv_threshold_specified = TRUE, then additionally
- bootstrap_critical_value: the critical value, found by bootstrap, for the equivalence test based on the maximum absolute placebo coefficient,
- reject_null_hypothesis: a logical value indicating whether the null hypothesis of negligible pre-trend differences can be rejected at the specified significance level alpha.
if equiv_threshold_specified = FALSE, then additionally
- minimum_equiv_threshold: a numeric scalar giving the minimum equivalence threshold for which the null hypothesis of negligible pre-trend differences can be rejected using the bootstrap procedure.

Arguments

Y: A numeric vector with the variable of interest. If data is supplied, Y should be a scalar indicating the column number or column-name character string that corresponds to the numeric dependent (outcome) variable in data.
ID: A numeric vector identifying the different cross-sectional units in the dataset. If data is supplied, ID should be a scalar indicating the column number or column-name character string that corresponds to the cross-sectional units identifier in data.
G: A binary or logical vector (of the same dimension as Y and ID) indicating whether the individual (e.g. as indicated by ID) receives treatment (1 or TRUE) or not (0 or FALSE). If data is supplied, G should be a scalar identifying the column number or column-name character string associated with G in data.
period: A numeric vector (of the same dimension as Y) indicating time. If data is supplied, period should be a scalar indicating the column number or column-name character string that corresponds to the time identifier in data.
X: A vector, matrix, or data.frame containing the control variables. If data is supplied, X must be a vector of column numbers or column-name character strings that identifies the control variables’ columns.
data: An optional data.frame object containing the variables in Y, ID, G, period and, if supplied, X and cluster as its columns.
equiv_threshold: The scalar equivalence threshold (must be positive). The default is NULL, implying that the function must look for the minimum value for which the null hypothesis of "non-negligible differences" can still be rejected.
pretreatment_period: A numeric vector identifying the pre-treatment periods that should be used for testing. pretreatment_period must be a subset of the periods included through period. The default is to use all periods that are included in period.
base_period: The pre-treatment period to compare the post-treatment observation to. The default is to take the last period of the pre-treatment period.
type: The type of maximum test that should be performed. "IU" for the intersection-union test, "Boot" for the regular bootstrap procedure from Dette & Schumann (2024) and "Wild" for the Wild bootstrap procedure.
vcov: If type = "IU", the variance-covariance matrix that needs to be used. See Details for more details.
cluster: If vcov = "CL", a vector indicating which observations belong to the same cluster. cluster must be of the same length as the panel. If data is supplied, cluster must be either the column index or column name of this vector in the data.frame/matrix. The default (cluster=NULL) assumes every unit in ID is its own cluster. Only required if vcov = "CL" and type = "IU".
alpha: Significance level of the test. The default is 0.05. Only required if equiv_threshold is not specified.
B: If type = "Boot" or type = "Wild", the number of bootstrap samples used. The default is 1000.

Author

Ties Bos

Details

The vcov parameter specifies the variance-covariance matrix to be used in the function for type = "IU". This parameter can take two types of inputs:

A character string specifying the type of variance-covariance matrix estimation. The options are:
- NULL: The default variance-covariance matrix estimated by the plm function is used.
- "HC": A heteroscedasticity-robust (HC) covariance matrix is estimated using the vcovHC function from the plm package with type "HC1" and method "white1" (see White, 1980).
- "HAC": A heteroscedasticity and autocorrelation robust (HAC) covariance matrix is estimated using the vcovHC function from the plm package with type "HC3" and method "arellano" (see Arellano, 1987).
- "CL": A cluster-robust covariance matrix is estimated using the vcovCR function from the clubSandwich package with type "CR0" (see Liang & Zeger, 1986). The cluster variable is either "ID" or a custom cluster variable provided in the data data.frame.
A function that takes a plm object as input and returns a variance-covariance matrix. This allows for custom variance-covariance matrix estimation methods. For example, you could use the vcovHC function from the plm package with a specific method and type:
```
function(x) {plm::vcovHC(x, method = "white1", type = "HC2")}
```
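
Before passing a custom function as vcov, it can help to check that it returns a matrix of the expected shape. The sketch below does this on the Grunfeld panel shipped with plm; that dataset and the regression formula are illustrative assumptions and have no connection to a pre-trend analysis. The check is skipped when plm is not installed:

```r
# A custom estimator: any function mapping a fitted plm object to a
# variance-covariance matrix (here White's method with an HC2 correction).
vcov_func <- function(x) {
  plm::vcovHC(x, method = "white1", type = "HC2")
}

# Optional sanity check on an example panel from plm (illustrative only).
if (requireNamespace("plm", quietly = TRUE)) {
  data("Grunfeld", package = "plm")
  fit <- plm::plm(inv ~ value + capital, data = Grunfeld,
                  index = c("firm", "year"), model = "within")
  V <- vcov_func(fit)
  # V should be square, with one row and column per estimated coefficient.
  stopifnot(is.matrix(V), nrow(V) == ncol(V), nrow(V) == length(coef(fit)))
}
```

The same vcov_func object can then be supplied through the vcov argument when type = "IU".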

If no vcov parameter is provided, the function defaults to using the variance-covariance matrix estimated by the plm::plm() function.

One should note that rows containing NA values are removed from the panel before the testing procedure is performed.
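
Dropping incomplete rows can turn a balanced panel into an unbalanced one. A minimal base-R sketch of this effect, using a hypothetical toy panel and complete.cases() as a stand-in for the removal step (the package's internal handling may differ), is:

```r
# Hypothetical balanced panel: 3 units observed in 3 periods, 2 missing outcomes.
panel <- data.frame(
  ID     = rep(1:3, each = 3),
  period = rep(1:3, times = 3),
  Y      = c(1.2, 0.8, NA, 0.5, 0.7, 0.9, 1.1, NA, 1.3)
)

# Listwise deletion: keep only rows without any NA.
clean_panel <- panel[complete.cases(panel), ]

# Simplified balance check: every unit has the same number of observations.
obs_per_unit <- table(clean_panel$ID)
is_balanced  <- length(unique(as.vector(obs_per_unit))) == 1
```

Here two of the nine rows are dropped, so units 1 and 3 keep two observations each while unit 2 keeps three, and the cleaned panel is no longer balanced. Inspecting missing values before calling the function makes the reported num_observations and is_panel_balanced easier to interpret.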

NOTE: Please be aware that including control variables (X) might lead to higher computation times for type = "Boot" and type = "Wild", due to unconstrained parameters in the optimization problem that estimates the constrained placebo coefficients.

In addition, please be aware that the bootstrap procedures for the equivalence test based on the maximum absolute placebo coefficient (as described by Dette & Schumann (2024)) produce a stochastic critical value and minimum equivalence threshold. Therefore, the results may vary slightly between different runs of the function. For reproducibility of the bootstrap procedures, it is recommended to set a seed before using the function.
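
The effect of seeding can be shown with a toy bootstrap in base R. The function toy_boot_crit below is a hypothetical stand-in, not the procedure of Dette & Schumann (2024); it only demonstrates that calling set.seed() with the same value before a resampling routine reproduces the same critical value:

```r
# Toy stand-in for a bootstrap critical value: the (1 - alpha) quantile of
# the absolute mean over B resamples. Illustrative only.
toy_boot_crit <- function(x, B = 200, alpha = 0.05) {
  stats <- replicate(B, abs(mean(sample(x, replace = TRUE))))
  unname(quantile(stats, 1 - alpha))
}

x <- c(-0.3, 0.1, 0.4, -0.2, 0.05, 0.25, -0.15, 0.3)

# Same seed before each call: identical results.
set.seed(42)
crit_a <- toy_boot_crit(x)
set.seed(42)
crit_b <- toy_boot_crit(x)

# No reseeding: the random number stream has advanced, so this value
# will generally differ from crit_a.
crit_c <- toy_boot_crit(x)

same_with_seed <- identical(crit_a, crit_b)
```

The same pattern applies here: call set.seed() with a fixed value immediately before maxEquivTest(..., type = "Boot") or maxEquivTest(..., type = "Wild").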

References

Arellano, M. (1987). "Computing Robust Standard Errors for Within-groups Estimators." Oxford Bulletin of Economics and Statistics, 49(4), 431-434.

Dette, H., & Schumann, M. (2024). "Testing for Equivalence of Pre-Trends in Difference-in-Differences Estimation." Journal of Business & Economic Statistics, 1-13. doi:10.1080/07350015.2024.2308121

Liang, K.-Y., & Zeger, S. L. (1986). "Longitudinal data analysis using generalized linear models." Biometrika, 73(1), 13-22. doi:10.1093/biomet/73.1.13

White, H. (1980). "A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity." Econometrica, 48(4), 817-838.

Examples

# Generate a balanced panel dataset with 500 cross-sectional units (individuals), 
# 5 time periods (labeled 1-5), a binary variable indicating which individual 
# receives treatment and 2 control variables ("X_1" and "X_2"). The error-terms are generated without 
# heteroscedasticity,  autocorrelation, or any significant clusters. 
# Furthermore, there are no fixed effects (lambda and eta are both vectors 
# containing only 0) and no pre-trends present in the data (all values in 
# beta are 0). See sim_paneldata() for more details.

sim_data <- sim_paneldata(N = 500, tt = 5, p = 2, beta = rep(0, 5), 
                          gamma = rep(1, 2), het = 0, phi = 0, sd = 1, 
                          burnins = 50)

# -----------------  IU Approach -----------------
# Perform the test with equivalence threshold specified as 1 based on 
# pre-treatment periods 1-4 and homoscedastic error-terms:
  # To select variables, one can use the column names / numbers in the panel data
maxEquivTest(Y = "Y", ID = "ID", G = "G", period = 2, X= c(5,6),
              data = sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
              base_period = 4, type = "IU")
  # Alternatively, one can enter the variables separately:
data_Y <- sim_data$Y
data_ID <- sim_data$ID
data_G <- sim_data$G
data_period <- sim_data$period
data_X <- sim_data[, c(5, 6)]
maxEquivTest(Y = data_Y, ID = data_ID, G = data_G, period = data_period, X = data_X,
             equiv_threshold = 1, pretreatment_period = 1:4,
             base_period = 4, type = "IU")
             
# Perform the test without specifying the equivalence threshold, using a 
# heteroscedasticity and autocorrelation robust variance-covariance matrix estimator:
maxEquivTest(Y = 3, ID = 1, G = 4, period = 2, 
             data = sim_data, equiv_threshold = NULL, pretreatment_period = 1:4,
             base_period = 4, type = "IU", vcov = "HAC")

# Perform the test with the equivalence threshold specified as 1 and a custom
# variance-covariance matrix estimator:
vcov_func <- function(x) {plm::vcovHC(x, method = "white1", type = "HC2")}
maxEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", 
             data = sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
             base_period = 4, type = "IU", vcov = vcov_func)
 
# Perform the test using clustered standard errors based on a vector indicating 
# the cluster. For instance, two clusters with the following rule: individuals
# with an ID below 250 form one cluster and all other individuals form the second.
cluster_ind <- ifelse(sim_data$ID < 250, 1, 2)
maxEquivTest(Y = data_Y, ID = data_ID, G = data_G, period = data_period, X = data_X,
               equiv_threshold = 1, pretreatment_period = 1:4,
               base_period = 4, type = "IU", vcov = "CL", cluster = cluster_ind)

# Finally, note that the testing procedure can also handle unbalanced panels.
# To illustrate this, we generate an unbalanced panel dataset by randomly selecting
# 70% of the observations from the balanced panel dataset:
random_indices <- sample(nrow(sim_data), 0.7 * nrow(sim_data))
unbalanced_sim_data <- sim_data[random_indices, ]
maxEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", X = c(5, 6),
              data = unbalanced_sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
              base_period = 4, type = "IU", vcov = "HAC")

#-----------------  Bootstrap Approach -----------------
 # \donttest{
 # Perform the test with equivalence threshold specified as 1 based on 
 # pre-treatment periods 1:4 (with base period 4) with the general bootstrap procedure:
 maxEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", 
             data = sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
             base_period = 4, type = "Boot")

 # Perform the test with the equivalence threshold specified as 1 based on 
 # pre-treatment periods 1:4 (with base period 4) with the wild bootstrap procedure:
 maxEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", 
             data = sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
             base_period = 4, type = "Wild")
 
 # The bootstrap procedures can handle unbalanced panels:
 maxEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", 
             data = unbalanced_sim_data, equiv_threshold = 1, 
             pretreatment_period = 1:4,
             base_period = 4, type = "Boot")
 maxEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", 
             data = unbalanced_sim_data, equiv_threshold = 1, 
             pretreatment_period = 1:4,
             base_period = 4, type = "Wild") 
 
 # Performing the test without specifying the equivalence threshold:
 maxEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", 
             data = sim_data, equiv_threshold = NULL, pretreatment_period = 1:4,
             base_period = 4, type = "Boot")

 maxEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", 
             data = sim_data, equiv_threshold = NULL, pretreatment_period = 1:4,
             base_period = 4, type = "Wild")           
# }
