meanEquivTest: Equivalence Test for Pre-trends based on the Mean Placebo Coefficient

Description

This function performs an equivalence test for pre-trends based on the mean placebo coefficient from Dette & Schumann (2024).

Usage

meanEquivTest(
  Y,
  ID,
  G,
  period,
  X = NULL,
  data = NULL,
  equiv_threshold = NULL,
  pretreatment_period = NULL,
  base_period = NULL,
  vcov = NULL,
  cluster = NULL,
  alpha = 0.05
)

Value

An object of class "meanEquivTest" containing:

placebo_coefficients: a numeric vector of the estimated placebo coefficients,
abs_mean_placebo_coefs: the absolute value of the mean of the placebo coefficients,
var_mean_placebo_coef: the estimated variance of the mean placebo coefficient,
significance_level: the significance level of the test,
base_period: the base period used in the testing procedure,
num_individuals: the number of cross-sectional individuals in the panel used for testing,
num_periods: the number of periods in the panel used for testing (if the panel is unbalanced, num_periods represents the range in the number of time periods covered by different individuals),
num_observations: the total number of observations in the panel used for testing,
is_panel_balanced: a logical value indicating whether the panel is balanced,
equiv_threshold_specified: a logical value indicating whether an equivalence threshold was specified.

If equiv_threshold_specified = FALSE, then additionally minimum_equiv_threshold: the minimum equivalence threshold for which the null hypothesis of non-negligible (based on the equivalence threshold) trend-differences can be rejected.

If equiv_threshold_specified = TRUE, then additionally

mean_critical_value: the critical value at the alpha level,
p_value: the p-value of the test,
reject_null_hypothesis: A logical value indicating whether to reject the null hypothesis,
equiv_threshold: the equivalence threshold specified.

Arguments

Y: A numeric vector with the variable of interest. If data is supplied, Y should be a scalar indicating the column number or column-name character string that corresponds to the numeric dependent (outcome) variable in ’data’.
ID: A numeric vector identifying the different cross-sectional units in the dataset. If data is supplied, ID should be a scalar indicating the column number or column-name character string that corresponds to the cross-sectional units identifier in data.
G: A binary or logic vector (of the same dimension as Y and ID) indicating if the individual (e.g. as indicated by ID) receives treatment (e.g. 1 or TRUE) or not (0 or FALSE). f 'data' is supplied, G should be a scalar identifying the column number or column-name character string associated to G in data.
period: A numeric vector (of the same dimension as Y) indicating time. If data is supplied, period should be a scalar indicating the column number or column-name character string that corresponds to the time identifier in data.
X: A vector, matrix, or data.frame containing the control variables. If data is supplied, X must be a vector of column numbers or column-name character strings that identifies the control variables’ columns.
data: An optional data.frame object containing the variables in Y, ID, G, T and, if supplied, X and cluster as its columns.
equiv_threshold: The scalar equivalence threshold (must be positive). The default is NULL, implying that the function must look for the minimum value for which the null hypothesis of ”non-negligible differences” can still be rejected.
pretreatment_period: A numeric vector identifying the pre-treatment periods that should be used for testing. pretreatment_period must be a subset of the periods included through period. The default is to use all periods that are included in period.
base_period: The pre-treatment period to compare the post-treatment observation to. The default is to take the last period of the pre-treatment period.
vcov: The variance-covariance matrix that needs to be used. See Details for more details.
cluster: If vcov = "CL", a vector indicating which observations belong to the same cluster. cluster must be of the same length as the panel. If data is supplied, cluster must be either the column index or column name of this vector in the data.frame/matrix. The default (cluster=NULL) assumes every unit in ID is its own cluster.
alpha: Significance level of the test. The default is 0.05. Only required if equiv_threshold is not specified.

Author

Ties Bos

Details

The vcov parameter specifies the variance-covariance matrix to be used in the function. This parameter can take two types of inputs:

A character string specifying the type of variance-covariance matrix estimation. The options are:
- NULL: The default variance-covariance matrix estimated by the plm function is used.
- "HC": A heteroscedasticity-robust (HC) covariance matrix is estimated using the vcovHC function from the plm package, vcovHC, with type "HC1" and method "white1" (see White, 1980).
- "HAC": A heteroscedasticity and autocorrelation robust (HAC) covariance matrix is estimated using the vcovHC function from the plm package, vcovHC, with type "HC3" and method "arellano" (see Arellano, 1987).
- "CL": A cluster-robust covariance matrix is estimated using the vcovCR function from the clubSandwich package with type "CR0" (see Lian & Zegers (1986)). The cluster variable is either "ID" or a custom cluster variable provided in the data dataframe.
A function that takes an plm object as input and returns a variance-covariance matrix. This allows for custom variance-covariance matrix estimation methods. For example, you could use the vcovHC function from the sandwich package with a specific method and type:
```
function(x) {vcovHC(x, method = "white1", type = "HC2")}
```

If no vcov parameter is provided, the function defaults to using the variance-covariance matrix estimated by the plm::plm() function.

One should note that rows containing NA values are removed from the panel before the testing procedure is performed.

References

Arellano M (1987). “Computing Robust Standard Errors for Within-groups Estimators.” Oxford bulletin of Economics and Statistics, 49(4), 431–434.

Dette, H., & Schumann, M. (2024). "Testing for Equivalence of Pre-Trends in Difference-in-Differences Estimation." Journal of Business & Economic Statistics, 1–13. DOI: tools:::Rd_expr_doi("10.1080/07350015.2024.2308121")

Liang, K.-Y., & Zeger, S. L. (1986). "Longitudinal data analysis using generalized linear models." Biometrika, 73(1), 13-22. DOI: tools:::Rd_expr_doi("10.1093/biomet/73.1.13")

White H (1980). “A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity.” Econometrica, 48(4), 817–838.

Examples

Run this code

# Generate a balanced panel dataset with 500 cross-sectional units (individuals), 
# 5 time periods (labeled 1-5), a binary variable indicating which individual 
# receives treatment and 2 control variables ("X_1" and "X_2") 
# The error-terms are generated without heteroscedasticity, autocorrelation, 
# or any significant clusters. Furthermore, there are no fixed effects 
# and no pre-trends present in the data (all values in beta are 0). 
# See sim_paneldata() for more details.

sim_data <- sim_paneldata(N = 500, tt = 5, p = 2, beta = rep(0, 5), 
                          gamma = rep(1, 2), het = 0, phi = 0, sd = 1, 
                          burnins = 50)
                          
# Perform the test with equivalent threshold specified as 1 based on 
# pre-treatment periods 1-4 and assuming homoscedastic error-terms:
  # To select variables, one can use the column names / column numbers in the panel data:
  meanEquivTest(Y = "Y", ID = "ID", G = "G", period = 2, X = c(5, 6),
                data = sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
                base_period = 4)
  # Alternatively, one can use separate variables:
  data_Y <- sim_data$Y
  data_ID <- sim_data$ID
  data_G <- sim_data$G
  data_period <- sim_data$period
  data_X <- sim_data[, c(5, 6)]
  meanEquivTest(Y = data_Y, ID = data_ID, G = data_G, period = data_period, X = data_X,
                equiv_threshold = 1, pretreatment_period = 1:4,
                base_period = 4)
   
# Perform the test with a heteroscedastic and autocorrelation robust 
# variance-covariance matrix estimator, and without specifying the equivalence threshold:
meanEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", X = c(5, 6),
              data = sim_data, equiv_threshold = NULL, pretreatment_period = 1:4,
              base_period = 4, vcov = "HAC")

# Perform the test with an equivalence threshold of 1 and a custom
# variance-covariance matrix estimator:
vcov_func <- function(x) {plm::vcovHC(x, method = "white1", type = "HC2")}
meanEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", 
              data = sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
              base_period = 4, vcov = vcov_func)
               
# Perform the test using clustered standard errors based on a vector indicating 
# the cluster. For instance, two clusters with the following rule: all
# individuals with an ID below 250 are in the same cluster:
cluster_ind <- ifelse(sim_data$ID < 250, 1, 2)
meanEquivTest(Y = data_Y, ID = data_ID, G = data_G, period = data_period, X = data_X,
               equiv_threshold = 1, pretreatment_period = 1:4,
               base_period = 4, vcov = "CL", cluster = cluster_ind)

# Note that the testing procedure can also handle unbalanced panels. 
# Finally, one should note that the test procedure also works for unbalanced panels.
# To illustrate this, we generate an unbalanced panel dataset by randomly selecting
# 70% of the observations from the balanced panel dataset:
random_indeces <- sample(nrow(sim_data), 0.7*nrow(sim_data))
unbalanced_sim_data <- sim_data[random_indeces, ]
meanEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", X = c(5, 6),
              data = unbalanced_sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
              base_period = 4, vcov = "HAC")

Run the code above in your browser using DataLab