rmsEquivTest: Equivalence Test for Pre-trends based on the RMS Placebo Coefficient

Description

This function performs an equivalence test for pre-trends based on the root mean squared placebo coefficient from Dette & Schumann (2024).

Usage

rmsEquivTest(
  Y,
  ID,
  G,
  period,
  X = NULL,
  data = NULL,
  equiv_threshold = NULL,
  pretreatment_period = NULL,
  base_period = NULL,
  alpha = 0.05,
  no_lambda = 5
)

Value

An object of class "rmsEquivTest" containing:

placebo_coefficients: A numeric vector of the estimated placebo coefficients,
rms_placebo_coefs: the root mean squared value of the placebo coefficients,
significance_level: the significance level of the test,
base_period: the base period used in the testing procedure,
num_individuals: the number of cross-sectional individuals in the panel used for testing,
num_periods: the number of pre-treatment periods in the panel used for testing (if the panel is unbalanced, num_periods represents the range in the number of time periods covered by different individuals),
num_observations: the total number of observations in the panel used for testing,
is_panel_balanced: a logical value indicating whether the panel is balanced,
equiv_threshold_specified: a logical value indicating whether an equivalence threshold was specified.

If equiv_threshold_specified = FALSE, then additionally

minimum_equiv_threshold: the smallest equivalence threshold for which the null hypothesis of non-negligible trend differences can be rejected.

If equiv_threshold_specified = TRUE, then additionally

rms_critical_value: the critical value at the alpha level,
reject_null_hypothesis: A logical value indicating whether to reject the null hypothesis,
equiv_threshold: the equivalence threshold specified.
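
As a small illustration of how rms_placebo_coefs relates to placebo_coefficients, the sketch below computes a root mean square in base R. The coefficient values are hypothetical and chosen only for the arithmetic; the code is not taken from the package.

```r
# Hypothetical placebo coefficients, for illustration only
placebo_coefficients <- c(0.10, -0.20, 0.05)

# Root mean square: square each coefficient, average, take the square root
rms_placebo_coefs <- sqrt(mean(placebo_coefficients^2))

print(rms_placebo_coefs)
```

Because the coefficients are squared before averaging, positive and negative placebo coefficients cannot cancel each other out in this summary.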

Arguments

Y: A numeric vector with the variable of interest. If data is supplied, Y should be a scalar indicating the column number or column-name character string that corresponds to the numeric dependent (outcome) variable in data.
ID: A numeric vector identifying the different cross-sectional units in the dataset. If data is supplied, ID should be a scalar indicating the column number or column-name character string that corresponds to the cross-sectional units identifier in data.
G: A binary or logical vector (of the same dimension as Y and ID) indicating whether the individual (as identified by ID) receives treatment (1 or TRUE) or not (0 or FALSE). If data is supplied, G should be a scalar indicating the column number or column-name character string that corresponds to the treatment indicator in data.
period: A numeric vector (of the same dimension as Y) indicating time. If data is supplied, period should be a scalar indicating the column number or column-name character string that corresponds to the time identifier in data.
X: A vector, matrix, or data.frame containing the control variables. If data is supplied, X must be a vector of column numbers or column-name character strings that identifies the control variables’ columns.
data: An optional data.frame object containing the variables in Y, ID, G, period and, if supplied, X as its columns.
equiv_threshold: The scalar equivalence threshold (must be positive). The default is NULL, in which case the function searches for the minimum threshold for which the null hypothesis of "non-negligible differences" can still be rejected.
pretreatment_period: A numeric vector identifying the pre-treatment periods that should be used for testing. pretreatment_period must be a subset of the periods included through period. The default is to use all periods that are included in period.
base_period: The pre-treatment period to compare the post-treatment observation to. The default is to take the last period of the pre-treatment period.
alpha: Significance level of the test. The default is 0.05.
no_lambda: Parameter specifying the number of incremental segments of the dataset over which a statistic is calculated. See Details. The default is 5.

Author

Ties Bos

Details

no_lambda determines the proportions lambda/no_lambda, for lambda = 1,...,no_lambda, of the cross-sectional units at which the placebo coefficients are estimated. For each of these proportions, the placebo coefficients are estimated and their root mean square (RMS) is calculated; these RMS values are then used to construct the critical value at a significance level of alpha. See Dette & Schumann (2024, Section 4.2.3) for more details.
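
To make the proportions concrete, the following base R sketch lists them for the default no_lambda = 5. The panel size N and the mapping from a proportion to a number of units are assumptions made for illustration; the package's internal subsetting may differ.

```r
no_lambda <- 5   # default value of the argument
N <- 500         # hypothetical number of cross-sectional units

# Proportions lambda / no_lambda for lambda = 1, ..., no_lambda
lambda <- seq_len(no_lambda)
proportion <- lambda / no_lambda

# Assumption: a proportion corresponds to that share of the N units
# (integer division avoids floating-point rounding surprises)
units <- (N * lambda) %/% no_lambda

lambda_grid <- data.frame(lambda = lambda, proportion = proportion, units = units)
print(lambda_grid)
```

A larger no_lambda gives a finer grid of proportions, at the cost of estimating the placebo coefficients more times.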

Note that rows containing NA values are removed from the panel before the testing procedure is performed.
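
The effect of dropping incomplete rows can be previewed before calling the function. The sketch below uses base R's complete.cases() on a toy panel with hypothetical values; it only approximates what the function does internally and is not the package's own code.

```r
# Toy panel with hypothetical values; two rows contain an NA
panel <- data.frame(
  ID     = c(1, 1, 2, 2, 3, 3),
  period = c(1, 2, 1, 2, 1, 2),
  Y      = c(0.5, NA, 1.2, 0.9, 0.3, 0.7),
  X_1    = c(1.0, 0.4, 0.8, 0.6, NA, 0.2)
)

# Keep only the rows without any missing value
panel_complete <- panel[complete.cases(panel), ]

n_before <- nrow(panel)
n_after  <- nrow(panel_complete)

# Dropping rows can leave units observed in different numbers of periods,
# which makes the remaining panel unbalanced
periods_per_unit <- as.vector(table(panel_complete$ID))
is_balanced <- length(unique(periods_per_unit)) == 1

print(c(before = n_before, after = n_after))
print(is_balanced)
```

In this toy panel, units 1 and 3 each lose one period, so a panel that was balanced on input is unbalanced once the incomplete rows are gone.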

Please be aware that the equivalence test based on the root mean squared placebo coefficient uses a randomization technique (as described by Dette & Schumann (2024)), leading to a stochastic critical value and minimum equivalence threshold. Therefore, the results may vary slightly between different runs of the function. For reproducibility, it is recommended to set a seed before using the function.
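
The following base R sketch shows why setting a seed matters. randomized_statistic() is a hypothetical stand-in for any procedure with a stochastic result; it is not part of the package and does not reproduce the randomization used by the test.

```r
# Hypothetical stand-in for a procedure whose result depends on random draws
randomized_statistic <- function() mean(sample(1:100, size = 10))

# Resetting the same seed before each call reproduces the result exactly
set.seed(123)
first_run <- randomized_statistic()

set.seed(123)
second_run <- randomized_statistic()

print(identical(first_run, second_run))
```

In practice, the same pattern applies to the test itself: call set.seed() with a fixed integer immediately before rmsEquivTest().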

References

Dette, H., & Schumann, M. (2024). "Testing for Equivalence of Pre-Trends in Difference-in-Differences Estimation." Journal of Business & Economic Statistics, 1–13. DOI: 10.1080/07350015.2024.2308121

Examples

# Generate a balanced panel dataset with 500 cross-sectional units (individuals), 
# 5 time periods (labeled 1-5), a binary variable indicating which individual 
# receives treatment and 2 control variables ("X_1" and "X_2"). 
# The error terms are generated without heteroscedasticity, autocorrelation,
# or any significant clusters. Furthermore, there are no fixed effects and
# no pre-trends present in the data (all values in beta are 0).
# See sim_paneldata() for more details.

sim_data <- sim_paneldata(N = 500, tt = 5, p = 2, beta = rep(0, 5), 
                          gamma = rep(1, 2), het = 0, phi = 0, sd = 1, 
                          burnins = 50)

# Perform the equivalence test using an equivalence threshold of 1 with periods 
# 1-4 as pre-treatment periods based on the RMS testing procedure:
#  - option 1: using column names in the panel
# One can use the names of the columns in the panel to specify the variables:
rmsEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", X = c("X_1", "X_2"),
             data = sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
             base_period = 4)

#  - option 2: using column numbers in the panel 
# Alternatively, one can use the column numbers in the panel to specify the variables:
rmsEquivTest(Y = 3, ID = 1, G = 4, period = 2, X = c(5, 6),
             data = sim_data, equiv_threshold = 1, pretreatment_period = 1:4,
             base_period = 4)
             
#  - option 3: using separate variables 
# One can also use the variables directly without specifying the data variable:
data_Y <- sim_data$Y
data_ID <- sim_data$ID
data_G <- sim_data$G
data_period <- sim_data$period
data_X <- cbind(sim_data$X_1, sim_data$X_2)

rmsEquivTest(Y = data_Y, ID = data_ID, G = data_G, period = data_period, X = data_X,
             equiv_threshold = 1, pretreatment_period = 1:4,
             base_period = 4)

# The testing procedure can also be performed without specifying an
# equivalence threshold. In that case, the function returns the minimum equivalence
# threshold for which the null hypothesis of non-negligible trend differences can be rejected.
# Again, the three possible ways of entering the data as above can be used:
rmsEquivTest(Y = "Y", ID = "ID", G = "G", period = "period", X = c("X_1", "X_2"),
             data = sim_data, equiv_threshold = NULL, pretreatment_period = 1:4,
             base_period = 4)

rmsEquivTest(Y = 3, ID = 1, G = 4, period = 2, X = c(5, 6),
             data = sim_data, equiv_threshold = NULL, pretreatment_period = 1:4,
             base_period = 4)
             
rmsEquivTest(Y = data_Y, ID = data_ID, G = data_G, period = data_period, X = data_X,
             equiv_threshold = NULL, pretreatment_period = 1:4,
             base_period = 4)

# Finally, one should note that the test procedure also works for unbalanced panels.
# To illustrate this, we generate an unbalanced panel dataset by randomly selecting
# 70% of the observations from the balanced panel dataset:

random_indices <- sample(nrow(sim_data), 0.7 * nrow(sim_data))
unbalanced_sim_data <- sim_data[random_indices, ]
#  With Equivalence Threshold:
rmsEquivTest(Y = 3, ID = 1, G = 4, period = 2, X = c(5, 6),
             data = unbalanced_sim_data, equiv_threshold = 1, 
             pretreatment_period = 1:4, base_period = 4)

#  Without Equivalence Threshold:
rmsEquivTest(Y = 3, ID = 1, G = 4, period = 2, X = c(5, 6),
             data = unbalanced_sim_data, equiv_threshold = NULL, 
             pretreatment_period = 1:4, base_period = 4)
