mfrd_est: Multivariate Frontier Regression Discontinuity Estimation

Description

mfrd_est implements the frontier approach for multivariate regression discontinuity estimation in Wong, Steiner and Cook (2013). It is based on the MFRDD code in Stata from Wong, Steiner, and Cook (2013).

Usage

mfrd_est(
  y,
  x1,
  x2,
  c1,
  c2,
  t.design = NULL,
  local = 0.15,
  front.bw = NA,
  m = 10,
  k = 5,
  kernel = "triangular",
  ngrid = 250,
  margin = 0.03,
  boot = NULL,
  cluster = NULL,
  stop.on.error = TRUE
)

Value

mfrd_est returns an object of class "mfrd". The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class mfrd is a list containing the following components:

w: Numeric vector specifying the weight of frontier 1 and frontier 2, respectively.
est: Numeric matrix of the estimate of the discontinuity in the outcome under a complete model (no prefix), heterogeneous treatment (ht) effects model, and treatment (t) only model, for the parametric case and for each corresponding bandwidth. Estimates with suffix "ev1" and "ev2" correspond to expected values for each frontier, under a given model. Estimates with suffix "ate" correspond to average treatment effects across both frontiers, under a given model.
d: Numeric matrix of the effect size (Cohen's d) for estimate.
se: Numeric matrix of the standard error for each corresponding bandwidth, if applicable.
m_s: A list containing estimates for the complete model, under parametric and non-parametric (optimal, half, and double bandwidth) cases. A list of coefficient estimates, residuals, effects, weights (in the non-parametric case), lm output (rank of the fitted linear model, fitted values, assignments for the design matrix, qr for linear fit, residual degrees of freedom, levels of the x value, function call, and terms), and output data frame are returned for each model.
m_h: A list containing estimates for the heterogeneous treatments model, under parametric and non-parametric (optimal, half, and double bandwidth) cases. A list of coefficient estimates, residuals, effects, weights (in the non-parametric case), lm output (rank of the fitted linear model, fitted values, assignments for the design matrix, qr for linear fit, residual degrees of freedom, levels of the x value, function call, and terms), and output data frame are returned for each model.
m_t: A list containing estimates for the treatment only model, under parametric and non-parametric (optimal, half, and double bandwidth) cases. A list of coefficient estimates, residuals, effects, weights (in the non-parametric case), lm output (rank of the fitted linear model, fitted values, assignments for the design matrix, qr for linear fit, residual degrees of freedom, levels of the x value, function call, and terms), and output data frame are returned for each model.
dat_h: A list containing four data frames, one for each case: parametric or non-parametric (optimal, half, and double bandwidth). Each data frame contains functions and densities for each frontier and treatment model.
dat: A data frame containing the outcome (y) and each input (x1, x2) for each observation. The data frame also contains indicators of being within the local boundary of the cutpoint for x1 and x2 (x1res, x2res), scaled (zx1, zx2) and centered x1 and x2 values (zcx1, zcx2), and treatment indicators for overall treatment (tr) based on treatment assignment from x1 (tr1), x2 (tr2), and both assignment variables (trb).
obs: List of the number of observations used in each model.
impute: A logical value indicating whether multiple imputation is used or not.
call: The matched call.
front.bw: Numeric vector of each bandwidth used to estimate the density at the frontier for the three effects models (complete model, heterogeneous treatment model, and treatment only model) detailed in Wong, Steiner, and Cook (2013).

Arguments

y: A numeric object containing outcome variable.
x1: A numeric object containing the first assignment variable.
x2: A numeric object containing the second assignment variable.
c1: A numeric value containing the cutpoint at which assignment to the treatment is determined for x1.
c2: A numeric value containing the cutpoint at which assignment to the treatment is determined for x2.
t.design: A character vector of length 2 specifying the treatment option according to design. The first entry is for x1 and the second entry is for x2. Options are "g" (treatment is assigned if x1 is greater than its cutoff), "geq" (treatment is assigned if x1 is greater than or equal to its cutoff), "l" (treatment is assigned if x1 is less than its cutoff), and "leq" (treatment is assigned if x1 is less than or equal to its cutoff). The same options are available for x2.
local: A non-negative numeric value specifying the range of neighboring points around the cutoff on the standardized scale, for each assignment variable. The default is 0.15.
front.bw: A non-negative numeric vector of length 3 specifying the bandwidths at which to estimate the RD for each of three effects models (complete model, heterogeneous treatment model, and treatment only model) detailed in Wong, Steiner, and Cook (2013). If NA, front.bw will be determined by cross-validation. The default is NA.
m: A non-negative integer specifying the number of uniformly-at-random samples to draw as search candidates for front.bw, if front.bw is NA. The default is 10.
k: A non-negative integer specifying the number of folds for cross-validation to determine front.bw, if front.bw is NA. The default is 5.
kernel: A string indicating which kernel to use. Options are "triangular" (default and recommended), "rectangular", "epanechnikov", "quartic", "triweight", "tricube", and "cosine".
ngrid: A non-negative integer specifying the number of non-zero grid points on each assignment variable, which is also the number of zero grid points on each assignment variable. The default is 250. The value used in Wong, Steiner and Cook (2013) is 2500, which may cause long computational time.
margin: A non-negative numeric value specifying the range of grid points beyond the minimum and maximum of sample points on each assignment variable. This grid is used to impute potential outcomes along the frontier, as in Wong, Steiner, and Cook (2013). The default is 0.03.
boot: An optional non-negative integer specifying the number of bootstrap samples to obtain standard error of estimates.
cluster: An optional vector of length n specifying clusters within which the errors are assumed to be correlated. This will result in reporting cluster robust SEs. It is suggested that data with a discrete running variable be clustered by each unique value of the running variable (Lee and Card, 2008).
stop.on.error: A logical value indicating whether to remove bootstraps which cause error in the integrate function. If TRUE, bootstraps which cause error are removed and resampled until the specified number of bootstrap samples are acquired. If FALSE, bootstraps which cause error are not removed. The default is TRUE.

References

Wong, V., Steiner, P, and Cook, T. (2013). Analyzing regression discontinuity designs with multiple assignment variables: A comparative study of four estimation methods. Journal of Educational and Behavioral Statistics, 38(2), 107-141. tools:::Rd_expr_doi("10.3102/1076998611432172").

Lee, D. and Card, D. (2008). A Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. tools:::Rd_expr_doi("10.1016/j.jeconom.2007.05.003").

Examples

Run this code

set.seed(12345)
x1 <- runif(1000, -1, 1)
x2 <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(1000)
mfrd_est(y = y, x1 = x1, x2 = x2, c1 = 0, c2 = 0, t.design = c("geq", "geq"))

Run the code above in your browser using DataLab