RDHonest: Honest inference in RD

Description

Calculate estimators and bias-aware CIs for the sharp or fuzzy RD parameter, or for the value of the conditional mean at a point.

Usage

RDHonest(
  formula,
  data,
  subset,
  weights,
  cutoff = 0,
  M,
  kern = "triangular",
  na.action,
  opt.criterion = "MSE",
  h,
  se.method = "nn",
  alpha = 0.05,
  beta = 0.8,
  J = 3,
  sclass = "H",
  T0 = 0,
  point.inference = FALSE,
  sigmaY2,
  sigmaD2,
  sigmaYD,
  clusterid
)

Value

Returns an object of class "RDResults". The function print can be used to obtain and print a summary of the results. An object of class "RDResults" is a list containing four components. First, a data frame "coefficients" containing the following columns:

term: type of parameter being estimated
estimate: point estimate
std.error: standard error of estimate
maximum.bias: maximum bias of estimate
conf.low, conf.high: lower (upper) end-point of a two-sided CI based on estimate
conf.low.onesided, conf.high.onesided: lower (upper) end-point of a one-sided CI based on estimate
bandwidth: bandwidth used. If kern="optimal", the smoothing parameters bandwidth.m and bandwidth.p on either side of the cutoff are reported instead
eff.obs: number of effective observations
leverage: maximal leverage of estimate
cv: critical value used to compute two-sided CIs
alpha: determines the coverage level 1-alpha, as specified by option alpha
method: smoothness class used, as determined by option sclass
M: curvature bound used for worst-case bias calculations. For fuzzy RD, equals (abs(estimate)*M.fs+M.rf)/first.stage
M.rf, M.fs: curvature bound for the outcome (i.e. reduced-form) and first-stage regressions. Fuzzy RD only.
first.stage: estimate of the first-stage coefficient. Fuzzy RD only.
kernel: kernel used
p.value: p-value for testing the null of no effect
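The CI columns can be related to estimate, std.error and maximum.bias. The sketch below assumes the bias-aware construction of Armstrong and Kolesár (2020): the two-sided critical value cv is the 1-alpha quantile of |N(t, 1)| with t = maximum.bias/std.error, and one-sided CIs subtract (or add) the full worst-case bias. The helpers cv_bias_aware() and bias_aware_cis() are hypothetical and are not part of RDHonest.

```r
# Sketch (base R only) of bias-aware CIs; helper names are hypothetical.

# 1 - alpha quantile of |Z + t| with Z ~ N(0, 1), i.e. of a folded normal
cv_bias_aware <- function(t, alpha = 0.05) {
  coverage_gap <- function(cv) pnorm(cv - t) - pnorm(-cv - t) - (1 - alpha)
  uniroot(coverage_gap, lower = 0, upper = t + qnorm(1 - alpha / 2) + 1,
          tol = 1e-12)$root
}

bias_aware_cis <- function(estimate, std.error, maximum.bias, alpha = 0.05) {
  cv <- cv_bias_aware(maximum.bias / std.error, alpha)
  z <- qnorm(1 - alpha)
  list(cv = cv,
       conf.low = estimate - cv * std.error,
       conf.high = estimate + cv * std.error,
       # one-sided CIs shift by the full worst-case bias
       conf.low.onesided = estimate - maximum.bias - z * std.error,
       conf.high.onesided = estimate + maximum.bias + z * std.error)
}

# Made-up inputs, for illustration only
bias_aware_cis(estimate = 5, std.error = 1, maximum.bias = 0.5)
```

With maximum.bias = 0, cv reduces to the usual qnorm(1 - alpha/2); as the bias grows relative to the standard error, cv approaches t + qnorm(1 - alpha).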

Second, a list called "data" containing the data used for estimation, which is useful mostly for internal calculations. Third, an object of class "lm" containing the local linear regression estimates. Finally, the matched call, stored as a call object named "call".

If kern="optimal", the "lm" object is empty, and the numeric vectors "delta" and "omega" are returned in addition. These correspond to the parameters in the modulus problem used to compute the optimal estimation weights.

Arguments

formula

an object of class "formula" (or one that can be coerced to that class). The formula syntax is outcome ~ running_variable for inference at a point. For sharp RD, it is outcome ~ running_variable if there are no covariates, or outcome ~ running_variable | covariates if covariates are present. For fuzzy RD, it is outcome | treatment ~ running_variable | covariates, with covariates optional.
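The three formula shapes are ordinary R formulas. This base-R sketch only shows how R parses them (it does not call RDHonest); the variable names y, d, x and w are hypothetical. Because `~` binds more loosely than `|`, each side of a fuzzy RD formula keeps its own `|` split.

```r
f_point <- y ~ x          # inference at a point, or sharp RD without covariates
f_sharp <- y ~ x | w      # sharp RD with covariate w
f_fuzzy <- y | d ~ x | w  # fuzzy RD: outcome y, treatment d, covariate w

lhs <- f_fuzzy[[2]]  # the call y | d (outcome | treatment)
rhs <- f_fuzzy[[3]]  # the call x | w (running variable | covariates)
print(lhs)
print(rhs)
all.vars(f_fuzzy)
```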

data

optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the outcome and running variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.

subset

optional vector specifying a subset of observations to be used in the fitting process.

weights

Optional vector of weights to weight the observations (useful for aggregated data). The weights are interpreted as the number of observations that each aggregated data point averages over. Disregarded if optimal kernel is used.
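To illustrate the "aggregated data" reading of weights: in a linear regression, fitting cell means weighted by cell counts reproduces the fit on the underlying raw data. This is a base-R sketch with made-up numbers; it illustrates the interpretation for lm only and makes no claim about how RDHonest's variance estimates treat aggregated rows.

```r
raw <- data.frame(x = c(1, 1, 2, 2, 2, 3), y = c(1, 3, 2, 4, 6, 5))

# collapse to one row per value of x: mean outcome and number of observations
agg <- aggregate(y ~ x, data = raw, FUN = mean)
agg$n <- as.vector(table(raw$x))

fit_raw <- lm(y ~ x, data = raw)
fit_agg <- lm(y ~ x, data = agg, weights = n)
all.equal(coef(fit_raw), coef(fit_agg))
```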

cutoff

specifies the RD cutoff in the running variable. For inference at a point, specifies the point \(x_0\) at which to calculate the conditional mean.

M

Bound on second derivative of the conditional mean function, a numeric vector of length one. For fuzzy RD, M needs to be a numeric vector of length two, specifying the smoothness of the conditional mean for the outcome and treatment, respectively.
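For fuzzy RD, M = c(M.rf, M.fs) and the M column of the results combines the two bounds as (abs(estimate)*M.fs+M.rf)/first.stage. A minimal sketch of that formula; fuzzy_M() is a hypothetical helper, and the estimate and first-stage values are made up.

```r
fuzzy_M <- function(estimate, M.rf, M.fs, first.stage) {
  (abs(estimate) * M.fs + M.rf) / first.stage
}

# M = c(4, 0.4) as in the fuzzy RD example; estimate and first stage are made up
fuzzy_M(estimate = 2, M.rf = 4, M.fs = 0.4, first.stage = 0.5)  # 9.6
```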

kern

specifies the kernel function used in the local regression. It can either be a string equal to "triangular" (\(k(u)=(1-|u|)_{+}\)), "epanechnikov" (\(k(u)=(3/4)(1-u^2)_{+}\)), or "uniform" (\(k(u)= (|u|<1)/2\)), or else a kernel function. If equal to "optimal", use the finite-sample optimal linear estimator under Taylor smoothness class, instead of a local linear estimator.
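The three named kernels, written directly from the formulas above. Since kern also accepts a kernel function, definitions like these are a natural starting point; that RDHonest expects a vectorized function of u with support [-1, 1] is an assumption of this sketch.

```r
triangular <- function(u) pmax(1 - abs(u), 0)
epanechnikov <- function(u) 0.75 * pmax(1 - u^2, 0)
uniform <- function(u) as.numeric(abs(u) < 1) / 2

# Each integrates to one over its support [-1, 1]
sapply(list(triangular = triangular, epanechnikov = epanechnikov,
            uniform = uniform),
       function(k) integrate(k, -1, 1)$value)
```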

na.action

function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options (usually na.omit). Another possible value is na.fail.

opt.criterion

Optimality criterion that the bandwidth is designed to optimize. The options are:

"MSE"

Finite-sample maximum MSE.

"FLCI"

Length of (fixed-length) two-sided confidence intervals.

"OCI"

Given quantile of excess length of one-sided confidence intervals.

The methods use conditional variance given by sigmaY2, if supplied. For fuzzy RD, sigmaD2 and sigmaYD also need to be supplied in this case. Otherwise, the methods use preliminary variance estimates based on assuming homoskedasticity on either side of the cutoff.
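A stylized sketch of how the criterion changes the chosen bandwidth. The worst-case bias and standard-error curves are hypothetical (bias growing like h^2, standard error shrinking like 1/sqrt(h)); RDHonest computes these quantities from the data, kernel and M, which this sketch does not attempt. Only "MSE" (squared bias plus variance) and "FLCI" (half-length cv * se, with cv the folded-normal critical value) are shown.

```r
h_grid <- seq(0.5, 5, by = 0.5)
max_bias <- 0.1 * h_grid^2     # hypothetical worst-case bias
se <- 1 / sqrt(h_grid)         # hypothetical standard error

# 1 - alpha quantile of |N(t, 1)|, used for fixed-length CIs
cv_folded <- function(t, alpha = 0.05) {
  uniroot(function(cv) pnorm(cv - t) - pnorm(-cv - t) - (1 - alpha),
          lower = 0, upper = t + qnorm(1 - alpha / 2) + 1, tol = 1e-12)$root
}

worst_mse <- max_bias^2 + se^2
flci_halflength <- sapply(max_bias / se, cv_folded) * se

c(MSE = h_grid[which.min(worst_mse)],
  FLCI = h_grid[which.min(flci_halflength)])
```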

h

Bandwidth, a scalar parameter. If not supplied, the optimal bandwidth is computed according to the criterion given by opt.criterion.

se.method

method for estimating standard error of the estimate, one of:

"nn"

Nearest neighbor method.

"EHW"

Eicker-Huber-White, with residuals from local regression (local polynomial estimators only).

"supplied.var"

Use conditional variance supplied by sigmaY2 instead of computing residuals. For fuzzy RD, sigmaD2 and sigmaYD also need to be supplied in this case.

alpha

Determines the confidence level, 1-alpha, for constructing/optimizing confidence intervals.

beta

Determines quantile of excess length to optimize, if bandwidth optimizes given quantile of excess length of one-sided confidence intervals (opt.criterion="OCI"); otherwise ignored.

J

Number of nearest neighbors, if se.method="nn" is specified. Otherwise ignored.
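A sketch of a nearest-neighbor conditional variance estimator in the style of Abadie and Imbens: sigma2_i = J/(J+1) * (y_i - mean of y over the J nearest neighbors of x_i)^2. That se.method = "nn" works this way on each side of the cutoff is an assumption; here ties in distance are broken by order(), which the package may handle differently, and nn_sigma2() is a hypothetical helper.

```r
nn_sigma2 <- function(x, y, J = 3) {
  vapply(seq_along(x), function(i) {
    d <- abs(x - x[i])
    d[i] <- Inf                       # exclude the observation itself
    nb <- order(d)[seq_len(J)]        # indices of the J closest neighbors
    J / (J + 1) * (y[i] - mean(y[nb]))^2
  }, numeric(1))
}

# Intended to be applied separately to observations on each side of the cutoff
x <- 1:5
y <- c(1, 2, 3, 4, 5)
nn_sigma2(x, y, J = 2)
```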

sclass

Smoothness class, either "T" for Taylor or "H" for Hölder class.

T0

Initial estimate of the treatment effect for calculating the optimal bandwidth. Only relevant for fuzzy RD.

point.inference

Do inference at a point determined by cutoff instead of RD.
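A base-R sketch of local linear estimation of the conditional mean at a point x0: regress y on (x - x0) with triangular kernel weights and read off the intercept. This is a stand-alone illustration with noise-free made-up data, not RDHonest's own code path; local_linear_at() is a hypothetical helper.

```r
local_linear_at <- function(x, y, x0, h) {
  w <- pmax(1 - abs((x - x0) / h), 0)   # triangular kernel weights
  keep <- w > 0                          # drop observations outside the window
  fit <- lm(y ~ I(x - x0), weights = w, subset = keep)
  unname(coef(fit)[1])                   # intercept = fitted mean at x0
}

x <- seq(0, 1, length.out = 101)
y <- 3 + 2 * x                           # noise-free, so the fit is exact
local_linear_at(x, y, x0 = 0.2, h = 0.3)
```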

sigmaY2

Supply variance of outcome. Ignored when kernel is optimal.

sigmaD2

Supply variance of treatment (fuzzy RD only).

sigmaYD

Supply covariance of treatment and outcome (fuzzy RD only).

clusterid

Vector specifying cluster membership. If supplied, se.method="EHW" is required, and standard errors use cluster-robust variance formulas.
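For a linear estimator sum(w_i * y_i) with residuals u_i, the EHW variance is sum(w_i^2 * u_i^2), while a cluster-robust version sums w_i * u_i within each cluster before squaring. The sketch below shows this generic formula without small-sample corrections; the weights, residuals and clusters are made up, and whether RDHonest applies any further adjustment is not claimed here.

```r
ehw_var <- function(w, u) sum(w^2 * u^2)
cluster_var <- function(w, u, cluster) sum(tapply(w * u, cluster, sum)^2)

w <- c(0.25, 0.25, 0.25, 0.25)   # estimator weights
u <- c(1, 2, -1, 1)              # residuals
g <- c(1, 1, 2, 2)               # cluster membership
c(EHW = ehw_var(w, u), cluster = cluster_var(w, u, g))
```

With one observation per cluster the two formulas coincide.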

Details

The bandwidth is calculated to be optimal for a given performance criterion, as specified by opt.criterion. Alternatively, for local polynomial estimators, the bandwidth can be specified by h. For kern="optimal", calculate optimal estimators under second-order Taylor smoothness class (sharp RD only).
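For local polynomial estimators with a given bandwidth h, the sharp RD estimate is a kernel-weighted local linear regression with separate intercepts and slopes on each side of the cutoff. The base-R sketch below uses a triangular kernel and reads off the coefficient on the treatment dummy; it is an illustration on simulated, noise-free data, and RDHonest's internal parameterization of its "lm" object may differ. rd_local_linear() is a hypothetical helper.

```r
rd_local_linear <- function(x, y, cutoff = 0, h) {
  xc <- x - cutoff
  d <- as.numeric(xc >= 0)              # treated side of the cutoff
  w <- pmax(1 - abs(xc / h), 0)         # triangular kernel weights
  keep <- w > 0
  fit <- lm(y ~ d * xc, weights = w, subset = keep)
  unname(coef(fit)["d"])                # jump in the intercept at the cutoff
}

x <- seq(-1, 1, length.out = 201)
y <- 1 + 0.5 * x + 2 * (x >= 0)          # true jump of 2 at the cutoff
rd_local_linear(x, y, h = 0.5)
```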

References

Timothy B. Armstrong and Michal Kolesár. Optimal inference in a class of regression models. Econometrica, 86(2):655–683, March 2018. doi:10.3982/ECTA14434

Timothy B. Armstrong and Michal Kolesár. Simple and honest confidence intervals in nonparametric regression. Quantitative Economics, 11(1):1–39, January 2020. doi:10.3982/QE1199

Michal Kolesár and Christoph Rothe. Inference in regression discontinuity designs with a discrete running variable. American Economic Review, 108(8):2277–2304, August 2018. doi:10.1257/aer.20160945

Examples


library(RDHonest)
RDHonest(voteshare ~ margin, data = lee08, kern = "uniform", M = 0.1, h = 10)
RDHonest(cn | retired ~ elig_year, data = rcp, cutoff = 0, M = c(4, 0.4),
         kern = "triangular", opt.criterion = "MSE", T0 = 0, h = 3)
RDHonest(voteshare ~ margin, data = lee08, subset = margin > 0,
         kern = "uniform", M = 0.1, h = 10, point.inference = TRUE)
