rd_est: Regression Discontinuity Estimation

Description

rd_est estimates both sharp and fuzzy RDDs using parametric and non-parametric (local linear) models. It is based on the RDestimate function in the rdd package. Sharp RDDs (both parametric and non-parametric) are estimated using lm in the stats package. Fuzzy RDDs (both parametric and non-parametric) are estimated by two-stage least squares using ivreg in the AER package. For non-parametric models, Imbens-Kalyanaraman optimal bandwidths can be used.
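As a rough illustration of what a parametric sharp RDD fit with lm looks like, the sketch below simulates data with a jump of 10 at an assumed cutpoint of 0. It fits a separate linear trend on each side and reads off the jump. This is a minimal sketch of the general technique, not rd_est's internal code; the variable names and simulated data are assumptions.

```r
# Minimal sketch (not rd_est's internal code): a parametric sharp RD via lm,
# with a linear trend on each side of an assumed cutpoint of 0.
set.seed(1)
n <- 1000
x <- runif(n, -1, 1)                  # running variable, already centered at 0
tr <- as.numeric(x >= 0)              # sharp assignment at the cutpoint
y <- 3 + 2 * x + 10 * tr + rnorm(n)   # simulated true jump of 10 at x = 0

# Because x is centered at the cutpoint, the 'tr' coefficient is the jump at
# x = 0; the tr:x interaction lets the slope differ on each side.
fit <- lm(y ~ tr * x)
jump <- unname(coef(fit)["tr"])
jump
```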

Usage

rd_est(formula, data, subset = NULL, cutpoint = NULL, bw = NULL,
  kernel = "triangular", se.type = "HC1", cluster = NULL,
  verbose = FALSE, less = FALSE, est.cov = FALSE, est.itt = FALSE,
  t.design = NULL)

Arguments

formula

The formula of the RDD. This is supplied in the format y ~ x for a simple sharp RDD, or y ~ x | c1 + c2 for a sharp RDD with two covariates. A fuzzy RDD is specified as y ~ x + z, where x is the running variable and z is the endogenous treatment variable. Covariates are then included in the same way as in a sharp RDD (e.g., y ~ x + z | c1 + c2).
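To make the roles in these formulas concrete, the base-R sketch below splits a formula into outcome, running variable, optional treatment variable and covariates. The helper split_rd_formula is hypothetical and only shows how the pieces line up. It is not how rd_est parses its formula internally. It relies on `|` binding more loosely than `+`, so y ~ x + z | c1 + c2 splits into (x + z) and (c1 + c2).

```r
# Hypothetical helper (not part of rddapp): map the parts of an
# rd_est-style formula to their roles.
split_rd_formula <- function(f) {
  rhs <- f[[3]]
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    main <- rhs[[2]]                  # left of '|': running (+ treatment)
    covs <- all.vars(rhs[[3]])        # right of '|': covariates
  } else {
    main <- rhs
    covs <- character(0)
  }
  vars <- all.vars(main)
  list(outcome    = all.vars(f[[2]]),
       running    = vars[1],
       treatment  = if (length(vars) > 1) vars[2] else NA_character_,
       covariates = covs)
}

str(split_rd_formula(y ~ x | c1 + c2))      # sharp RDD with two covariates
str(split_rd_formula(y ~ x + z | c1 + c2))  # fuzzy RDD with two covariates
```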

data

An optional data frame.

subset

An optional vector specifying a subset of observations to be used.

cutpoint

The cutpoint. If omitted, it is assumed to be 0.

bw

A numeric vector specifying the bandwidths at which to estimate the RD. If omitted or set to "IK12", the bandwidth is calculated using the Imbens-Kalyanaraman (2012) method; if set to "IK09", it is calculated using the Imbens-Kalyanaraman (2009) method. The RD is then estimated at that bandwidth, at half that bandwidth, and at twice that bandwidth. If a single numeric value is supplied, the RD is likewise estimated at that bandwidth, at half of it, and at twice it.
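The sketch below shows the "bandwidth, half and double" scheme with a triangular-kernel local linear fit on simulated data. The bandwidth h = 0.4 is an assumed value chosen for illustration. It is not an Imbens-Kalyanaraman estimate, and the helper local_linear_jump is hypothetical rather than part of rddapp.

```r
# Sketch: estimate a sharp RD at h, h/2 and 2h with a triangular-kernel
# local linear fit. h is assumed here, not computed by the IK method.
set.seed(2)
n <- 2000
x <- runif(n, -1, 1)
y <- 1 + 0.5 * x + 2 * (x >= 0) + rnorm(n, sd = 0.5)  # true jump of 2
cutpoint <- 0
h <- 0.4                                  # assumed bandwidth

local_linear_jump <- function(h) {
  w <- pmax(0, 1 - abs((x - cutpoint) / h))   # triangular kernel weights
  keep <- w > 0                               # only points inside the bandwidth
  d <- data.frame(y = y, xc = x - cutpoint,
                  tr = as.numeric(x >= cutpoint))[keep, ]
  fit <- lm(y ~ tr * xc, data = d, weights = w[keep])
  unname(coef(fit)["tr"])                     # jump at the cutpoint
}

bws <- c(bw = h, half = h / 2, double = 2 * h)
est <- sapply(bws, local_linear_jump)         # named: bw, half, double
est
```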

kernel

A string specifying the kernel to be used in the local linear fitting. The "triangular" kernel is the default and is regarded as the theoretically appropriate kernel for estimation at a boundary point, as in RDD (Lee and Lemieux, 2010). Other options are "rectangular", "epanechnikov", "quartic", "triweight", "tricube", "gaussian" and "cosine".
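For reference, a few of these kernels can be written as weight functions of the scaled distance u = (x - cutpoint) / bw. The sketch below uses standard textbook forms. The helper kernel_weights is hypothetical, and only four of the listed kernels are covered. Constant factors do not matter for weighted least squares, because rescaling all weights leaves the coefficients unchanged.

```r
# Hypothetical helper: standard forms of some kernels named above,
# as functions of the scaled distance u = (x - cutpoint) / bw.
kernel_weights <- function(u, kernel = "triangular") {
  inside <- abs(u) <= 1
  switch(kernel,
    triangular   = ifelse(inside, 1 - abs(u), 0),
    rectangular  = ifelse(inside, 0.5, 0),
    epanechnikov = ifelse(inside, 0.75 * (1 - u^2), 0),
    gaussian     = dnorm(u),              # unbounded support
    stop("kernel not covered by this sketch: ", kernel))
}

u <- c(-1.5, -1, -0.5, 0, 0.5, 1, 1.5)
kernel_weights(u, "triangular")   # 0.0 0.0 0.5 1.0 0.5 0.0 0.0
```

The triangular kernel gives the most weight to observations right at the cutpoint and declines linearly to zero at the edge of the bandwidth.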

se.type

This specifies the robust SE calculation method to use. Options are, as in vcovHC, "HC3", "const", "HC", "HC0", "HC1", "HC2", "HC4", "HC4m", "HC5". This option is overridden by cluster.
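To show what a heteroskedasticity-consistent type such as the default "HC1" computes, the sketch below builds HC0 and HC1 sandwich covariance matrices by hand for an lm fit. This is base R written only to illustrate the formula. It is not rddapp's or the sandwich package's code, and the simulated data are assumptions.

```r
# Sketch: HC0 and HC1 sandwich standard errors by hand for an lm fit.
set.seed(3)
n <- 500
x <- runif(n, -1, 1)
tr <- as.numeric(x >= 0)
y <- 1 + x + 2 * tr + rnorm(n, sd = 0.5 + abs(x))   # heteroskedastic errors
fit <- lm(y ~ tr * x)

X <- model.matrix(fit)
e <- residuals(fit)
k <- ncol(X)
bread <- solve(crossprod(X))          # (X'X)^{-1}
meat  <- crossprod(X * e)             # sum_i e_i^2 x_i x_i'
vc_hc0 <- bread %*% meat %*% bread
vc_hc1 <- vc_hc0 * n / (n - k)        # HC1 adds a degrees-of-freedom factor
se_hc1 <- sqrt(diag(vc_hc1))
se_hc1
```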

cluster

An optional vector specifying clusters within which the errors are assumed to be correlated. This will result in reporting cluster robust SEs. This option overrides anything specified in se.type. It is suggested that data with a discrete running variable be clustered by each unique value of the running variable (Lee and Card, 2008).
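The sketch below illustrates clustering on each distinct value of a discrete running variable. It computes a cluster-robust covariance by hand by summing scores within clusters. The small-sample factor G/(G - 1) * (n - 1)/(n - k) is the common Stata-style one. It is an assumption here, not a statement of the adjustment rd_est applies.

```r
# Sketch: cluster-robust SEs by hand, one cluster per unique value of a
# discrete running variable. The CR1-style factor is an assumption.
set.seed(4)
x <- sample(-10:9, 800, replace = TRUE)    # discrete running variable
tr <- as.numeric(x >= 0)
y <- 1 + 0.1 * x + 2 * tr + rnorm(800)
fit <- lm(y ~ tr * x)

X <- model.matrix(fit)
e <- residuals(fit)
n <- nrow(X); k <- ncol(X)
cl <- x                                    # cluster id = running-variable value
G <- length(unique(cl))

U <- rowsum(X * e, cl)                     # per-cluster score sums (G x k)
bread <- solve(crossprod(X))
vc_cr0 <- bread %*% crossprod(U) %*% bread
vc_cr1 <- vc_cr0 * (G / (G - 1)) * ((n - 1) / (n - k))
se_cr1 <- sqrt(diag(vc_cr1))
se_cr1
```

With every observation in its own cluster, the middle term reduces to the HC0 "meat", which is a useful sanity check.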

verbose

Logical. If TRUE, additional information is printed to the terminal.

less

Logical. If TRUE, only the linear (parametric) and optimal-bandwidth estimates are returned, instead of the linear, quadratic, cubic, optimal, half-bandwidth and double-bandwidth estimates.

est.cov

Logical. If TRUE, the estimates for the covariates are also included.

est.itt

Logical. If TRUE, the estimates of ITT will be returned.

t.design

A string giving the treatment assignment rule with respect to the running variable X: "g" means treatment is assigned if X is greater than its cutoff, "geq" if X is greater than or equal to its cutoff, "l" if X is less than its cutoff, and "leq" if X is less than or equal to its cutoff.
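The four t.design values differ only in which side of the cutoff is treated and in how an observation exactly at the cutoff is classified. The hypothetical helper below, which is not part of rddapp, makes that explicit.

```r
# Hypothetical helper (not part of rddapp): treated indicator for each
# t.design value, given a cutoff 'cut'.
assign_treatment <- function(x, cut, t.design) {
  switch(t.design,
    g   = x >  cut,
    geq = x >= cut,
    l   = x <  cut,
    leq = x <= cut,
    stop("t.design must be one of 'g', 'geq', 'l', 'leq'"))
}

x <- c(-1, 0, 1)
assign_treatment(x, 0, "geq")   # FALSE  TRUE  TRUE
assign_treatment(x, 0, "g")     # FALSE FALSE  TRUE
```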

Value

rd_est returns an object of class "rd". The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class rd is a list containing the following components:

type

A string denoting either "sharp" or "fuzzy" RDD.

est

Numeric vector of the estimate of the discontinuity in the outcome under a sharp design, or the Wald estimator in the fuzzy design for each corresponding bandwidth.

se

Numeric vector of the standard error for each corresponding bandwidth.

z

Numeric vector of the z statistic for each corresponding bandwidth.

p

Numeric vector of the p value for each corresponding bandwidth.

ci

The matrix of 95% confidence intervals (lower and upper bounds) for each corresponding bandwidth.

d

Numeric vector of the effect size (Cohen's d) for each estimate.
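The z statistics, p values and 95% intervals are tied to the estimates and standard errors. The sketch below shows the standard normal-approximation (Wald-type) construction. It is assumed here rather than taken from rd_est's source, and the estimates and standard errors are made-up numbers for illustration only.

```r
# Sketch: z, two-sided p and 95% CI from est and se under a normal
# approximation. Values below are hypothetical, not rd_est output.
est <- c(2.10, 2.35, 1.98)
se  <- c(0.40, 0.62, 0.29)

z    <- est / se
p    <- 2 * pnorm(-abs(z))
crit <- qnorm(0.975)                       # about 1.96
ci   <- cbind(lower = est - crit * se,
              upper = est + crit * se)
round(cbind(est, se, z, p, ci), 4)
```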

cov

The names of covariates.

bw

Numeric vector of each bandwidth used in estimation.

obs

Vector of the number of observations within the corresponding bandwidth.

call

The matched call.

na.action

The observations removed from fitting due to missingness.

impute

Whether multiple imputation is used or not.

model

For a sharp design, a list of the lm objects is returned. For a fuzzy design, a list of lists is returned, each with two elements: firststage, the first stage lm object, and iv, the ivreg object. A model is returned for each corresponding bandwidth.
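For intuition about the fuzzy case, consider one endogenous treatment z and the assignment indicator as the only excluded instrument. The 2SLS estimate then equals the reduced-form jump divided by the first-stage jump (indirect least squares), provided both regressions use the same controls. The sketch below reproduces that Wald ratio with two lm fits on simulated data instead of AER::ivreg. It recovers only the point estimate, not valid 2SLS standard errors.

```r
# Sketch: fuzzy RD point estimate as reduced-form jump / first-stage jump.
# Uses two lm fits, not AER::ivreg; data are simulated.
set.seed(5)
n <- 4000
x <- runif(n, -1, 1)
tr <- as.numeric(x >= 0)                        # assignment at cutpoint 0
z <- rbinom(n, 1, ifelse(tr == 1, 0.8, 0.2))    # take-up jumps 0.2 -> 0.8
y <- 1 + x + 3 * z + rnorm(n)                   # true treatment effect of 3

firststage <- lm(z ~ tr * x)                    # jump in take-up
reduced    <- lm(y ~ tr * x)                    # jump in the outcome
wald <- unname(coef(reduced)["tr"] / coef(firststage)["tr"])
wald
```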

frame

Returns the model frame used in fitting.

References

Lee, D. S., Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355. http://www.aeaweb.org/articles.php?doi=10.1257/jel.48.2.281.

Imbens, G., Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2), 615-635. http://dx.doi.org/10.1016/j.jeconom.2007.05.001.

Lee, D. S., Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. http://dx.doi.org/10.1016/j.jeconom.2007.05.003.

Angrist, J. D., Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton, NJ: Princeton University Press.

Examples

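library(rddapp)
set.seed(123)  # for reproducibility
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
# Sharp RDD; treatment is assigned when x >= 0 (the default cutpoint)
rd_est(y ~ x, t.design = "geq")
# Efficiency gains can be made by including covariates
rd_est(y ~ x | cov, t.design = "geq")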