rdplotdensity: Density Plotting for Manipulation Testing

Description

rdplotdensity constructs density plots. It is based on the local polynomial density estimator proposed in Cattaneo, Jansson and Ma (2020, 2023). A companion Stata package is described in Cattaneo, Jansson and Ma (2018).

Companion command: rddensity for manipulation (density discontinuity) testing.

Related Stata and R packages useful for inference in regression discontinuity (RD) designs are described on the website https://rdpackages.github.io/.

Usage

rdplotdensity(
  rdd,
  X,
  plotRange = NULL,
  plotN = 10,
  plotGrid = c("es", "qs"),
  alpha = 0.05,
  type = NULL,
  lty = NULL,
  lwd = NULL,
  lcol = NULL,
  pty = NULL,
  pwd = NULL,
  pcol = NULL,
  CItype = NULL,
  CIuniform = FALSE,
  CIsimul = 2000,
  CIshade = NULL,
  CIcol = NULL,
  bwselect = NULL,
  hist = TRUE,
  histBreaks = NULL,
  histFillCol = 3,
  histFillShade = 0.2,
  histLineCol = "white",
  title = "",
  xlabel = "",
  ylabel = "",
  legendTitle = NULL,
  legendGroups = NULL,
  noPlot = FALSE
)

Value

Estl, Estr: Matrices containing estimation results: (1) grid (grid points), (2) bw (bandwidths), (3) nh (number of observations in each local neighborhood), (4) nhu (number of unique observations in each local neighborhood), (5) f_p (point estimates with p-th order local polynomial), (6) f_q (point estimates with q-th order local polynomial, only if option q is nonzero), (7) se_p (standard error corresponding to f_p), and (8) se_q (standard error corresponding to f_q). Also listed under these entries: the variance-covariance matrix corresponding to f_p, the variance-covariance matrix corresponding to f_q, and a list containing the options passed to the function.
Estplot: A standard ggplot object, which can be used for further customization.

Arguments

rdd: Object returned by rddensity
X: Numeric vector or one dimensional matrix/data frame, the running variable.
plotRange: Numeric, specifies the lower and upper bound of the plotting region. Default is [c-3*hl,c+3*hr] (three bandwidths around the cutoff).
plotN: Numeric, specifies the number of grid points used for plotting on the two sides of the cutoff. Default is c(10,10) (i.e., 10 points are used on each side).
plotGrid: String, specifies how the grid points are positioned. Options are es (evenly spaced) and qs (quantile spaced).
alpha: Numeric scalar between 0 and 1, the significance level for plotting confidence regions. If more than one is provided, they will be applied to the two sides accordingly.
type: String, one of "line" (default), "points" or "both", how the point estimates are plotted. If more than one is provided, they will be applied to the two sides accordingly.
lty: Line type for point estimates, only effective if type is "line" or "both". 1 for solid line, 2 for dashed line, 3 for dotted line. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
lwd: Line width for point estimates, only effective if type is "line" or "both". Should be strictly positive. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
lcol: Line color for point estimates, only effective if type is "line" or "both". 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
pty: Scatter plot type for point estimates, only effective if type is "points" or "both". For options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
pwd: Scatter plot size for point estimates, only effective if type is "points" or "both". Should be strictly positive. If more than one is provided, they will be applied to the two sides accordingly.
pcol: Scatter plot color for point estimates, only effective if type is "points" or "both". 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
CItype: String, one of "region" (shaded region, default), "line" (dashed lines), "ebar" (error bars), "all" (all of the previous) or "none" (no confidence region), how the confidence region should be plotted. If more than one is provided, they will be applied to the two sides accordingly.
CIuniform: TRUE or FALSE (default), plotting either pointwise confidence intervals (FALSE) or uniform confidence bands (TRUE).
CIsimul: Positive integer, the number of simulations used to construct critical values (default is 2000). This option is ignored if CIuniform=FALSE.
CIshade: Numeric, opaqueness of the confidence region, should be between 0 (transparent) and 1. Default is 0.2. If more than one is provided, they will be applied to the two sides accordingly.
CIcol: Color of the confidence region. 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
bwselect: String, the method for data-driven bandwidth selection. Available options are (1) "mse-dpi" (mean squared error-optimal bandwidth selected for each grid point); (2) "imse-dpi" (integrated MSE-optimal bandwidth, common for all grid points); (3) "mse-rot" (rule-of-thumb bandwidth with Gaussian reference model); and (4) "imse-rot" (integrated rule-of-thumb bandwidth with Gaussian reference model). If omitted, bandwidths returned by rddensity will be used.
hist: TRUE (default) or FALSE, whether to add a histogram to the background.
histBreaks: Numeric vector, giving the breakpoints between histogram cells.
histFillCol: Color of the histogram cells.
histFillShade: Opaqueness of the histogram cells, should be between 0 (transparent) and 1. Default is 0.2.
histLineCol: Color of the histogram lines.
title, xlabel, ylabel: Strings, title of the plot and labels for x- and y-axis.
legendTitle: String, title of legend.
legendGroups: String vector, group names used in legend.
noPlot: No density plot will be generated if set to TRUE.
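
The plotRange default and the two plotGrid layouts can be illustrated with base R alone. The sketch below is not the package's internal code: the cutoff c0, the bandwidths hl and hr, and the simulated x are hypothetical values chosen for illustration, and the quantile-spaced grid shown is only one plausible way to place points at data quantiles.

```r
set.seed(42)
x  <- rnorm(2000, mean = -0.5)   # hypothetical running variable
c0 <- 0                          # cutoff (written as c in the plotRange description)
hl <- 0.6; hr <- 0.8             # hypothetical bandwidths, not rddensity output

# Default plotting region: three bandwidths on each side of the cutoff
plot_range <- c(c0 - 3 * hl, c0 + 3 * hr)

n_grid <- 10  # grid points per side, as with plotN = 10

# "es": evenly spaced points between the range limit and the cutoff
grid_es_left  <- seq(plot_range[1], c0, length.out = n_grid)
grid_es_right <- seq(c0, plot_range[2], length.out = n_grid)

# "qs": points at equally spaced quantiles of the data on each side
x_left  <- x[x >= plot_range[1] & x < c0]
x_right <- x[x >= c0 & x <= plot_range[2]]
probs   <- seq(0, 1, length.out = n_grid)
grid_qs_left  <- unname(quantile(x_left, probs))
grid_qs_right <- unname(quantile(x_right, probs))

print(plot_range)
```

Evenly spaced grids cover the plotting region uniformly, whereas quantile-spaced grids concentrate evaluation points where the data are dense.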

Author

Matias D. Cattaneo, Princeton University. cattaneo@princeton.edu.

Michael Jansson, University of California Berkeley. mjansson@econ.berkeley.edu.

Xinwei Ma (maintainer), University of California San Diego. x1ma@ucsd.edu.

Details

Bias correction is used only for constructing confidence intervals/bands, not for point estimation. The point estimates, denoted by f_p, are constructed using local polynomial estimates of order p, while the centering of the confidence intervals/bands, denoted by f_q, is constructed using local polynomial estimates of order q. The confidence intervals/bands take the form [f_q - cv * SE(f_q), f_q + cv * SE(f_q)], where cv denotes the appropriate critical value and SE(f_q) denotes a standard error estimate for the centering of the confidence interval/band. As a result, the confidence intervals/bands may not be centered at the point estimates, because they have been bias-corrected. Setting q equal to p yields confidence intervals/bands centered at the point estimates, but requires undersmoothing for valid inference (i.e., the (I)MSE-optimal bandwidth for the density point estimator cannot be used). Hence the bandwidth would need to be specified manually when q=p, and the point estimates will not be (I)MSE-optimal. See Cattaneo, Jansson and Ma (2022, 2023) for details, and also Calonico, Cattaneo, and Farrell (2018, 2022) for robust bias correction methods.
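
The interval formula above can be worked through numerically. The following base R sketch uses made-up values for f_p, f_q, and SE(f_q) at two grid points (they are not output of rddensity) and assumes the pointwise case, where cv is taken to be a standard normal quantile; for uniform bands the critical value would instead come from simulation.

```r
alpha <- 0.05
cv <- qnorm(1 - alpha / 2)  # pointwise critical value under a normal approximation

# Hypothetical estimates at two grid points (not output of rddensity)
f_p  <- c(0.30, 0.25)  # point estimates from the order-p fit
f_q  <- c(0.33, 0.26)  # bias-corrected centering from the order-q fit
se_q <- c(0.01, 0.02)  # standard errors of f_q

# Interval of the form [f_q - cv * SE(f_q), f_q + cv * SE(f_q)]
ci_lower <- f_q - cv * se_q
ci_upper <- f_q + cv * se_q

# Check whether each point estimate falls inside its bias-corrected interval
inside <- f_p >= ci_lower & f_p <= ci_upper
print(data.frame(f_p, f_q, ci_lower, ci_upper, inside))
```

At the first grid point the interval is centered at f_q and is narrow enough that f_p falls below its lower limit, which is the situation in which a plotted point estimate lies outside the confidence region.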

Sometimes the density point estimates may lie outside of the confidence intervals/bands, which can happen if the underlying distribution exhibits high curvature at some evaluation point(s). One possible solution in this case is to increase the polynomial order p or to employ a smaller bandwidth.
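
A minimal sketch of the two remedies, assuming the rddensity package is installed and using the simulated data from the Examples section. The bandwidth value 0.5 is an arbitrary illustrative choice, not a recommendation, and whether either change resolves the issue depends on the data.

```r
library(rddensity)

# Simulated sample with a density discontinuity at 0
set.seed(42)
x <- rnorm(2000, mean = -0.5)
x[x > 0] <- x[x > 0] * 2

# Remedy 1: increase the polynomial order p in the rddensity fit
rdd_p3  <- rddensity(X = x, p = 3)
plot_p3 <- rdplotdensity(rdd_p3, x)

# Remedy 2: supply a smaller bandwidth manually (0.5 on each side is arbitrary)
rdd_h  <- rddensity(X = x, h = c(0.5, 0.5))
plot_h <- rdplotdensity(rdd_h, x)
```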

References

Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association, 113(522): 767-779. doi:10.1080/01621459.2017.1285776

Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2022. Coverage Error Optimal Confidence Intervals for Local Polynomial Regression. Bernoulli, 28(4): 2998-3022. doi:10.3150/21-BEJ1445

Cattaneo, M. D., M. Jansson, and X. Ma. 2018. Manipulation Testing based on Density Discontinuity. Stata Journal, 18(1): 234-261. doi:10.1177/1536867X1801800115

Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531): 1449-1455. doi:10.1080/01621459.2019.1635480

Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. Journal of Statistical Software, 101(2): 1-25. doi:10.18637/jss.v101.i02

Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. Journal of Econometrics, 240(2): 105074. doi:10.1016/j.jeconom.2021.01.006

Examples

# Load the package that provides rddensity() and rdplotdensity()
library(rddensity)

# Generate a random sample with a density discontinuity at 0
set.seed(42)
x <- rnorm(2000, mean = -0.5)
x[x > 0] <- x[x > 0] * 2

# Estimation
rdd <- rddensity(X = x)
summary(rdd)

# Density plot (from -2 to 2 with 25 evaluation points at each side)
plot1 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25)

# Plotting a uniform confidence band
set.seed(42) # fix the seed for simulating critical values
plot3 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25, CIuniform = TRUE)
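
The appearance arguments and the returned ggplot object can be combined as in the following sketch, which assumes the rddensity and ggplot2 packages are installed. The title and axis labels are arbitrary illustrative strings, and the final str() call simply inspects the returned object without assuming the names of its components.

```r
library(rddensity)

# Same simulated sample as above
set.seed(42)
x <- rnorm(2000, mean = -0.5)
x[x > 0] <- x[x > 0] * 2
rdd <- rddensity(X = x)

# Lines plus points for the estimates, dashed lines for the confidence region
res <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25,
                     type = "both", CItype = "line",
                     title = "Density around the cutoff",
                     xlabel = "Running variable", ylabel = "Density")

# Estplot is a ggplot object, so ggplot2 themes and layers can be added
custom_plot <- res$Estplot + ggplot2::theme_bw()
print(custom_plot)

# Inspect the top-level structure of the left-side estimation results
str(res$Estl, max.level = 1)
```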