rdplotdensity constructs density plots. It is based on the local polynomial density estimator proposed in Cattaneo, Jansson and Ma (2020, 2023).
A companion Stata package is described in Cattaneo, Jansson and Ma (2018).
Companion command: rddensity for manipulation (density discontinuity) testing.
Related Stata and R packages useful for inference in regression discontinuity (RD) designs are described on the website: https://rdpackages.github.io/.
rdplotdensity(
rdd,
X,
plotRange = NULL,
plotN = 10,
plotGrid = c("es", "qs"),
alpha = 0.05,
type = NULL,
lty = NULL,
lwd = NULL,
lcol = NULL,
pty = NULL,
pwd = NULL,
pcol = NULL,
CItype = NULL,
CIuniform = FALSE,
CIsimul = 2000,
CIshade = NULL,
CIcol = NULL,
bwselect = NULL,
hist = TRUE,
histBreaks = NULL,
histFillCol = 3,
histFillShade = 0.2,
histLineCol = "white",
title = "",
xlabel = "",
ylabel = "",
legendTitle = NULL,
legendGroups = NULL,
noPlot = FALSE
)
Matrices containing estimation results:
(1) grid (grid points),
(2) bw (bandwidths),
(3) nh (number of observations in each local neighborhood),
(4) nhu (number of unique observations in each local neighborhood),
(5) f_p (point estimates with p-th order local polynomial),
(6) f_q (point estimates with q-th order local polynomial, only if option q is nonzero),
(7) se_p (standard error corresponding to f_p), and
(8) se_q (standard error corresponding to f_q).
Variance-covariance matrix corresponding to f_p.
Variance-covariance matrix corresponding to f_q.
A list containing options passed to the function.
A standard ggplot object, which can be used for further customization.
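The variance-covariance matrices listed above are what uniform confidence bands (CIuniform = TRUE) rely on, since a uniform band must account for the correlation of the estimates across grid points. The base R sketch below illustrates the usual simulation-based ("sup-t") construction of a uniform critical value, under the assumption that rddensity follows this standard approach: draw Gaussian vectors with the estimated correlation across grid points, take the largest absolute value in each draw, and use the (1 - alpha) quantile of those maxima. The helper uniform_cv() and the example covariance matrix are hypothetical; the package's internal implementation may differ in details.

```r
# Hypothetical helper, not part of rddensity: sup-t critical value by simulation.
# V: variance-covariance matrix of the estimates across grid points.
uniform_cv <- function(V, alpha = 0.05, nsim = 2000) {
  R <- cov2cor(V)                                   # correlation matrix of the t-statistics
  U <- chol(R)                                      # upper-triangular factor, t(U) %*% U == R
  k <- ncol(R)
  Z <- matrix(rnorm(nsim * k), nrow = nsim) %*% U   # each row is a draw from N(0, R)
  max_abs <- apply(abs(Z), 1, max)                  # largest |t| in each draw
  unname(quantile(max_abs, probs = 1 - alpha))      # (1 - alpha) quantile of the maxima
}

# Made-up example: 25 grid points whose correlation decays with distance
k <- 25
V <- 0.01 * outer(1:k, 1:k, function(i, j) 0.9^abs(i - j))

set.seed(42)                                        # simulated critical values depend on the seed
cv_unif  <- uniform_cv(V, alpha = 0.05, nsim = 2000)
cv_point <- qnorm(1 - 0.05 / 2)                     # pointwise critical value, about 1.96
```

With correlated grid points, the uniform critical value lies between the pointwise value and a Bonferroni bound, which is why uniform bands are wider than pointwise intervals. Because the critical value is simulated, fixing the seed (as in the examples) makes the band reproducible.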
rdd: Object returned by rddensity.
X: Numeric vector or one-dimensional matrix/data frame, the running variable.
plotRange: Numeric, specifies the lower and upper bounds of the plotting region. Default is [c-3*hl, c+3*hr] (three bandwidths around the cutoff c, where hl and hr are the bandwidths on the left and right of the cutoff).
plotN: Numeric, specifies the number of grid points used for plotting on the two sides of the cutoff. Default is c(10,10) (i.e., 10 points are used on each side); a single value, as in the usage above, is applied to both sides.
plotGrid: String, specifies how the grid points are positioned. Options are es (evenly spaced) and qs (quantile spaced).
alpha: Numeric between 0 and 1, the significance level for plotting confidence regions. If more than one value is provided, they will be applied to the two sides accordingly.
type: String, one of "line" (default), "points" or "both", how the point estimates are plotted. If more than one is provided, they will be applied to the two sides accordingly.
lty: Line type for point estimates, only effective if type is "line" or "both". 1 for solid line, 2 for dashed line, 3 for dotted line. For other options, see the documentation for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
lwd: Line width for point estimates, only effective if type is "line" or "both". Should be strictly positive. For other options, see the documentation for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
lcol: Line color for point estimates, only effective if type is "line" or "both". 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the documentation for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
pty: Scatter plot type for point estimates, only effective if type is "points" or "both". For options, see the documentation for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
pwd: Scatter plot size for point estimates, only effective if type is "points" or "both". Should be strictly positive. If more than one is provided, they will be applied to the two sides accordingly.
pcol: Scatter plot color for point estimates, only effective if type is "points" or "both". 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the documentation for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
CItype: String, one of "region" (shaded region, default), "line" (dashed lines), "ebar" (error bars), "all" (all of the previous) or "none" (no confidence region), how the confidence region should be plotted. If more than one is provided, they will be applied to the two sides accordingly.
CIuniform: TRUE or FALSE (default), whether to plot pointwise confidence intervals (FALSE) or uniform confidence bands (TRUE).
CIsimul: Positive integer, the number of simulations used to construct critical values (default is 2000). This option is ignored if CIuniform=FALSE.
CIshade: Numeric, opacity of the confidence region, between 0 (transparent) and 1 (opaque). Default is 0.2. If more than one is provided, they will be applied to the two sides accordingly.
CIcol: Color of the confidence region. 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the documentation for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.
bwselect: String, the method for data-driven bandwidth selection. Available options are (1) "mse-dpi" (mean squared error-optimal bandwidth selected for each grid point); (2) "imse-dpi" (integrated MSE-optimal bandwidth, common for all grid points); (3) "mse-rot" (rule-of-thumb bandwidth with Gaussian reference model); and (4) "imse-rot" (integrated rule-of-thumb bandwidth with Gaussian reference model). If omitted, bandwidths returned by rddensity will be used.
hist: TRUE (default) or FALSE, whether to add a histogram in the background.
histBreaks: Numeric vector, giving the breakpoints between histogram cells.
histFillCol: Color of the histogram cells.
histFillShade: Opacity of the histogram cells, between 0 (transparent) and 1 (opaque). Default is 0.2.
histLineCol: Color of the histogram lines.
title, xlabel, ylabel: Strings, title of the plot and labels for the x- and y-axis.
legendTitle: String, title of the legend.
legendGroups: String vector, group names used in the legend.
noPlot: If set to TRUE, no density plot will be generated.
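To make the plotGrid and plotRange options concrete, here is a minimal base R sketch of how an evenly spaced ("es") or quantile-spaced ("qs") grid could be formed on one side of the cutoff, using the default range [c-3*hl, c+3*hr]. The helper make_side_grid(), the bandwidths hl and hr, and the choice of taking empirical quantiles of the observations inside the plotting range for "qs" are illustrative assumptions, not rddensity's internal code.

```r
# Hypothetical helper, not part of rddensity: grid for one side of the cutoff.
# side = "left" covers [lower, cutoff]; side = "right" covers [cutoff, upper].
make_side_grid <- function(x, cutoff, lower, upper, n = 10,
                           grid = c("es", "qs"), side = c("left", "right")) {
  grid <- match.arg(grid)
  side <- match.arg(side)
  a <- if (side == "left") lower else cutoff
  b <- if (side == "left") cutoff else upper
  if (grid == "es") {
    # Evenly spaced: n equidistant points from a to b
    seq(a, b, length.out = n)
  } else {
    # Quantile spaced (assumed): n empirical quantiles of the observations in [a, b]
    xs <- x[x >= a & x <= b]
    unname(quantile(xs, probs = seq(0, 1, length.out = n)))
  }
}

set.seed(42)
x  <- rnorm(2000, mean = -0.5)
c0 <- 0
hl <- 0.6; hr <- 0.8                          # made-up bandwidths for illustration
lower <- c0 - 3 * hl; upper <- c0 + 3 * hr    # default plotRange rule [c-3*hl, c+3*hr]

es_left  <- make_side_grid(x, c0, lower, upper, n = 10, grid = "es", side = "left")
qs_right <- make_side_grid(x, c0, lower, upper, n = 10, grid = "qs", side = "right")
```

Quantile spacing concentrates evaluation points where observations are dense, while even spacing covers the plotting range uniformly regardless of where the data lie.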
Matias D. Cattaneo, Princeton University. cattaneo@princeton.edu.
Michael Jansson, University of California Berkeley. mjansson@econ.berkeley.edu.
Xinwei Ma (maintainer), University of California San Diego. x1ma@ucsd.edu.
Bias correction is only used for the construction of confidence intervals/bands, not for point estimation. The point estimates, denoted by f_p, are constructed using local polynomial estimates of order p, while the centering of the confidence intervals/bands, denoted by f_q, is constructed using local polynomial estimates of order q. The confidence intervals/bands take the form:
[f_q - cv * SE(f_q), f_q + cv * SE(f_q)],
where cv denotes the appropriate critical value (a standard normal quantile for pointwise intervals, or a simulated value, based on CIsimul draws, for uniform bands) and SE(f_q) denotes a standard error estimate for the centering of the confidence interval/band. As a result, the confidence intervals/bands may not be centered at the point estimates, because they have been bias-corrected. Setting q equal to p results in confidence intervals/bands centered at the point estimates, but requires undersmoothing for valid inference (i.e., the (I)MSE-optimal bandwidth for the density point estimator cannot be used). Hence the bandwidth would need to be specified manually when q=p, and the point estimates will not be (I)MSE-optimal. See Cattaneo, Jansson and Ma (2022, 2023) for details, and also Calonico, Cattaneo, and Farrell (2018, 2022) for robust bias correction methods.
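A worked illustration of the interval formula [f_q - cv * SE(f_q), f_q + cv * SE(f_q)], using made-up values for f_p, f_q and SE(f_q) at three grid points (these are not output from rddensity). For pointwise intervals at alpha = 0.05, cv is the standard normal quantile qnorm(0.975), about 1.96, so each interval is centered at the bias-corrected f_q rather than at the point estimate f_p.

```r
alpha <- 0.05
f_p  <- c(0.30, 0.33, 0.35)   # made-up point estimates (order p)
f_q  <- c(0.31, 0.32, 0.37)   # made-up bias-corrected centering (order q)
se_q <- c(0.02, 0.02, 0.03)   # made-up standard errors of f_q

cv <- qnorm(1 - alpha / 2)    # pointwise critical value
ci <- cbind(lower = f_q - cv * se_q, upper = f_q + cv * se_q)

# The interval midpoints coincide with f_q, not with f_p
midpoint <- rowMeans(ci)
```

Only when q = p (with a manually chosen, undersmoothed bandwidth) would the midpoints coincide with the point estimates.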
Sometimes the density point estimates may lie outside the confidence intervals/bands, which can happen if the underlying distribution exhibits high curvature at some evaluation point(s). One possible solution in this case is to increase the polynomial order p or to employ a smaller bandwidth.
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association 113(522): 767-779. doi:10.1080/01621459.2017.1285776
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2022. Coverage Error Optimal Confidence Intervals for Local Polynomial Regression. Bernoulli 28(4): 2998-3022. doi:10.3150/21-BEJ1445
Cattaneo, M. D., M. Jansson, and X. Ma. 2018. Manipulation Testing based on Density Discontinuity. Stata Journal 18(1): 234-261. doi:10.1177/1536867X1801800115
Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association 115(531): 1449-1455. doi:10.1080/01621459.2019.1635480
Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. Journal of Statistical Software 101(2): 1-25. doi:10.18637/jss.v101.i02
Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. Journal of Econometrics 240(2): 105074. doi:10.1016/j.jeconom.2021.01.006
See also: rddensity
library(rddensity)
# Generate a random sample with a density discontinuity at 0
set.seed(42)
x <- rnorm(2000, mean = -0.5)
x[x > 0] <- x[x > 0] * 2  # stretching positive values halves the density to the right of 0
# Estimation
rdd <- rddensity(X = x)
summary(rdd)
# Density plot (from -2 to 2 with 25 evaluation points at each side)
plot1 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25)
# Plotting a uniform confidence band
set.seed(42) # fix the seed for simulating critical values
plot3 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25, CIuniform = TRUE)
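For reference, the discontinuity built into the example can be computed directly, which shows what the density plot should reveal at the cutoff. Values left of 0 follow N(-0.5, 1), while positive values are doubled, so to the right of 0 the density becomes 0.5 * dnorm(v / 2, mean = -0.5): the right limit at 0 is exactly half the left limit. The base R sketch below (no rddensity needed) computes both true limits and compares them with crude window-based estimates from the same simulated sample; the window width h = 0.2 is an arbitrary choice for illustration.

```r
set.seed(42)
x <- rnorm(2000, mean = -0.5)
x[x > 0] <- x[x > 0] * 2

# True one-sided limits of the density at 0:
# left of 0 the density is dnorm(v, -0.5); right of 0 it is 0.5 * dnorm(v / 2, -0.5)
f_left  <- dnorm(0, mean = -0.5)
f_right <- 0.5 * dnorm(0, mean = -0.5)

# Crude one-sided estimates: share of observations in a window next to 0, divided by its width
h <- 0.2
emp_left  <- mean(x > -h & x <= 0) / h
emp_right <- mean(x > 0 & x <= h) / h
```

These window averages are biased because the density is not flat near the cutoff; the local polynomial estimator used by rddensity and rdplotdensity is designed to handle this boundary problem.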