Conditional Density Plots
Computes and plots conditional densities describing how the
conditional distribution of a categorical variable
y changes over a numerical variable x.
# S3 method for default
cdplot(x, y, plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...)

# S3 method for formula
cdplot(formula, data = list(), plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ..., subset = NULL)
- x: an object; the default method expects a single numerical variable (or an object coercible to this).
- y: a "factor" interpreted to be the dependent variable.
- formula: a "formula" of type y ~ x with a single dependent "factor" and a single numerical explanatory variable.
- data: an optional data frame.
- plot: logical. Should the computed conditional densities be plotted?
- tol.ylab: convenience tolerance parameter for y-axis annotation. If the distance between two labels drops under this threshold, they are plotted equidistantly.
- ylevels: a character or numeric vector specifying in which order the levels of the dependent variable should be plotted.
- bw, n, from, to, ...: arguments passed to density.
- col: a vector of fill colors of the same length as levels(y). The default is to call gray.colors.
- border: border color of shaded polygons.
- main, xlab, ylab: character strings for annotation.
- yaxlabels: character vector for annotation of the y axis, defaults to levels(y).
- xlim, ylim: the range of x and y values with sensible defaults.
- subset: an optional vector specifying a subset of observations to be used for plotting.
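As a rough illustration of how several of these arguments combine, the call below reuses the o-ring data from the examples on this page. The data frame name shuttle, the colors, the bandwidth, and the subset condition are arbitrary choices made for this sketch, not recommendations.

```r
## O-ring data from the examples, stored in a data frame
## ("shuttle" is a hypothetical name used only for this sketch)
shuttle <- data.frame(
  fail = factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                  1, 2, 1, 1, 1, 1, 1),
                levels = 1:2, labels = c("no", "yes")),
  temperature = c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                  70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
)

## Formula method with explicit level order, fill colors, a fixed
## bandwidth passed on to density(), and a subset of the observations
res <- cdplot(fail ~ temperature, data = shuttle,
              ylevels = c("yes", "no"),
              col = c("firebrick", "gray90"),
              bw = 2,
              subset = temperature > 55,
              xlab = "Temperature", ylab = "Failure")
```

Because ylevels reverses the level order here, "yes" is plotted first and receives the first color in col.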
cdplot computes the conditional densities of
x given the levels of
y weighted by the marginal distribution of y.
The densities are derived cumulatively over the levels of
y. This visualization technique is similar to spinograms (see spineplot)
and plots \(P(y | x)\) against \(x\). The conditional probabilities
are not derived by discretization (as in the spinogram), but using a smoothing
approach via density. Note that the estimates of the conditional densities are more reliable for
high-density regions of \(x\). Conversely, they are less reliable in regions
with only few \(x\) observations.
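The weighting described above can be reproduced by hand with density and Bayes' rule, \(P(y = k | x) = P(y = k) f(x | y = k) / f(x)\). The following is a minimal sketch of that idea for the first level of the o-ring data, assuming the within-level density is estimated with the same bandwidth and on the same grid as the marginal density. It is meant to illustrate the principle and is not presented as the exact internal computation of cdplot.

```r
## O-ring data from the examples
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## Marginal density f(x); its bandwidth and grid are reused below
dx <- density(temperature, bw = "nrd0", n = 512)

## Density of x within the first level of y, f(x | y = "no"),
## estimated on the same grid and with the same bandwidth
in_first <- fail == levels(fail)[1]
d1 <- density(temperature[in_first], bw = dx$bw, n = 512,
              from = min(dx$x), to = max(dx$x))

## Bayes' rule: weight by the marginal proportion P(y = "no")
p_first <- mean(in_first)
p_no_given_x <- p_first * d1$y / dx$y

## Estimated P(y = "no" | x) over the grid of x values
plot(dx$x, p_no_given_x, type = "l", ylim = c(0, 1),
     xlab = "Temperature", ylab = "P(no failure | temperature)")
```

Near the edges of the grid, where dx$y is small, the ratio rests on very few observations, which is the reliability caveat noted above.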
The conditional density functions (cumulative over the levels of y)
are returned invisibly.
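Since the result is returned invisibly, it has to be assigned to be used. The sketch below assumes, consistent with the indexing in the examples on this page, that the return value is a list whose elements are functions of x. The evaluation points 55, 65 and 75 are arbitrary.

```r
## O-ring data from the examples
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## Compute the conditional densities without drawing anything
cdens <- cdplot(fail ~ temperature, plot = FALSE)

## Evaluate the first (cumulative) conditional density at chosen
## temperatures; with two levels this is the estimated P("no" | x)
p_no <- cdens[[1]](c(55, 65, 75))

## The complement gives the estimated failure probability
p_yes <- 1 - p_no
```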
Hofmann, H., Theus, M. (2005), Interactive graphics for visualizing conditional distributions, Unpublished Manuscript.
## NASA space shuttle o-ring failures
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## CD plot
cdplot(fail ~ temperature)
cdplot(fail ~ temperature, bw = 2)
cdplot(fail ~ temperature, bw = "SJ")

## compare with spinogram
(spineplot(fail ~ temperature, breaks = 3))

## highlighting for failures
cdplot(fail ~ temperature, ylevels = 2:1)

## scatter plot with conditional density
cdens <- cdplot(fail ~ temperature, plot = FALSE)
plot(I(as.numeric(fail) - 1) ~ jitter(temperature, factor = 2),
     xlab = "Temperature", ylab = "Conditional failure probability")
## cdens[[1]] is the first conditional density function ("no")
lines(53:81, 1 - cdens[[1]](53:81), col = 2)