This function dresses up the plot.ecdf function and provides additional functionality for directly comparing distributions at specific locations along the scale. Specifically, multiple empirical CDFs can be plotted with a single call, and the differences between any pair of CDFs, or all of them, can optionally be plotted as raw percentage differences, in standard deviation units through inverse normal transformations, or both. See Ho & Reardon, 2012. (Note: not all features are implemented yet.)
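As a rough illustration of the two comparison scales mentioned above, the sketch below uses base R's ecdf() and qnorm() on simulated data. At a few hypothetical cut scores it computes the raw percentage-point difference between two groups and the corresponding difference after an inverse normal (probit) transformation of each group's proportion. This is a minimal sketch under those assumptions, not the esvis implementation; the data, group names, and cut scores are made up.

```r
# Simulated scores for two hypothetical groups (illustrative data only)
set.seed(42)
low  <- rnorm(500, mean = 240, sd = 15)
high <- rnorm(500, mean = 250, sd = 15)

# Empirical CDFs for each group
F_low  <- ecdf(low)
F_high <- ecdf(high)

# Hypothetical cut scores along the outcome scale
cuts <- c(225, 245, 265)

# Proportion of each group at or below each cut score
p_low  <- F_low(cuts)
p_high <- F_high(cuts)

# Raw difference in percentage points
pct_diff <- 100 * (p_low - p_high)

# Difference in standard deviation units via the inverse normal transform
sd_diff <- qnorm(p_low) - qnorm(p_high)

print(data.frame(cut = cuts,
                 pct_diff = round(pct_diff, 1),
                 sd_diff = round(sd_diff, 2)))
```

Because both simulated groups are normal with equal standard deviations, the probit differences should all land near 10 / 15 (about 0.67), whereas the percentage-point gap is largest near the middle of the distribution (around 245) and smaller in the tails.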
ecdf_plot(formula, data, ref_cut = NULL, center = FALSE, max_line = FALSE,
          ref_hor = FALSE, ref_rect = TRUE, scheme = "ggplot2", legend = "side",
          annotate = FALSE, theme = "standard", ...)
formula: A formula of the type out ~ group, where out is the outcome variable and group is the grouping variable. Note that the grouping variable can include any number of groups.
data: The data frame containing the variables named in the formula.
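One way to picture how a formula of the form out ~ group maps onto several curves: resolve the formula against the data frame with model.frame(), split the outcome by group, and build one ECDF per group with base R's ecdf(). This is a minimal sketch with simulated data (the scores data frame and its score and grade columns are hypothetical); ecdf_plot's internal handling may differ.

```r
# Hypothetical data: an outcome and a grouping variable with three levels
set.seed(123)
scores <- data.frame(
  score = c(rnorm(40, 200, 10), rnorm(40, 210, 10), rnorm(40, 220, 10)),
  grade = rep(c(3, 4, 5), each = 40)
)

# Resolve out ~ group against the data frame
mf <- model.frame(score ~ grade, data = scores)

# One vector of outcomes per group, then one ECDF per group
by_group <- split(mf[[1]], mf[[2]])
cdfs <- lapply(by_group, ecdf)

# Proportion of each group scoring at or below 210
sapply(cdfs, function(Fn) Fn(210))
```

Since the list of ECDFs grows with the number of factor levels, nothing in this approach limits how many groups can be compared.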
ref_cut: Optional numeric vector giving the location(s) of the reference line(s) and/or rectangle(s).
center: Logical. Should the functions be centered prior to plotting? Defaults to FALSE.
max_line: Logical. Should the maximum distance between any two curves be plotted? This distance is the statistic used by the Kolmogorov-Smirnov test. Defaults to FALSE.
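Because empirical CDFs are right-continuous step functions, the maximum vertical distance between two of them is attained at one of the observed values, so it can be found by evaluating both curves on the pooled, sorted data. The sketch below (simulated data; the variable names are illustrative) computes that distance and where it occurs, then checks it against the D statistic reported by base R's ks.test().

```r
# Two simulated samples (illustrative data only)
set.seed(1)
x <- rnorm(200)
y <- rnorm(300, mean = 0.5)

Fx <- ecdf(x)
Fy <- ecdf(y)

# Step functions only change at observed values, so the largest
# vertical gap is attained at one of the pooled observations
grid <- sort(c(x, y))
gaps <- abs(Fx(grid) - Fy(grid))

d_max <- max(gaps)                 # maximum distance between the two curves
x_at_max <- grid[which.max(gaps)]  # location on the scale where it occurs

# The two-sample Kolmogorov-Smirnov D statistic should match d_max
d_ks <- unname(ks.test(x, y)$statistic)
c(d_max = d_max, d_ks = d_ks, at = x_at_max)
```

Here x_at_max is the horizontal position at which such a line would be drawn; how ecdf_plot itself locates and draws the line is not shown.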
ref_hor: Logical, defaults to FALSE. Should horizontal reference lines be plotted at the location(s) of ref_cut?
ref_rect: Logical, defaults to TRUE. Should semi-transparent rectangle(s) be plotted at the location(s) of ref_cut?
scheme: What color scheme should the lines follow? Defaults to mimicking the ggplot2 color scheme. Other options come from the viridisLite package, which must be installed first. These are the same options available in that package: "viridis", "magma", "inferno", and "plasma". These color schemes work well for readers with color blindness and print well in black and white. Alternatively, colors can be supplied manually through the col argument (passed via ...).
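The default scheme is described as mimicking ggplot2. A common way to approximate ggplot2's default discrete palette is to space hues evenly around the HCL color wheel at fixed chroma and luminance, as in the sketch below using base R's grDevices::hcl(). This is an assumption about what "mimic" means here, not the code ecdf_plot uses; hue_palette is a hypothetical helper name.

```r
# Hypothetical helper: n evenly spaced hues at fixed chroma and luminance,
# similar in spirit to ggplot2's default discrete colour scale
hue_palette <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)  # 15 and 375 are the same hue,
  grDevices::hcl(h = hues, c = 100, l = 65)[seq_len(n)]  # so drop the last
}

cols <- hue_palette(3)
cols
```

A character vector like cols is the kind of value that could be supplied manually through col. If viridisLite is installed, viridisLite::viridis(3) returns a comparable vector of three viridis colors.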
legend: The type of legend to be displayed, with possible values "base", "side", or "none". Defaults to "side" when there are more than two groups and "none" when only two groups are compared. If "side" is used, the plot is split into two plots via layout, with the legend displayed in the second plot. This scales better than the base legend (i.e., when the plot is manually resized after it is rendered), but it is not compatible with multi-panel plotting (e.g., par(mfrow = c(2, 2)) for a 2 by 2 plot). When producing multi-panel plots, use "none" or "base"; the latter draws the legend with the base legend function.
annotate: Logical. Defaults to FALSE. When TRUE and legend == "side", the plot is rendered such that additional annotations can be made on the plot using low-level base plotting functions (e.g., arrows). However, if set to TRUE, dev.off must be called before a new plot is rendered (i.e., close the current plotting window). Otherwise, the next plot will be rendered in the region designated for the legend. This argument is ignored when legend != "side".
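The annotation workflow above relies on ordinary low-level base graphics. The sketch below shows that workflow on a plain base R ECDF plot rather than on ecdf_plot output: it adds a dashed line, an arrow, and a label in data coordinates (score on the x axis, proportion on the y axis), then closes the device with dev.off(). The simulated data and the cut score of 245 are illustrative assumptions, and the plot is written to a temporary PDF so the example also runs non-interactively.

```r
# Simulated scores (illustrative data only)
set.seed(99)
x <- rnorm(200, mean = 240, sd = 15)
Fn <- ecdf(x)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)  # file device so the example runs without a screen

plot(Fn, main = "Annotated ECDF", xlab = "Score", ylab = "Proportion")

cut_score <- 245
p <- Fn(cut_score)

# Low-level additions use data coordinates: score on x, proportion on y
abline(v = cut_score, lty = 2)
arrows(x0 = cut_score + 20, y0 = p - 0.2, x1 = cut_score, y1 = p, length = 0.1)
text(cut_score + 20, p - 0.2,
     labels = sprintf("%.0f%% at or below %g", 100 * p, cut_score), pos = 4)

dev.off()  # close the device before rendering a new plot
```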
theme: Visual properties of the plot. Only two themes are currently implemented: a standard plot and a dark theme. The default, "standard", produces a plot with a standard white background. If "dark", a dark gray background is used with white text and axes.
...: Additional arguments passed to plot. It is best to use full argument names rather than relying on partial matching, given the method used to call plot. While some partial matching is supported (e.g., m for main), it is generally safest to supply the full argument name.
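Why full names are safer when arguments travel through ...: R partially matches an abbreviated argument name only against a function's named formals, not against names collected in ... and looked up later. The hypothetical wrapper get_title below forwards ... into a list and reads the title from it, so an abbreviation such as m is not recognized. This illustrates general R behavior, not ecdf_plot's actual code, which is described above as handling some abbreviations itself.

```r
# Hypothetical wrapper: collects extra plotting arguments from ...
get_title <- function(x, ...) {
  dots <- list(...)
  # `[[` with a character name requires an exact match by default
  if (is.null(dots[["main"]])) "(no title)" else dots[["main"]]
}

get_title(1, main = "Grade 3")  # "Grade 3"
get_title(1, m = "Grade 3")     # "(no title)": 'm' is not expanded to 'main'

# By contrast, an abbreviation IS matched against a named formal
title_formal <- function(x, main = "(no title)") main
title_formal(1, m = "Grade 3")  # "Grade 3"
```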
# Assumes the esvis package, which provides ecdf_plot() and the seda data
library(esvis)
# Produce a basic empirical cumulative distribution plot
ecdf_plot(mean ~ grade, seda)
# Shade distributions to the right of three cut scores
ecdf_plot(mean ~ grade,
          seda,
          ref_cut = c(225, 245, 265))
# Add horizontal reference lines
ecdf_plot(mean ~ grade,
          seda,
          ref_cut = c(225, 245, 265),
          ref_hor = TRUE)
# Apply dark theme
ecdf_plot(mean ~ grade,
          seda,
          ref_cut = c(225, 245, 265),
          theme = "dark")