ecdf_plot: Empirical Cumulative Distribution Plot

Description

This function dresses up the plot.ecdf function and adds functionality for directly comparing distributions at specific locations along the scale. Specifically, multiple empirical CDFs can be plotted with a single call, and the differences between any pair of CDFs, or all pairs, can optionally be plotted as raw percentage differences and/or in standard deviation units via inverse normal transformations. See Ho & Reardon (2012). (Note: not all features are implemented yet.)
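
The description mentions comparing CDFs at a specific point on the scale, both in raw percentages and in standard deviation units via an inverse normal transformation. Below is a minimal base R sketch of that idea at a single cut score. It uses only stats::ecdf() and qnorm() and assumes the comparison is based on each group's proportion above the cut, in the spirit of Ho & Reardon (2012). It is not ecdf_plot's own code, and the simulated groups a and b and the value of cut_score are hypothetical.

```r
# Hypothetical data: two groups whose true means differ by 0.5 SD (sd = 10)
set.seed(1)
a <- rnorm(5000, mean = 205, sd = 10)
b <- rnorm(5000, mean = 200, sd = 10)

Fa <- ecdf(a)  # empirical CDF of group a
Fb <- ecdf(b)  # empirical CDF of group b

cut_score <- 210  # hypothetical cut score

# Proportion of each group above the cut score
pa <- 1 - Fa(cut_score)
pb <- 1 - Fb(cut_score)

# Raw difference, in percentage points
pct_diff <- 100 * (pa - pb)

# Difference in SD units: inverse normal (probit) transform of each proportion.
# If a proportion is exactly 0 or 1, qnorm() returns -Inf or Inf.
sd_diff <- qnorm(pa) - qnorm(pb)

c(pct_diff = pct_diff, sd_diff = sd_diff)
```

Under normality with equal variances, the probit difference recovers the standardized mean difference (about 0.5 here) regardless of where the cut sits. The percentage-point difference, by contrast, changes with the location of the cut.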

Usage

ecdf_plot(formula, data, ref_cut = NULL, center = FALSE, max_line = FALSE,
  ref_hor = FALSE, ref_rect = TRUE, scheme = "ggplot2", legend = "side",
  annotate = FALSE, theme = "standard", ...)

Arguments

formula

A formula of the form out ~ group, where out is the outcome variable and group is the grouping variable. The grouping variable can include any number of groups.
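
Conceptually, a formula of this form splits the outcome by group and yields one empirical CDF per group. The following sketch illustrates that idea with base R's model.frame(), split(), and stats::ecdf(). It is an illustration under that assumption, not ecdf_plot's internal code, and the data frame d and its columns score and grp are hypothetical.

```r
# Hypothetical example data: an outcome (score) and a three-level grouping variable (grp)
set.seed(42)
d <- data.frame(
  score = c(rnorm(100, 200, 10), rnorm(100, 205, 10), rnorm(100, 210, 10)),
  grp   = rep(c("a", "b", "c"), each = 100)
)

# Extract the outcome (column 1) and grouping variable (column 2) named in the formula
mf <- model.frame(score ~ grp, data = d)

# Split the outcome by group, then build one ECDF (a step function) per group
ecdfs <- lapply(split(mf[[1]], mf[[2]]), ecdf)

# Each element is a function: F(x) = proportion of that group's scores <= x
sapply(ecdfs, function(F) F(205))
```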

data

The data frame containing the variables named in formula.

ref_cut

Optional numeric vector stating the location of reference line(s) and/or rectangle(s).

center

Logical. Should the functions be centered prior to plotting? Defaults to FALSE.

max_line

Logical. Should the maximum vertical distance between any two curves be plotted? This distance is the two-sample Kolmogorov-Smirnov test statistic, D. Defaults to FALSE.
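
The maximum distance between two ECDFs can be computed directly and checked against stats::ks.test(). The sketch below uses simulated data (x and y are hypothetical) and base R only. It illustrates the quantity itself rather than how ecdf_plot locates or draws it.

```r
# Hypothetical data for two groups
set.seed(7)
x <- rnorm(200, mean = 0)
y <- rnorm(150, mean = 0.3)

Fx <- ecdf(x)
Fy <- ecdf(y)

# Both ECDFs are right-continuous step functions that only jump at observed
# values, so the largest vertical gap occurs at one of the pooled data points
pts  <- sort(c(x, y))
gaps <- abs(Fx(pts) - Fy(pts))

D     <- max(gaps)               # maximum distance between the two curves
where <- pts[which.max(gaps)]    # location on the scale where it occurs

# Compare with the two-sample Kolmogorov-Smirnov statistic; the two should agree
ks_D <- unname(ks.test(x, y)$statistic)
all.equal(D, ks_D)
```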

ref_hor

Logical, defaults to FALSE. Should horizontal reference lines be plotted at the location of ref_cut?

ref_rect

Logical, defaults to TRUE. Should semi-transparent rectangle(s) be plotted at the locations of ref_cut?

scheme

The color scheme for the lines. The default, "ggplot2", mimics the default ggplot2 color scheme. The other options come from the viridisLite package, which must be installed first: "viridis", "magma", "inferno", and "plasma" (the same options available in that package). These color schemes work well for viewers with color blindness and print well in black and white. Alternatively, colors can be supplied manually via the col argument (passed through ...).

legend

The type of legend to display, with possible values "base", "side", or "none". Defaults to "side" when there are more than two groups and "none" when only two groups are compared. If "side" is used, the plot is split into two plots via layout, with the legend displayed in the second plot. This scales better than the base legend when the plot is resized after it is rendered, but it is not compatible with multi-panel plotting (e.g., par(mfrow = c(2, 2)) for a 2 by 2 plot). When producing multi-panel plots, use "none" or "base"; the latter draws the legend with the base legend function.
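
The "side" option is described as splitting the device into two regions with layout. Below is a rough base R sketch of that general pattern. The 80/20 width split, the colors, and the simulated groups are assumptions for illustration, not the values ecdf_plot uses. The sketch draws to a null PDF device so it runs without a screen.

```r
# Hypothetical data: three groups
set.seed(3)
groups <- list(a = rnorm(100, 0), b = rnorm(100, 0.5), c = rnorm(100, 1))
cols <- c("#F8766D", "#00BA38", "#619CFF")  # ggplot2-like hues (assumed)

pdf(NULL)  # null device; replace with a screen device to view the result

# Split the device into a wide plotting region and a narrow legend region;
# layout() returns the number of figure regions
n_regions <- layout(matrix(1:2, nrow = 1), widths = c(0.8, 0.2))

# Region 1: the ECDFs, overlaid with add = TRUE
plot(ecdf(groups$a), col = cols[1], main = "ECDFs", xlim = range(unlist(groups)))
plot(ecdf(groups$b), col = cols[2], add = TRUE)
plot(ecdf(groups$c), col = cols[3], add = TRUE)

# Region 2: an empty plot that holds only the legend
par(mar = c(0, 0, 0, 0))
plot.new()
legend("center", legend = names(groups), col = cols, lwd = 2, bty = "n")

invisible(dev.off())
```

R's own documentation for layout notes that it is incompatible with par(mfrow), par(mfcol), and split.screen. This is consistent with the restriction above on combining "side" legends with multi-panel plots.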

annotate

Logical, defaults to FALSE. When TRUE and legend == "side", the plot is rendered so that additional annotations can be added with low-level base plotting functions (e.g., arrows). However, when TRUE, dev.off must be called (i.e., the current plotting device closed) before a new plot is rendered; otherwise, the next plot will be drawn in the region reserved for the legend. This argument is ignored when legend != "side".

theme

Visual properties of the plot. Only two themes are currently implemented: a standard theme and a dark theme. If "standard" (the default), the plot has a standard white background. If "dark", a dark gray background is used with white text and axes.

...

Additional arguments passed to plot. Because of the way plot is called internally, it is best to supply full argument names rather than relying on partial matching. Some partial matching is supported (e.g., m for main), but supplying the full argument name is generally safest.

Examples

library(esvis) # provides ecdf_plot() and the seda data set

# Produce a basic empirical cumulative distribution plot
ecdf_plot(mean ~ grade, seda)

# Shade distributions to the right of three cut scores
ecdf_plot(mean ~ grade,
          seda,
          ref_cut = c(225, 245, 265))

# Add horizontal reference lines
ecdf_plot(mean ~ grade,
          seda,
          ref_cut = c(225, 245, 265),
          ref_hor = TRUE)

# Apply the dark theme
ecdf_plot(mean ~ grade,
          seda,
          ref_cut = c(225, 245, 265),
          theme = "dark")