An extension of standard boxplots which draws k letter statistics.
Conventional boxplots (Tukey 1977) are useful displays for conveying rough information about the central 50% of the data and the extent of the data. For moderate-sized data sets (\(n < 1000\)), detailed estimates of tail behavior beyond the quartiles may not be trustworthy, so the information provided by boxplots is appropriately somewhat vague beyond the quartiles, and the expected number of "outliers" and "far-out" values for a Gaussian sample of size \(n\) is often less than 10 (Hoaglin, Iglewicz, and Tukey 1986).
Large data sets (\(n \approx 10{,}000\) to \(100{,}000\)) afford more precise estimates of quantiles in the tails beyond the quartiles and can also be expected to present a large number of "outliers" (about \(0.4 + 0.007 n\)).
The letter-value boxplot addresses both of these shortcomings: it conveys more detailed information in the tails using letter values, but only out to the depths where the letter values are reliable estimates of their corresponding quantiles (corresponding to tail areas of roughly \(2^{-i}\)); "outliers" are defined as a function of the most extreme letter value shown. All aspects shown on the letter-value boxplot are actual observations, thus remaining faithful to the principles that governed Tukey's original boxplot.
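The depth and tail-area idea above can be made concrete with a small base R sketch. The helper names `letter_values_sketch()` and `expected_boxplot_outliers()` are hypothetical and are not part of this package; the depth recursion follows Tukey's usual convention, and averaging the two neighbouring order statistics at half-integer depths is an assumption of this sketch (a display restricted to actual observations would pick one of the two instead).

```r
# Hypothetical helper for illustration; not a function exported by the package.
letter_values_sketch <- function(x, k = 4) {
  x <- sort(x)
  n <- length(x)
  # Depth of the median first; each further depth roughly halves the previous one.
  depth <- numeric(k)
  depth[1] <- (n + 1) / 2
  for (i in seq_len(k)[-1]) {
    depth[i] <- (floor(depth[i - 1]) + 1) / 2
  }
  lo <- floor(depth)
  hi <- ceiling(depth)
  data.frame(
    i = seq_len(k),
    tail_area = 2^(-seq_len(k)),  # nominal tail area of letter value i
    depth = depth,
    # Assumption: average the two neighbours when the depth is a half-integer.
    lower = (x[lo] + x[hi]) / 2,
    upper = (x[n + 1 - lo] + x[n + 1 - hi]) / 2
  )
}

# Approximate number of points a conventional boxplot flags for a Gaussian
# sample of size n, using the 0.4 + 0.007 n rule quoted in the description.
expected_boxplot_outliers <- function(n) 0.4 + 0.007 * n

letter_values_sketch(1:15)
expected_boxplot_outliers(c(1000, 100000))
```

For a sample of 1000 the rule gives about 7 flagged points, while for 100,000 it gives about 700, which is why the conventional display becomes cluttered on large data.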
geom_lv(
mapping = NULL,
data = NULL,
stat = "lv",
position = "dodge",
outlier.colour = "black",
outlier.shape = 19,
outlier.size = 1.5,
outlier.stroke = 0.5,
na.rm = TRUE,
varwidth = FALSE,
width.method = "linear",
show.legend = NA,
inherit.aes = TRUE,
...
)
GeomLv
scale_fill_lv(...)
stat_lv(
mapping = NULL,
data = NULL,
geom = "lv",
position = "dodge",
na.rm = TRUE,
conf = 0.95,
percent = NULL,
k = NULL,
show.legend = NA,
inherit.aes = TRUE,
...
)
StatLv
An object of class GeomLv (inherits from Geom, ggproto, gg) of length 6.
An object of class StatLv (inherits from Stat, ggproto, gg) of length 5.
Set of aesthetic mappings created by aes(). If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot. You must supply mapping if there is no plot mapping.
The data to be displayed in this layer. There are three options:
If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().
A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.
A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data. A function can be created from a formula (e.g. ~ head(.x, 10)).
A position adjustment to use on the data for this layer. This can be used in various ways, including to prevent overplotting and to improve the display. The position argument accepts the following:
The result of calling a position function, such as position_jitter(). This method allows for passing extra arguments to the position.
A string naming the position adjustment. To give the position as a string, strip the function name of the position_ prefix. For example, to use position_jitter(), give the position as "jitter".
For more information and other ways to specify the position, see the layer position documentation.
Override aesthetics used for the outliers (outlier.colour, outlier.shape, outlier.size and outlier.stroke). Defaults come from geom_point().
If FALSE, removes missing values with a warning. If TRUE (the default shown in the usage for geom_lv() and stat_lv()), silently removes missing values.
If FALSE (the default), draw boxes that are the same size for each group. If TRUE, boxes are drawn with widths proportional to the square roots of the number of observations in the groups (possibly weighted, using the weight aesthetic).
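As a rough illustration of the square-root rule, the following base R sketch computes relative box widths from group sizes. The group counts and the `max_width` value are hypothetical, and scaling so that the largest group gets the full width is an assumption of this sketch rather than a statement about the package internals.

```r
# Hypothetical group sizes, for illustration only.
n_obs <- c(compact = 47, midsize = 41, suv = 62, `2seater` = 5)

# Assumed full width given to the largest group.
max_width <- 0.9

# Widths proportional to the square root of the group size,
# rescaled so the largest group receives max_width.
rel_width <- sqrt(n_obs) / sqrt(max(n_obs))
box_width <- max_width * rel_width

round(box_width, 2)
```

Using the square root rather than the raw count keeps very small groups visible while still signalling that their summaries rest on fewer observations.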
Character, one of 'linear' (default), 'area', or 'height'. This parameter determines whether the width of the box for letter value LV(i) should be proportional to i (linear), proportional to \(2^{-i}\) (height), or whether the area of the box should be proportional to \(2^{-i}\) (area).
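The difference between the 'height' and 'area' options can be sketched in base R. The helper `lv_relative_width()` is hypothetical and only follows the description above: for 'height' the width itself is proportional to \(2^{-i}\), while for 'area' the width is chosen so that width times box height is proportional to \(2^{-i}\). The 'linear' option is left out because the description does not state how it is normalized, and the box heights used here are made-up values.

```r
# Hypothetical helper for illustration; not the package's implementation.
lv_relative_width <- function(i, box_height, method = c("height", "area")) {
  method <- match.arg(method)
  w <- switch(method,
    height = 2^(-i),              # width itself proportional to 2^-i
    area   = 2^(-i) / box_height  # width * box_height proportional to 2^-i
  )
  w / max(w)  # rescale so the widest box has relative width 1
}

i <- 1:4
box_height <- c(2, 3, 4, 6)  # made-up vertical extents of the boxes

lv_relative_width(i, box_height, "height")
lv_relative_width(i, box_height, "area")
```

With 'area', tall boxes in the tails become narrower than under 'height', so the ink devoted to each box tracks the share of data it represents.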
Logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes. It can also be a named logical vector to finely select the aesthetics to display. To include legend keys for all levels, even when no data exists, use TRUE. If NA, all levels are shown in legend, but unobserved levels are omitted.
If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification.
Other arguments passed on to layer()'s params argument. These arguments broadly fall into one of 4 categories below. Notably, further arguments to the position argument, or aesthetics that are required, cannot be passed through .... Unknown arguments that are not part of the 4 categories below are ignored.
Static aesthetics that are not mapped to a scale, but are at a fixed value and apply to the layer as a whole. For example, colour = "red" or linewidth = 3. The geom's documentation has an Aesthetics section that lists the available options. The 'required' aesthetics cannot be passed on to the params. Please note that while passing unmapped aesthetics as vectors is technically possible, the order and required length is not guaranteed to be parallel to the input data.
When constructing a layer using a stat_*() function, the ... argument can be used to pass on parameters to the geom part of the layer. An example of this is stat_density(geom = "area", outline.type = "both"). The geom's documentation lists which parameters it can accept.
Inversely, when constructing a layer using a geom_*() function, the ... argument can be used to pass on parameters to the stat part of the layer. An example of this is geom_area(stat = "density", adjust = 0.5). The stat's documentation lists which parameters it can accept.
The key_glyph argument of layer() may also be passed on through .... This can be one of the functions described as key glyphs, to change the display of the layer in the legend.
Use to override the default connection between geom_lv and stat_lv.
Confidence level (conf).
Numeric value: percent of data in outliers (percent).
Number of letter values shown (k).
Number of Letter Values used for the display
Name of the Letter Value
width of the interquartile box
McGill, R., Tukey, J. W. and Larsen, W. A. (1978) Variations of box plots. The American Statistician 32, 12-16.
stat_quantile to view quantiles conditioned on a continuous variable.
library(ggplot2)
# Assumption: geom_lv(), scale_fill_lv() and the ontime data come from lvplot
library(lvplot)

p <- ggplot(mpg, aes(class, hwy))
p + geom_lv(aes(fill = after_stat(LV))) + scale_fill_brewer()
p + geom_lv() + geom_jitter(width = 0.2)
p + geom_lv(aes(fill = after_stat(LV))) + scale_fill_lv()

# Fixed fill and outline colours
p + geom_lv(fill = "grey80", colour = "black")

# Outliers
p + geom_lv(outlier.colour = "red", outlier.shape = 1)

# Plots are automatically dodged when any aesthetic is a factor
p + geom_lv(aes(fill = drv))

# varwidth adjusts the width of the boxes according to the number of observations
p + geom_lv(varwidth = TRUE, aes(fill = after_stat(LV))) + scale_fill_lv()

ggplot(ontime, aes(UniqueCarrier, TaxiIn + TaxiOut)) +
  geom_lv(aes(fill = after_stat(LV)), varwidth = TRUE) +
  scale_fill_lv() +
  scale_y_sqrt() +
  theme_bw()

# Day of the week (0 = Sunday) derived from the flight date
ontime$DayOfWeek <- as.POSIXlt(ontime$FlightDate)$wday
ggplot(ontime, aes(factor(DayOfWeek), TaxiIn + TaxiOut)) +
  geom_lv(aes(fill = after_stat(LV))) +
  scale_fill_lv() +
  scale_y_sqrt() +
  theme_bw()