gghighlight
Highlight lines and points in ggplot2.
Installation
install.packages("gghighlight")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("yutannihilation/gghighlight")
Example
Suppose the data has a lot of series.
library(dplyr, warn.conflicts = FALSE)
set.seed(1)
d <- tibble(
  idx = 1:10000,
  value = runif(idx, -1, 1),
  type = sample(letters, size = length(idx), replace = TRUE)
) %>%
  group_by(type) %>%
  mutate(value = cumsum(value)) %>%
  ungroup()
It is difficult to distinguish them by colour.
library(ggplot2)
ggplot(d) +
  geom_line(aes(idx, value, colour = type))
So we are motivated to highlight only important series, like this:
library(gghighlight)
gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20)
As gghighlight_*() returns a ggplot object, it can be customized in the usual ggplot2 way. (Note that while gghighlight itself doesn't require ggplot2 to be loaded, ggplot2 needs to be loaded to customize the plot.)
gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20) +
  theme_minimal()
The plot can also be facetted:
gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20) +
  facet_wrap(~ type)
Supported geoms
Line
library(gghighlight)
gghighlight_line(d, aes(idx, value, colour = type), max(value) > 20)
Point
set.seed(10)
d2 <- sample_n(d, 20)
gghighlight_point(d2, aes(idx, value), value > 0)
#> Warning in gghighlight_point(d2, aes(idx, value), value > 0): Using type as
#> label for now, but please provide the label_key explicity!
Grouped vs ungrouped
You may notice that gghighlight_line() and gghighlight_point() have different semantics.
By default, gghighlight_line() evaluates the predicate per group; more precisely, it uses dplyr::group_by() + dplyr::summarise(). So if the predicate expression returns more than one value per group, it fails with an error like this:
gghighlight_line(d, aes(idx, value, colour = type), value > 20)
#> Error in summarise_impl(.data, dots): Column `predicate..........` must be length 1 (a summary value), not 387
On the other hand, gghighlight_point() evaluates the predicate per row by default. This behaviour can be controlled via the use_group_by argument, like this:
gghighlight_point(d2, aes(idx, value, colour = type), max(value) > 0, use_group_by = TRUE)
#> Warning in gghighlight_point(d2, aes(idx, value, colour = type), max(value)
#> > : Using type as label for now, but please provide the label_key
#> explicity!
While gghighlight_line() also has a use_group_by argument, I don't think ungrouped lines are interesting, because data that can be represented as a line must have series, or groups.
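The difference between the two semantics can be approximated with plain dplyr. This is only a sketch of the idea, not gghighlight's actual internals; the toy data and the names toy, per_group, per_row and highlighted_types are made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: two series of three points each.
toy <- tibble(
  type  = c("a", "a", "a", "b", "b", "b"),
  value = c(1, 25, 3, 2, 4, 6)
)

# Grouped semantics: the predicate yields one value per group (series).
per_group <- toy %>%
  group_by(type) %>%
  summarise(predicate = max(value) > 20)

# Ungrouped semantics: the predicate yields one value per row (point).
per_row <- toy %>%
  mutate(predicate = value > 20)

# Under the grouped semantics, these series would be the highlighted ones.
highlighted_types <- per_group$type[per_group$predicate]
```

Note that value > 20 is fine in the per-row case but would not reduce to a single value inside summarise(), which is why the grouped form needs an aggregate such as max().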
Non-logical predicate
To construct a predicate expression like the one below, we need to choose a threshold (in this example, 20). But it is difficult to choose a good one before drawing the plot.
max(value) > 20
So, gghighlight_*() also allows predicates that return numeric (or character) results. The values are used for sorting the data, and the top max_highlight rows/groups are highlighted:
gghighlight_line(d, aes(idx, value, colour = type), max(value), max_highlight = 5L)
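The selection described above can be sketched with plain dplyr. This is an illustration under assumptions, not the package's implementation: the toy data, the names score and top_types, and the choice of descending order with no special tie handling are all hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: three series of two points each.
toy <- tibble(
  type  = rep(c("a", "b", "c"), each = 2),
  value = c(1, 5, 2, 9, 3, 4)
)
max_highlight <- 2L

# Compute one numeric score per group, sort descending (assumed),
# and keep the first max_highlight groups.
top_types <- toy %>%
  group_by(type) %>%
  summarise(score = max(value)) %>%
  arrange(desc(score)) %>%
  head(max_highlight) %>%
  pull(type)
```

With a numeric predicate there is no threshold to guess; only the number of series to highlight has to be chosen.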