ggpmisc (version 0.3.2)

stat_apply_group: Apply a function to x or y values

Description

stat_apply_group and stat_apply_panel apply functions to data. In most cases one should simply use transformations through scales or summary functions through stat_summary(). Some computations, however, are neither scale transformations nor usual summaries, because the number of data values does not decrease. Although it is always possible to precompute quantities like cumulative sums, running medians and normalizations, it can be convenient to apply such functions on-the-fly to ensure that grouping is consistent between computations and aesthetics. One particularity of these statistics is that they can simultaneously apply different functions to x values and to y values when needed. In contrast, geom_smooth applies a function that takes both x and y values as arguments.
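
For comparison, the precomputation route mentioned above can be sketched in base R. The data frame df and its columns are hypothetical and chosen only for illustration; ave() applies cumsum within each level of category, which is the step that would otherwise happen inside the plot layer.

```r
# Hypothetical data: two groups of five observations each
df <- data.frame(X = rep(1:5, 2),
                 Y = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50),
                 category = rep(c("A", "B"), each = 5))

# Precompute the cumulative sum of Y within each category
df$Y.cumsum <- ave(df$Y, df$category, FUN = cumsum)

df$Y.cumsum
```

With this approach the grouping variable has to be named twice, once in ave() and once in the aesthetics, and the two must be kept in agreement by hand. Applying the function in the layer avoids that duplication.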

Usage

stat_apply_group(mapping = NULL, data = NULL, geom = "line",
  .fun.x = NULL, .fun.x.args = list(), .fun.y = NULL,
  .fun.y.args = list(), position = "identity", na.rm = FALSE,
  show.legend = FALSE, inherit.aes = TRUE, ...)

stat_apply_panel(mapping = NULL, data = NULL, geom = "line",
  .fun.x = NULL, .fun.x.args = list(), .fun.y = NULL,
  .fun.y.args = list(), position = "identity", na.rm = FALSE,
  show.legend = FALSE, inherit.aes = TRUE, ...)

Arguments

mapping

The aesthetic mapping, usually constructed with aes. Only needs to be set at the layer level if you are overriding the plot defaults.

data

A layer specific dataset - only needed if you want to override the plot defaults.

geom

The geometric object to use to display the data.

.fun.x, .fun.y

function to be applied, or the name of the function to be applied as a character string. At least one of these parameters should be passed a non-null argument; both can be supplied when x and y are each to be transformed.

.fun.x.args, .fun.y.args

additional arguments to be passed to the function as a named list.
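
As an illustration of the calling convention only, and not of the package's internal code, the combination of a data vector with a named list of extra arguments can be sketched with do.call(). The vector y and the list extra.args are hypothetical names.

```r
y <- c(3, 1, 4, 1, 5, 9, 2, 6)
extra.args <- list(k = 3)   # the kind of list one would pass as .fun.y.args

# The data vector goes first and unnamed; the extra arguments follow by name
smoothed <- do.call(runmed, c(list(y), extra.args))

# do.call() also accepts the function name as a character string
smoothed.by.name <- do.call("runmed", c(list(y), extra.args))

# Both forms match the direct call runmed(y, k = 3)
all.equal(as.numeric(smoothed), as.numeric(runmed(y, k = 3)))
all.equal(as.numeric(smoothed), as.numeric(smoothed.by.name))
```

Because the data vector is supplied positionally, the names in the list should not collide with the first formal argument of the function.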

position

The position adjustment to use for overlapping points on this layer

na.rm

a logical value indicating whether NA values should be stripped before the computation proceeds.

show.legend

logical. Should this layer be included in the legends? NA includes if any aesthetics are mapped. FALSE, the default for these statistics, never includes, and TRUE always includes.

inherit.aes

If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.

...

other arguments passed on to layer. This can include aesthetics whose values you want to set, not map. See layer for more details.

Computed variables

Either x, y, or both are replaced by the vector returned by the corresponding applied function.

x

x-value as returned by .fun.x

y

y-value as returned by .fun.y

Details

The function or functions to be applied are expected to be vectorized and to return a vector of (almost) the same length as the input. The vector mapped to the x or y aesthetic is passed as the first positional argument to the call. The function must accept as its first argument a vector or list that matches the data.
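
A quick way to judge whether a candidate function meets these expectations is to call it on a plain vector and compare lengths. The sketch below uses base R only; the vector y and the wrapper padded_diff are hypothetical.

```r
y <- c(2, 5, 3, 8)

length(cumsum(y)) == length(y)   # TRUE: one value per observation
length(diff(y)) == length(y)     # FALSE: one value fewer than the input
length(mean(y)) == length(y)     # FALSE: a summary, better suited to stat_summary()

# A wrapper that pads diff() so that it returns one value per observation
padded_diff <- function(x) c(NA, diff(x))
padded_diff(y)
```

A wrapper of this kind makes the alignment between the returned values and the original observations explicit, at the cost of introducing an NA that then has to be handled, for example with na.rm = TRUE.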

References

Answers question "R ggplot on-the-fly calculation by grouping variable" at https://stackoverflow.com/questions/51412522.

Examples

# NOT RUN {
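library(ggpmisc)    # provides stat_apply_group() and stat_apply_panel(); attaches ggplot2
library(gginnards)  # provides geom_debug(), used below to inspect computed values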
set.seed(123456)
my.df <- data.frame(X = rep(1:20,2),
                    Y = runif(40),
                    category = rep(c("A","B"), each = 20))

# make sure rows are ordered by X, as we will use functions that rely on this
my.df <- my.df[order(my.df[["X"]]), ]

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.y = cumsum)

# Use of geom_debug() to inspect the computed values
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.y = cumsum, geom = "debug")

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.y = cummax)

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.x = cumsum, .fun.y = cumsum)

# diff returns a vector shorter by 1 for each group
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.y = diff, na.rm = TRUE)

ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  geom_point() +
  stat_apply_group(.fun.y = runmed, .fun.y.args = list(k = 5))

# Rescaling per group
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.y = function(x) {(x - min(x)) / (max(x) - min(x))})

# Joint rescaling for whole panel
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
  stat_apply_panel(.fun.y = function(x) {(x - min(x)) / (max(x) - min(x))})

# }
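
The difference between the last two examples, rescaling per group and rescaling for the whole panel, can also be seen without plotting. The sketch below uses hypothetical values and base R only: ave() stands in for the per-group computation, and a direct call on the whole vector stands in for the per-panel one, assuming a single panel.

```r
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical values: group B is ten times larger than group A
y <- c(1, 2, 3, 10, 20, 30)
g <- rep(c("A", "B"), each = 3)

per.group <- ave(y, g, FUN = rescale01)  # each group spans 0 to 1
per.panel <- rescale01(y)                # only the pooled data span 0 to 1

tapply(per.group, g, range)
tapply(per.panel, g, range)
```

Per-group rescaling removes differences in level between groups, while joint rescaling preserves them.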
