
ggalluvial (version 0.6.0)

alluvial-data: Check for alluvial structure and convert between alluvial formats

Description

Alluvial diagrams consist of multiple horizontally distributed columns (axes) representing factor variables, vertical divisions (strata) of these axes representing these variables' values, and splines (alluvial flows) connecting vertical subdivisions (lodes) within strata of adjacent axes, representing subsets or amounts of observations that take the corresponding values of the corresponding variables. is_alluvial and its variants check a data frame for either of two types of alluvial structure:

  • One row per lode, wherein each row encodes a subset or amount of observations having a specific profile of axis values, a key field encodes the axis, a value field encodes the value within each axis, and an id field identifies the multiple lodes corresponding to the same subset or amount of observations.

  • One row per alluvium, wherein each row encodes a subset or amount of observations having a specific profile of axis values, and a set of axes fields encodes its values at each axis variable.
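To make the two formats concrete, the sketch below writes the same small, made-up data set both ways in base R, together with a simplified structural check that every alluvium has exactly one lode per axis in lodes format. The data and the helper name `has_one_lode_per_axis` are illustrative assumptions; ggalluvial's own checks may test more (or different) conditions.

```r
# Made-up toy data in alluvia format: one row per alluvium, one column per
# axis, plus a weight column.
alluvia <- data.frame(
  Before = c("A", "A", "B"),
  After  = c("A", "B", "B"),
  Freq   = c(10, 5, 7),
  stringsAsFactors = FALSE
)

# The same information in lodes format: one row per lode, with a key field
# (the axis), a value field (the stratum) and an id field (the alluvium).
lodes <- data.frame(
  alluvium = rep(1:3, times = 2),
  x        = rep(c("Before", "After"), each = 3),
  stratum  = c(alluvia$Before, alluvia$After),
  Freq     = rep(alluvia$Freq, times = 2),
  stringsAsFactors = FALSE
)

# Hypothetical, simplified check (not ggalluvial's own logic): in lodes
# format, each (id, key) pair should occur exactly once.
has_one_lode_per_axis <- function(data, key, id) {
  counts <- table(data[[id]], data[[key]])
  all(counts == 1)
}

has_one_lode_per_axis(lodes, key = "x", id = "alluvium")  # TRUE
```

Dropping any single row of `lodes` leaves one alluvium without a lode at one axis, so the check then returns `FALSE`.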

If none of key, value, id, or axes is specified, then is_alluvial defaults to is_alluvial_alluvia and treats all fields in data (other than weight, if given) as axes.
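On the Titanic data used in the Examples, this default means the four factor columns become axes and `Freq` is treated as the weight. The helper `default_axes` is a hypothetical name for this sketch of the documented rule, not a ggalluvial function.

```r
# Sketch of the default: every field except the (optional) weight is an axis.
default_axes <- function(data, weight = NULL) {
  setdiff(names(data), weight)
}

titanic_alluvia <- as.data.frame(Titanic)  # Titanic ships with base R
default_axes(titanic_alluvia, weight = "Freq")
# "Class" "Sex" "Age" "Survived"
```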

Usage

is_alluvial(data, ..., logical = TRUE, silent = FALSE)

is_alluvial_lodes(data, key, value, id, weight = NULL, logical = TRUE, silent = FALSE)

is_alluvial_alluvia(data, axes = NULL, weight = NULL, logical = TRUE, silent = FALSE)

to_lodes(data, key = "x", value = "stratum", id = "alluvium", axes, diffuse = FALSE, discern = FALSE)

to_alluvia(data, key, value, id, distill = FALSE)

Arguments

data

A data frame.

...

Additional parameters used to determine the method and passed to it: either all or none of key, value, and id, or else (optionally) axes; in either case, weight may optionally be given.

logical

Whether to return a logical value or a character string indicating the type of alluvial structure ("none", "lodes", or "alluvia").

silent

Whether to suppress warning messages.

key, value, id

Numeric or character; the fields of data corresponding to the axis (key), stratum (value), and alluvium (identifying) variables.

weight

Optional numeric or character; the fields of data corresponding to alluvium or lode weights (heights when plotted).

axes

Numeric or character vector; the field(s) of data corresponding to the axis variable(s).

diffuse

A numeric or character vector indicating which variables among those passed to axes to merge into the reshaped data by id. Alternatively, a logical value indicating whether to merge all (TRUE) or none (FALSE) of these variables.

discern

Logical value indicating whether to add suffixes to values of the variables passed to axes that appear at more than one axis, in order to distinguish their factor levels. This forces the levels of the combined factor variable value to be in the order of the axes.

distill

A logical value indicating whether to include variables, other than those passed to key and value, that vary within values of id. Alternatively, a function (or its name) to be used to distill each such variable to a single value. In addition to existing functions, distill accepts the character values "first" (used if distill is TRUE), "last", and "most" (which returns the modal value).
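The discern and distill options can be illustrated with two small base-R helpers. `discern_values` and `distill_value` are hypothetical names, and the `.1`/`.2` suffix format is an assumption of this sketch; ggalluvial's actual suffixes and internals may differ.

```r
# Hypothetical sketch of 'discern': values occurring at more than one axis get
# the axis index appended, and factor levels follow the order of the axes.
discern_values <- function(values, axes) {
  values <- as.character(values)           # avoid ifelse() returning factor codes
  axis_index <- match(axes, unique(axes))  # axes numbered by first appearance
  n_axes <- tapply(axis_index, values, function(i) length(unique(i)))
  shared <- names(n_axes)[n_axes > 1]      # values seen at two or more axes
  labels <- ifelse(values %in% shared,
                   paste(values, axis_index, sep = "."),
                   values)
  # order() is stable, so labels keep their original order within an axis
  factor(labels, levels = unique(labels[order(axis_index)]))
}

# Hypothetical sketch of 'distill': reduce a variable that varies within one
# id value to a single value.
distill_value <- function(x, how = c("first", "last", "most")) {
  how <- match.arg(how)
  switch(how,
         first = x[1],
         last  = x[length(x)],
         most  = {
           ux <- unique(x)
           # modal value; ties go to the value that appears first
           ux[which.max(tabulate(match(x, ux)))]
         })
}

discern_values(c("A", "B", "A", "C"), axes = c("x1", "x1", "x2", "x2"))
# levels: A.1 B A.2 C

grades <- data.frame(student = c(1, 1, 1, 2, 2),
                     grade   = c("A", "B", "A", "C", "D"),
                     stringsAsFactors = FALSE)
vapply(split(grades$grade, grades$student), distill_value, character(1),
       how = "most")
# named vector: "1" = "A", "2" = "C"
```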

Details

to_lodes takes a data frame with several designated variables to be used as axes in an alluvial diagram, and reshapes the data frame so that the axis variable names constitute a new factor variable and their values comprise another. Other variables' values will be repeated, and a row-grouping variable can be introduced. This function invokes gather_.
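The alluvia-to-lodes reshaping can be sketched in base R without tidyr. `to_lodes_sketch` is a hypothetical helper; unlike ggalluvial's to_lodes, it stores strata as character rather than factor, supports neither diffuse nor discern, and its column order may differ.

```r
# Hypothetical base-R sketch of the alluvia-to-lodes reshaping.
to_lodes_sketch <- function(data, axes, key = "x", value = "stratum",
                            id = "alluvium") {
  if (is.numeric(axes)) axes <- names(data)[axes]  # allow column positions
  others <- setdiff(names(data), axes)
  n <- nrow(data)
  k <- length(axes)
  lodes <- data.frame(
    id_col    = rep(seq_len(n), times = k),                  # alluvium of each lode
    key_col   = factor(rep(axes, each = n), levels = axes),  # axis, in axis order
    value_col = unlist(lapply(data[axes], as.character), use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(lodes) <- c(id, key, value)
  # repeat the non-axis variables (e.g. the weight) once per axis
  repeated <- data[rep(seq_len(n), times = k), others, drop = FALSE]
  out <- cbind(lodes, repeated)
  rownames(out) <- NULL
  out
}

titanic_alluvia <- as.data.frame(Titanic)
titanic_lodes <- to_lodes_sketch(titanic_alluvia, axes = 1:4)
head(titanic_lodes)
```

Each of the 32 alluvia contributes one lode per axis, so the result has 128 rows.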

to_alluvia takes a data frame with axis and axis value variables to be used in an alluvial diagram, and reshapes the data frame so that the axes constitute separate variables whose values are given by the value variable. This function invokes spread_.
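The reverse reshaping can likewise be sketched with `stats::reshape()` in place of tidyr. `to_alluvia_sketch` is a hypothetical helper that assumes each (id, key) pair occurs at most once and does not implement distill.

```r
# Hypothetical base-R sketch of the lodes-to-alluvia reshaping.
to_alluvia_sketch <- function(data, key, value, id) {
  axes <- as.character(unique(data[[key]]))  # axis names, in order of appearance
  wide <- stats::reshape(
    data[c(id, key, value)],
    idvar = id, timevar = key, v.names = value,
    direction = "wide"
  )
  names(wide) <- c(id, axes)  # name each axis column by its key value
  rownames(wide) <- NULL
  wide
}

lodes <- data.frame(
  alluvium = rep(1:3, times = 2),
  x        = rep(c("Before", "After"), each = 3),
  stratum  = c("A", "A", "B", "A", "B", "B"),
  stringsAsFactors = FALSE
)
to_alluvia_sketch(lodes, key = "x", value = "stratum", id = "alluvium")
```

The result has one row per alluvium and one column per axis (`Before`, `After`), mirroring the CURR1, CURR3, ... columns that to_alluvia produces from the majors data.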

Examples

library(ggplot2)
library(ggalluvial)
# Titanic data in alluvia format
titanic_alluvia <- as.data.frame(Titanic)
head(titanic_alluvia)
is_alluvial(titanic_alluvia,
            weight = "Freq")
# Titanic data in lodes format
titanic_lodes <- to_lodes(titanic_alluvia,
                          key = "x", value = "stratum", id = "alluvium",
                          axes = 1:4)
head(titanic_lodes)
is_alluvial(titanic_lodes,
            key = "x", value = "stratum", id = "alluvium",
            weight = "Freq")
# again in lodes format, this time diffusing the 'Class' variable
titanic_lodes2 <- to_lodes(titanic_alluvia,
                           key = "variable", value = "value", id = "passenger",
                           axes = 1:3, diffuse = "Class")
head(titanic_lodes2)
is_alluvial(titanic_lodes2,
            key = "variable", value = "value", id = "passenger",
            weight = "Freq")

# curriculum data in lodes format
data(majors)
head(majors)
is_alluvial(majors,
            key = "semester", value = "curriculum", id = "student",
            logical = FALSE)
# curriculum data in alluvia format
majors_alluvia <- to_alluvia(
  majors,
  key = "semester", value = "curriculum", id = "student"
)
head(majors_alluvia)
is_alluvial(majors_alluvia,
            axes = 2:9,
            logical = FALSE)

# distill variables that vary within 'id' values
set.seed(1)
majors$hypo_grade <- LETTERS[sample(5, size = nrow(majors), replace = TRUE)]
majors_alluvia2 <- to_alluvia(
  majors,
  key = "semester", value = "curriculum", id = "student",
  distill = "most"
)
head(majors_alluvia2)

# options to distinguish strata at different axes
gg <- ggplot(majors_alluvia,
             aes(axis1 = CURR1, axis2 = CURR7, axis3 = CURR13))
gg +
  geom_alluvium(aes(fill = as.factor(student)), discern = TRUE) +
  geom_stratum(discern = TRUE) +
  geom_text(stat = "stratum", discern = TRUE, label.strata = TRUE)
gg +
  geom_alluvium(aes(fill = as.factor(student)), discern = FALSE) +
  geom_stratum(discern = FALSE) +
  geom_text(stat = "stratum", discern = FALSE, label.strata = TRUE)
# a warning is issued when 'discern' is inappropriate, here because the data are in lodes format
ggplot(majors[majors$semester %in% paste0("CURR", c(1, 7, 13)), ],
       aes(x = semester, stratum = curriculum, alluvium = student,
           label = curriculum)) +
  geom_alluvium(aes(fill = as.factor(student)), discern = TRUE) +
  geom_stratum(discern = TRUE) +
  geom_text(stat = "stratum", discern = TRUE)
