compare_levels: Compare the value of some variable extracted from a Bayesian posterior sample for different levels of a factor

Description

Given a posterior sample from a Bayesian sampler in long format (e.g. as returned by spread_samples), compare the value of a variable in that sample across different paired combinations of levels of a factor.

Usage

compare_levels(samples, variable, by, fun = `-`, comparison = default,
  indices = c(".chain", ".iteration"))

Arguments

samples

Long-format data.frame of samples such as returned by spread_samples or gather_samples.

variable

Bare (unquoted) name of a column in samples representing the variable to compare across levels.

by

Bare (unquoted) name of a column in samples that is a factor or ordered. The value of variable will be compared across pairs of levels of this factor.

fun

Binary function to use for comparison. For each pair of levels of by we are comparing (as determined by comparison), compute the result of this function.

comparison

One of (a) the comparison types ordered, control, pairwise, or default (may also be given as strings, e.g. "ordered"), see `Details`; (b) a user-specified function that takes a factor and returns a list of pairs of names of levels to compare (as strings) and/or unevaluated expressions representing the comparisons to make; or (c) a list of pairs of names of levels to compare (as strings) and/or unevaluated expressions representing the comparisons to make, e.g.: list(c("a", "b"), c("b", "c")) or .(a - b, b - c), both of which would compare level "a" against "b" and level "b" against "c". Note that the unevaluated expression syntax ignores the fun argument, can include any other functions desired (e.g. variable transformations), and can even include more than two levels or other columns in samples.

indices

Character vector of column names in samples that should be treated as indices when making the comparison (i.e. values of variable within each level of by will be compared at each unique combination of levels of indices). Columns in indices not found in samples are ignored. The default is c(".chain", ".iteration"), the same names used for the chain/iteration index columns returned by spread_samples or gather_samples; thus if you are using compare_levels with spread_samples or gather_samples you generally should not need to change this value.

Value

A data.frame with the same columns as samples, except that the by column contains a symbolic representation of the comparison of pairs of levels of by in samples, and variable contains the result of that comparison.

Details

This function simplifies conducting comparisons across levels of some variable returned from a Bayesian sample. It applies fun to all samples of variable for each pair of levels of by as selected by comparison. By default, all pairwise comparisons are generated if by is an unordered factor and ordered comparisons are made if by is ordered.
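To make the pairing concrete, here is a minimal base-R sketch of the idea for a single pair of levels. It is not the package's implementation: the toy data frame and the column names group and mu are made up for illustration, and merge() stands in for whatever alignment compare_levels performs internally.

```r
# Toy long-format samples (hypothetical data): one chain, two iterations,
# and two levels ("a" and "b") of the factor `group`.
samples <- data.frame(
  .chain = 1,
  .iteration = c(1, 2, 1, 2),
  group = factor(c("a", "a", "b", "b")),
  mu = c(1.0, 1.5, 3.0, 2.5)
)

# Split out each level, then line the two up on the index columns so that
# values from the same chain and iteration are compared with each other.
indices <- c(".chain", ".iteration")
a <- samples[samples$group == "a", c(indices, "mu")]
b <- samples[samples$group == "b", c(indices, "mu")]
paired <- merge(b, a, by = indices, suffixes = c(".b", ".a"))

# Apply the binary comparison function (here `-`) draw by draw.
fun <- `-`
result <- data.frame(
  paired[indices],
  group = "b - a",
  mu = fun(paired$mu.b, paired$mu.a)
)
result
```

The result has one row per combination of the index columns, with group holding a label for the comparison and mu holding the compared values, which mirrors the shape described under Value.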

The included comparison types are:

ordered: compare each level i with level i - 1, i.e. fun(i, i - 1).
pairwise: compare each level of by with every other level.
control: compare each level of by with the first level of by. If you wish to compare with a different level, you can first apply relevel to by to set the control (reference) level.
default: use ordered if is.ordered(by) and pairwise otherwise.
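
The comparison types above can be sketched in base R as a function that lists which pairs of levels each type selects. The helper level_pairs is hypothetical and not part of the package; in particular, the order of levels within each pairwise pair is an assumption of this sketch, since the description above does not specify it.

```r
# Hypothetical helper (not part of the package): list the pairs of levels
# each comparison type would select, written as c(level, compared_against).
level_pairs <- function(by, type = c("default", "ordered", "pairwise", "control")) {
  type <- match.arg(type)
  lv <- levels(by)
  if (type == "default") {
    # default: ordered for ordered factors, pairwise otherwise
    type <- if (is.ordered(by)) "ordered" else "pairwise"
  }
  switch(type,
    # level i against level i - 1
    ordered = lapply(seq_along(lv)[-1], function(i) c(lv[i], lv[i - 1])),
    # every pair of distinct levels; later level first (assumed order)
    pairwise = combn(lv, 2, FUN = rev, simplify = FALSE),
    # every other level against the first level
    control = lapply(lv[-1], function(l) c(l, lv[1]))
  )
}

by <- factor(c("a", "b", "c"))
level_pairs(by, "ordered")   # b vs a, c vs b
level_pairs(by, "pairwise")  # three pairs for three levels
level_pairs(by, "control")   # b vs a, c vs a

# relevel() moves the chosen level to the front, so it becomes the control.
level_pairs(relevel(by, ref = "c"), "control")  # a vs c, b vs c
```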

Examples

# NOT RUN {
library(tidybayes)  # spread_samples(), compare_levels()
library(dplyr)
library(ggplot2)

data(RankCorr, package = "tidybayes")

# Let's do all pairwise comparisons of b[i,1] for i in 1:3:
RankCorr %>%
  spread_samples(b[i,j]) %>%
  filter(i %in% 1:3, j == 1) %>%
  compare_levels(b, by = i) %>%
  median_qi()

# Or let's plot all comparisons against the first level (control):
RankCorr %>%
  spread_samples(b[i,j]) %>%
  filter(j == 1) %>%
  compare_levels(b, by = i, comparison = control) %>%
  ggplot(aes(x = b, y = i)) +
  geom_halfeyeh()

# }