compare_conditions: Compare two conditions within a data frame

Description

Using logic that dplyr::filter() can interpret, compare_conditions() summarizes the data separately for the rows that match condition x and the rows that match condition y.

Usage

compare_conditions(df, x, y, .cols = everything(), .fns = lst(mean))

Value

Returns a data frame that is either 1 row, or if grouped, 1 row per group.

Arguments

df: data frame
x: condition for comparison, same criteria you would use in 'dplyr::filter', used in contrast to the reference group 'y'
y: condition for the reference group, same criteria you would use in 'dplyr::filter'; 'x' is contrasted against it
.cols: columns to use in comparison
.fns: named list of the functions to use, e.g. list(avg = mean, sd = sd). 'purrr'-style lambdas are also supported, like list(mean = ~mean(.x, na.rm = TRUE), sd = sd), and dplyr::lst(mean, sd) will create list(mean = mean, sd = sd)
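
The three ways of writing .fns described above can be checked outside of compare_conditions(). This is a minimal sketch using only 'dplyr' and the built-in airquality data (an assumption for illustration, not a data set used by this package):

```r
library(dplyr)

# lst() names each element after the expression that created it
auto_named <- lst(mean, sd)
names(auto_named)

# the hand-written equivalent has the same names
manual <- list(mean = mean, sd = sd)
identical(names(auto_named), names(manual))

# a purrr-style lambda mixed with a plain function, used directly in across();
# Ozone contains missing values, so only the lambda with na.rm = TRUE
# returns a non-missing result
fns <- list(mean = ~ mean(.x, na.rm = TRUE), sd = sd)
fn_summary <- summarise(airquality, across(Ozone, fns))
fn_summary
```

Naming the list explicitly matters because the names become part of the output column names.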

Details

compare_conditions() passes its arguments to dplyr::across(). The .cols and .fns arguments work the same way there. For clarity, it is helpful to use the lst function for the .fns parameter. Using compare_conditions(..., .cols = my_var, .fns = lst(mean, sd)) will return the values mean_my_var_x, mean_my_var_y, sd_my_var_x and sd_my_var_y.
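
The naming convention above can be illustrated with plain 'dplyr'. The following is a rough sketch, not the package's actual implementation: it uses the built-in mtcars data, with cyl == 4 and cyl == 6 as stand-in conditions, and the .names specifications are chosen here only to mimic the _x and _y suffixes. Column order may differ from what compare_conditions() returns.

```r
library(dplyr)

# summarise the rows matching the stand-in "x" condition
x_summary <- mtcars |>
  filter(cyl == 4) |>
  summarise(across(mpg, lst(mean, sd), .names = "{.fn}_{.col}_x"))

# summarise the rows matching the stand-in "y" condition
y_summary <- mtcars |>
  filter(cyl == 6) |>
  summarise(across(mpg, lst(mean, sd), .names = "{.fn}_{.col}_y"))

# one row holding both sets of summaries side by side
result <- bind_cols(x_summary, y_summary)
names(result)
```

Each output name combines the function name, the column name, and the condition it was computed for.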

Examples


# compare_conditions works similarly to dplyr::across()
pixar_films |>
  compare_conditions(
    x = (rating == "G"),
    y = (rating == "PG"),
    .cols = rotten_tomatoes
  )


# because data frames are just fancy lists, you can pass the result to headline_list()
pixar_films |>
  compare_conditions(
    x = (rating == "G"),
    y = (rating == "PG"),
    .cols = rotten_tomatoes
  ) |>
  headline_list("a difference of {delta} points")


# you can return multiple objects to compare
# 'view_list()' is a helper to see list objects in a compact way
pixar_films |>
  compare_conditions(
    x = (rating == "G"),
    y = (rating == "PG"),
    .cols = c(rotten_tomatoes, metacritic),
    .fns = dplyr::lst(mean, sd)
  ) |>
  view_list()


# you can use any of the `tidyselect` helpers
pixar_films |>
  compare_conditions(
    x = (rating == "G"),
    y = (rating == "PG"),
    .cols = dplyr::starts_with("bo_")
  )


# if you want to compare x to the overall average, use y = TRUE
pixar_films |>
  compare_conditions(
    x = (rating == "G"),
    y = TRUE,
    .cols = rotten_tomatoes
  )


# to get the # of observations use length() instead of n()
# note: don't pass the parentheses
pixar_films |>
  compare_conditions(
    x = (rating == "G"),
    y = (rating == "PG"),
    .cols = rotten_tomatoes, # can put anything here really
    .fns = list(n = length)
  )


# you can also use purrr-style lambdas
pixar_films |>
  compare_conditions(
    x = (rating == "G"),
    y = (rating == "PG"),
    .cols = rotten_tomatoes,
    .fns = list(avg = ~ sum(.x) / length(.x))
  )

# you can compare categorical data with functions like dplyr::n_distinct()
pixar_films |>
  compare_conditions(
    x = (rating == "G"),
    y = (rating == "PG"),
    .cols = film,
    .fns = list(distinct = dplyr::n_distinct)
  )
