LightLogR (version 0.10.0)

sample_groups: Sample groups from a grouped dataset

Description

This helper selects a subset of groups from a grouped dataset. Groups can be drawn randomly, by ordering groups from the top or bottom according to a summary expression, or by filtering with a custom condition. The function is designed to work with datasets that were grouped using dplyr::group_by().

Usage

sample_groups(
  dataset,
  n = 1,
  sample = c("top", "bottom", "random"),
  order.by = dplyr::cur_group_id(),
  condition = NULL
)

Value

A grouped tibble containing only the sampled groups.

Arguments

dataset

A grouped dataset. Expects a data frame grouped with dplyr::group_by().

n

Number of groups to return. Defaults to 1. Ignored when condition is supplied and n is NULL.

sample

Sampling strategy. Must be one of "random", "top" (the default), or "bottom". Alternatively, a numeric vector can be provided to select group positions (using bottom ordering); when numeric, n is ignored. When condition is provided, the sample value is ignored and conditional filtering is applied instead.
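One way to picture the numeric form of sample is as picking rows by position from the group summary after sorting it in ascending order. The sketch below uses plain dplyr on the built-in mtcars data. It is not the package's implementation; the names grouped, group_summary, positions and selected are hypothetical, and reading "bottom ordering" as ascending order is an assumption.

```r
library(dplyr)

# Stand-in data (assumption): mtcars grouped by number of cylinders
grouped <- mtcars |> group_by(cyl)

# One row per group, ordered by the group number (the documented default)
group_summary <- grouped |>
  summarise(.order_value = cur_group_id(), .groups = "drop")

# Hypothetical numeric selection: first and third group in ascending order
positions <- c(1, 3)

selected <- group_summary |>
  arrange(.order_value) |>
  slice(positions)

# Keep only rows from the selected groups; the grouping is preserved
result <- grouped |>
  filter(cyl %in% selected$cyl)

group_keys(result)
```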

order.by

Expression used to order groups when sample is set to "top" or "bottom". Evaluated in a one-row summary for each group. Defaults to dplyr::cur_group_id(), i.e., the group number.
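The idea of ordering groups by a one-row summary can be sketched with plain dplyr. This is only an illustration on the built-in mtcars data, not the package's implementation; grouped, group_summary and top_groups are hypothetical names, and the .order_value column name is borrowed from the description of the condition argument.

```r
library(dplyr)

# Stand-in data (assumption): mtcars grouped by number of cylinders
grouped <- mtcars |> group_by(cyl)

# One-row summary per group; mean(mpg) plays the role of order.by
group_summary <- grouped |>
  summarise(.order_value = mean(mpg), .groups = "drop")

# "top" with n = 1: keep the group with the largest order value
# (slice_min() would give the "bottom" equivalent)
top_groups <- group_summary |>
  slice_max(.order_value, n = 1, with_ties = FALSE)

# Keep only rows from the selected group; the grouping is preserved
result <- grouped |>
  filter(cyl %in% top_groups$cyl)

group_keys(result)
```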

condition

Logical expression used to filter the summarised groups. Evaluated in a one-row summary for each group, which includes an .order_value column derived from order.by.
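Conditional filtering can be pictured as a filter on the same one-row-per-group summary. The following is a minimal dplyr sketch on the built-in mtcars data, under the assumption that the condition is applied to the summary and the matching groups are then kept; it does not reproduce the package's internals, and all object names are hypothetical.

```r
library(dplyr)

# Stand-in data (assumption): mtcars grouped by number of cylinders
grouped <- mtcars |> group_by(cyl)

# One-row summary per group; mean(mpg) plays the role of order.by
group_summary <- grouped |>
  summarise(.order_value = mean(mpg), .groups = "drop")

# The condition refers to .order_value, as in condition = .order_value > 18
matching <- group_summary |>
  filter(.order_value > 18)

# Keep every group that meets the condition; the grouping is preserved
result <- grouped |>
  filter(cyl %in% matching$cyl)

group_keys(result)
```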

Examples

# gives the last group (highest group id)
sample.data.environment |>
  sample_groups() |>
  dplyr::group_keys()

# gives one random group
sample.data.environment |>
  sample_groups(sample = "random") |>
  dplyr::group_keys()

#gives the group with the highest average melanopic EDI
sample.data.environment |>
  sample_groups(order.by = mean(MEDI)) |>
  dplyr::group_keys()

#gives the group with the lowest average melanopic EDI
sample.data.environment |>
  sample_groups(sample = "bottom", order.by = mean(MEDI)) |>
  dplyr::group_keys()

# give only groups that have a median melanopic EDI > 1000 lx
sample.data.environment |>
  sample_groups(condition = median(MEDI, na.rm = TRUE) > 1000) |>
  dplyr::group_keys()

# return only days with more than 7 hours above 250 lx melanopic EDI
sample.data.environment |>
  add_Date_col(group.by = TRUE) |>
  sample_groups(order.by = duration_above_threshold(MEDI, Datetime, threshold = 250),
                condition = .order_value > 7*60*60) |>
  dplyr::group_keys()
  
# return the 5 days with the longest time above 250 lx melanopic EDI
sample.data.environment |>
  add_Date_col(group.by = TRUE) |>
  sample_groups(
    n = 5,
    order.by = duration_above_threshold(MEDI, Datetime, threshold = 250)
  ) |>
  dplyr::group_keys()

# gives the first group
sample.data.environment |>
  sample_groups(sample = 1) |>
  dplyr::group_keys()

# gives the second group
sample.data.environment |>
  sample_groups(sample = 2) |>
  dplyr::group_keys()
  
