generate_frequency: Generate frequency table

Description

Creates frequency tables for one or more categorical variables, optionally grouped by other variables. The function supports various enhancements such as sorting, totals, percentages, cumulative statistics, handling of missing values, and label customization. It returns a single table or a list of frequency tables.

Usage

generate_frequency(
  data,
  ...,
  sort_value = TRUE,
  sort_desc = TRUE,
  sort_except = NULL,
  add_total = TRUE,
  add_percent = TRUE,
  add_cumulative = FALSE,
  add_cumulative_percent = FALSE,
  as_proportion = FALSE,
  include_na = TRUE,
  recode_na = "auto",
  position_total = c("bottom", "top"),
  calculate_per_group = TRUE,
  group_separator = " - ",
  group_as_list = FALSE,
  label_as_group_name = TRUE,
  label_stub = NULL,
  label_na = "Not reported",
  label_total = "Total",
  expand_categories = TRUE,
  convert_factor = FALSE,
  collapse_list = FALSE,
  top_n = NULL,
  top_n_only = FALSE,
  metadata = NULL
)

Value

A frequency table (tibble, possibly nested) or a list of such tables. Additional attributes such as labels, metadata, and grouping information may be attached. The returned object is of class "tsg".

Arguments

data: A data frame (typically tibble) containing the variables to summarize.
...: One or more unquoted variable names (passed via tidy evaluation) for which to compute frequency tables.
sort_value: Logical. If TRUE, frequency values will be sorted.
sort_desc: Logical. If TRUE, sorts in descending order of frequency. If sort_value = FALSE, categories are instead ordered by their values in ascending order.
sort_except: Optional character vector. Variables to exclude from sorting.
add_total: Logical. If TRUE, adds a total row or value to the frequency table.
add_percent: Logical. If TRUE, adds percent or proportion values to the table.
add_cumulative: Logical. If TRUE, adds cumulative frequency counts.
add_cumulative_percent: Logical. If TRUE, adds cumulative percentages (or proportions if as_proportion = TRUE).
as_proportion: Logical. If TRUE, displays proportions instead of percentages (range 0–1).
include_na: Logical. If TRUE, includes missing values in the frequency table.
recode_na: Character or NULL. Value used to replace missing values in labelled vectors; "auto" will determine a code automatically.
position_total: Character. Where to place the total row: "top" or "bottom".
calculate_per_group: Logical. If TRUE, calculates frequencies within groups defined in data (from group_by() or existing grouping).
group_separator: Character. Separator used when concatenating group values in list output (if group_as_list = TRUE).
group_as_list: Logical. If TRUE, output is a list of frequency tables for each group combination.
label_as_group_name: Logical. If TRUE, uses variable labels as names in the output list; otherwise, uses variable names.
label_stub: Optional character vector used for labeling output tables (e.g., for export or display).
label_na: Character. Label to use for missing (NA) values.
label_total: Character. Label used for the total row/category.
expand_categories: Logical. If TRUE, ensures all categories (including those with zero counts) are included in the output.
convert_factor: Logical. If TRUE, converts labelled variables to factors in the output. See also convert_factor().
collapse_list: Logical. If TRUE and group_as_list = TRUE, collapses the list of frequency tables into a single data frame with group identifiers. See also collapse_list().
top_n: Integer or NULL. If specified, limits the output to the top n categories by frequency.
top_n_only: Logical. If TRUE and top_n is specified, only the top n categories are included, excluding others.
metadata: A named list with optional metadata to attach as attributes, e.g. title, subtitle, and source_note.
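To make the count, percent, and cumulative options above concrete, here is a minimal base R sketch of the quantities such a table typically carries. It is not the package's implementation: the data are made up, and the column names (category, frequency, percent, cumulative, cumulative_percent) are assumptions chosen for illustration and may differ from the actual output.

```r
# Base R sketch of the columns a frequency table typically carries.
# Hypothetical data and column names; NOT the package's implementation.
x <- c("single", "married", "married", NA, "widowed", "married", "single")

# Count each category, keeping NA as its own category (cf. include_na = TRUE).
counts <- table(x, useNA = "ifany")

freq <- data.frame(
  category = names(counts),
  frequency = as.integer(counts),
  stringsAsFactors = FALSE
)

# Relabel the missing category (cf. label_na = "Not reported").
freq$category[is.na(freq$category)] <- "Not reported"

# Sort by descending frequency (cf. sort_value = TRUE, sort_desc = TRUE).
freq <- freq[order(-freq$frequency), ]
rownames(freq) <- NULL

# Percentages and running totals (cf. add_percent, add_cumulative,
# add_cumulative_percent). Drop the factor of 100 to get proportions
# in the range 0-1 (cf. as_proportion = TRUE).
total <- sum(freq$frequency)
freq$percent <- 100 * freq$frequency / total
freq$cumulative <- cumsum(freq$frequency)
freq$cumulative_percent <- cumsum(freq$percent)

# Append a total row at the bottom (cf. add_total, position_total, label_total).
total_row <- data.frame(
  category = "Total",
  frequency = total,
  percent = 100,
  cumulative = NA_integer_,
  cumulative_percent = NA_real_,
  stringsAsFactors = FALSE
)
freq <- rbind(freq, total_row)

print(freq)
```

The total row is built after the cumulative columns so that it does not distort the running sums; its cumulative cells are left as NA in this sketch.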

Examples

# Using built-in dataset `person_record`


# Basic usage
person_record |>
  generate_frequency(sex)

# Multiple variables
person_record |>
  generate_frequency(sex, age, marital_status)

# Grouping
person_record |>
  dplyr::group_by(sex) |>
  generate_frequency(marital_status)

# Output group as list
person_record |>
  dplyr::group_by(sex) |>
  generate_frequency(marital_status, group_as_list = TRUE)

# Sorting

# sort_value defaults to TRUE
person_record |>
  generate_frequency(age, sort_value = TRUE)

# If FALSE, the output will be sorted by the variable values in ascending order.
person_record |>
  generate_frequency(age, sort_value = FALSE)

# See the package vignettes for more examples.
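For readers who want to picture what per-group output looks like, the following base R sketch builds one frequency table per group and then stacks them into a single data frame. It only loosely mirrors the ideas behind group_as_list, expand_categories, and collapse_list; the data, the helper freq_one(), and the column names are hypothetical, and the package's actual output structure may differ.

```r
# Base R sketch of per-group frequency tables.
# Hypothetical data and helper; NOT the package's implementation.
df <- data.frame(
  sex = c("Male", "Female", "Female", "Male", "Female", "Male"),
  marital_status = c("single", "married", "widowed", "single", "married", "married"),
  stringsAsFactors = FALSE
)

# All categories observed in the full data, so every group reports the same
# rows even when a count is zero (cf. expand_categories = TRUE).
levels_all <- sort(unique(df$marital_status))

# Hypothetical helper: frequency and percent for one group's rows.
freq_one <- function(d) {
  counts <- table(factor(d$marital_status, levels = levels_all))
  data.frame(
    category = names(counts),
    frequency = as.integer(counts),
    percent = 100 * as.integer(counts) / sum(counts),
    stringsAsFactors = FALSE
  )
}

# One table per group, named by group value (cf. group_as_list = TRUE).
by_group <- lapply(split(df, df$sex), freq_one)

# Stack the list into one data frame with a group identifier column
# (cf. collapse_list = TRUE).
collapsed <- do.call(
  rbind,
  Map(function(name, tbl) cbind(sex = name, tbl), names(by_group), by_group)
)
rownames(collapsed) <- NULL

print(by_group)
print(collapsed)
```

Percentages here are computed within each group, which corresponds to the idea described for calculate_per_group = TRUE; dividing by nrow(df) instead would give shares of the whole data set.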
