StatsTFLValR (version 1.0.0)

freq_by: Frequency Table by Group (wide): n (%) with flexible ordering and formats

Description

freq_by() produces a one-level frequency table by treatment (wide layout) where each row is a category of last_group (e.g., a bucketed lab value), and each treatment column shows n (%) using distinct subject counts.

New: If fmt is not provided (NULL), labels are derived from the unique values present in data[[last_group]] (post na_to_code mapping, if used).

It supports:

  • SAS-style rounding (use_sas_round = TRUE) for the percent.

  • Format mapping via either a named vector or a tibble/data.frame with columns value (codes) and raw (labels).

  • Ordering rows by the numeric value of the last_group codes found in the data or, when include_all_fmt_levels = TRUE (the default), by the union of format and data codes.

  • Counting NA under a chosen code/label using na_to_code (e.g., code "4" = "MISSING").

  • Auto-detecting the subject ID column when id_var is not found in data.

Usage

freq_by(
  data,
  denom_data = NULL,
  main_group,
  last_group,
  label,
  sec_ord,
  fmt = NULL,
  use_sas_round = FALSE,
  indent = 2,
  id_var = "USUBJID",
  include_all_fmt_levels = TRUE,
  na_to_code = NULL
)

Value

A tibble with:

  • stat (character), sort_ord (integer), and sec_ord (integer).

  • One column per treatment arm (e.g., trt1, trt2, …), with "n (pct)" or "0".

Arguments

data

A data frame containing at least main_group, last_group, and an ID column.

denom_data

Optional data frame used to derive denominators (N per treatment). Defaults to data.

main_group

Character scalar. The treatment or grouping variable name (columns in output), e.g., "TRTAN".

last_group

Character scalar. The name of the categorical code variable to tabulate (rows). Numeric and character variables are both accepted; values are converted to character for display and ordering.

label

Character scalar. The text of the header row displayed at the top of the table (unindented).

sec_ord

Integer scalar carried through for downstream table sorting.

fmt

Optional. Either:

  • a named character vector like c("1"="<1","2"="1-<4",...) (names = codes, values = labels), or

  • a data.frame/tibble with columns value (codes) and raw (labels), or

  • a string naming an object (in the parent frame) that resolves to either of the above.

If NULL (default), labels are derived from the unique values of data[[last_group]].
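The explicit forms carry the same code-to-label mapping. The base R sketch below uses illustrative labels (the third level and the object names fmt_vec, fmt_df, fmt_from_df, and fmt_by_name are made up for this example) to show how a value/raw table corresponds to the named vector. The get() call is only one plausible way a string could be resolved to an object; freq_by()'s own lookup may differ.

```r
# Form 1: named character vector (names = codes, values = labels).
fmt_vec <- c("1" = "<1", "2" = "1-<4", "3" = ">=4")

# Form 2: data frame with columns `value` (codes) and `raw` (labels).
fmt_df <- data.frame(
  value = c("1", "2", "3"),
  raw   = c("<1", "1-<4", ">=4"),
  stringsAsFactors = FALSE
)

# The data-frame form collapses to the same named vector.
fmt_from_df <- stats::setNames(fmt_df$raw, fmt_df$value)
identical(fmt_vec, fmt_from_df)

# Form 3: a string naming an object. get() is an assumed lookup mechanism.
fmt_by_name <- get("fmt_vec")
```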

use_sas_round

Logical; if TRUE, percent is rounded with SAS-compatible “round halves away from zero” via sas_round(). Default FALSE.
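To illustrate why this option matters: base R's round() follows the IEC 60559 convention, so exactly representable halves go to the even digit, whereas SAS rounds them away from zero. The helper below, round_half_away(), is a hypothetical stand-in written for this example and is not the package's sas_round(); it also ignores the floating-point tolerance a production implementation would likely need for values that are not exactly representable in binary.

```r
# Hypothetical helper: round halves away from zero at `digits` decimals.
round_half_away <- function(x, digits = 0) {
  scale <- 10^digits
  sign(x) * floor(abs(x) * scale + 0.5) / scale
}

round(0.5)              # base R: half goes to the even digit (0)
round(2.5)              # base R: half goes to the even digit (2)

round_half_away(0.5)    # away from zero: 1
round_half_away(2.5)    # away from zero: 3
round_half_away(-2.5)   # away from zero: -3
round_half_away(12.25, digits = 1)  # 12.3
```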

indent

Integer number of leading spaces applied to all category rows (the first label row is not indented). Default 2.
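As a rough picture of the resulting stat column, the sketch below builds the header row and the indented category rows in base R. The variable names (header, categories, stat) and the values are illustrative assumptions, not taken from the package internals.

```r
header     <- "Age group, n (%)"
categories <- c("<65 years", ">=75 years")
indent     <- 2

# The header row stays flush left; category rows get `indent` leading spaces.
stat <- c(header, paste0(strrep(" ", indent), categories))
stat
```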

id_var

Character; the subject identifier column. If it is not found in data, the function tries common alternatives (e.g., USUBJID, SUBJID).
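The fallback can be pictured with a small base R sketch. The function find_id_var() and its candidates list are hypothetical, introduced only for illustration; the package's actual candidate names and search order are not documented here.

```r
# Hypothetical fallback: use id_var if present, else the first known alternative.
find_id_var <- function(data, id_var = "USUBJID",
                        candidates = c("USUBJID", "SUBJID")) {
  if (id_var %in% names(data)) {
    return(id_var)
  }
  hit <- intersect(candidates, names(data))
  if (length(hit) == 0) {
    stop("No subject ID column found in data")
  }
  hit[1]
}

# USUBJID is absent here, so the sketch falls back to SUBJID.
find_id_var(data.frame(SUBJID = 1:3, TRTAN = 1))
```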

include_all_fmt_levels

Logical; if TRUE (default), the row order is built from the union of format codes and data codes (numeric sort). When fmt = NULL, this effectively reduces to observed data codes only.
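The union-then-numeric-sort rule can be sketched in base R as follows. The code vectors are made-up examples, and the snippet only mirrors the ordering described above, not the function's internal code.

```r
fmt_codes  <- c("1", "2", "3", "10")   # codes defined in the format
data_codes <- c("2", "10", "11")       # codes observed in the data

# Union of both sources, then sort by numeric value rather than as text.
all_codes <- union(fmt_codes, data_codes)
row_order <- all_codes[order(as.numeric(all_codes))]
row_order   # "1" "2" "3" "10" "11"
```

Sorting the same codes as plain text would place "10" and "11" before "2", which is why the numeric conversion matters.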

na_to_code

Optional character scalar (e.g., "4"). If supplied, NA values in last_group are counted under that code before tabulation.
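In effect, the recoding happens before any counting. A minimal base R sketch, with made-up values and variable names, of what the mapping does to the codes:

```r
last_group_values <- c(1, 2, NA, 1, NA)
na_to_code        <- "4"

# Convert to character codes, then move NA into the chosen code.
codes <- as.character(last_group_values)
codes[is.na(codes)] <- na_to_code

table(codes)   # the two NA values are now counted under code "4"
```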

Details

  • Counting uses n_distinct(id_var) within each (main_group, last_group) cell.

  • Percent is 100 * n / N where N = distinct subjects in denom_data by main_group.

  • When fmt = NULL, both codes and labels are taken from the observed values of last_group (after applying na_to_code mapping), ordered numerically where possible.

  • Output treatment columns are normalized to trtXX if original names start with digits.

  • Missing treatment arms are added as "0".
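The steps above can be sketched end to end in base R on a tiny made-up data set. This is an illustration under stated assumptions rather than the package's implementation: the data, the variable names, and the one-decimal percent format are all hypothetical, and unique() plus table() stands in for n_distinct().

```r
records <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S3", "S4", "S4"),
  TRTAN   = c(1, 1, 1, 2, 2, 2),
  CAT     = c("1", "1", "2", "2", "2", "2"),
  stringsAsFactors = FALSE
)
denom <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S4", "S5"),
  TRTAN   = c(1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# n: distinct subjects per (category, treatment) cell.
cells <- unique(records[, c("USUBJID", "TRTAN", "CAT")])
n <- unclass(table(cells$CAT, cells$TRTAN))

# N: distinct subjects per treatment in the denominator data.
big_n <- table(unique(denom[, c("USUBJID", "TRTAN")])$TRTAN)

# Percent = 100 * n / N, column by column.
pct <- 100 * sweep(n, 2, as.vector(big_n[colnames(n)]), "/")

# "n (pct)" cells, with a plain "0" for empty cells (one decimal is assumed).
cell <- ifelse(n == 0, "0", sprintf("%d (%.1f)", n, pct))

# Column names that start with a digit get a "trt" prefix.
starts_with_digit <- grepl("^[0-9]", colnames(cell))
colnames(cell)[starts_with_digit] <- paste0("trt", colnames(cell)[starts_with_digit])
cell
```

Subject S1 contributes two records to the same cell but is counted once, and S5 appears only in the denominator, which lowers the percentages for arm 2 without adding to any count.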

Examples

library(StatsTFLValR)

set.seed(1)

# Toy subject-level data: 60 subjects, two treatment arms, ETHNIC may contain NA
toy_adsl <- tibble::tibble(
  USUBJID = sprintf("ID%03d", 1:60),
  TRTAN   = sample(c(1, 2), size = 60, replace = TRUE),
  AGE     = sample(18:85, size = 60, replace = TRUE),
  SEX     = sample(c("Male", "Female"), size = 60, replace = TRUE),
  ETHNIC  = sample(
    c("Hispanic or Latino",
      "Not Hispanic or Latino",
      "Unknown",
      NA_character_),
    size = 60, replace = TRUE
  )
) |>
  dplyr::mutate(
    AGEGR1 = dplyr::case_when(
      AGE < 65            ~ "<65 years",
      AGE >= 65 & AGE < 75 ~ "65–<75 years",
      AGE >= 75           ~ ">=75 years"
    )
  )

# Denominator data: one row per subject with its treatment arm
toy_dm <- toy_adsl |>
  dplyr::select(USUBJID, TRTAN)

# No format supplied: rows and labels come from the observed AGEGR1 values
freq_by(
  data       = toy_adsl,
  denom_data = toy_dm,
  main_group = "TRTAN",
  last_group = "AGEGR1",
  label      = "Age group, n (%)",
  sec_ord    = 1,
  fmt        = NULL,
  na_to_code = NULL
)

# SEX contains no NA values in this toy data, so na_to_code has nothing to recode
freq_by(
  data       = toy_adsl,
  denom_data = toy_dm,
  main_group = "TRTAN",
  last_group = "SEX",
  label      = "Sex, n (%)",
  sec_ord    = 2,
  fmt        = NULL,
  na_to_code = "99"
)

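# Named-vector format: names are the ETHNIC codes, values are the display labels;
# code "99" supplies the label for NA values recoded via na_to_code
fmt_ethnic <- c(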
  "Hispanic or Latino"         = "Hispanic or Latino",
  "Not Hispanic or Latino"     = "Not Hispanic or Latino",
  "Unknown"                    = "Unknown",
  "99"                         = "Missing"
)

freq_by(
  data       = toy_adsl,
  denom_data = toy_dm,
  main_group = "TRTAN",
  last_group = "ETHNIC",
  label      = "Ethnic group, n (%)",
  sec_ord    = 3,
  fmt        = fmt_ethnic,
  include_all_fmt_levels = TRUE,
  na_to_code = "99"
)