SOCbyPT: SOC → PT summary by treatment (wide), with optional BY-grouping, SOC totals, UNCODED positioning, BY-specific Big-N, and optional Big-N printing

Description

Build a System Organ Class (SOC) → Preferred Term (PT) summary by treatment in a wide layout suitable for clinical TLFs. Optionally stratify the display by a BY variable from the AE dataset, order BY groups by a separate key, add TOTAL rows, control UNCODED placement, and optionally calculate percentages using BY-specific denominators.

Usage

SOCbyPT(
  indata,
  dmdata,
  pop_data = NULL,
  group_vars,
  trtan_coln,
  by_var = NULL,
  by_sort_var = NULL,
  by_sort_numeric = TRUE,
  id_var = "USUBJID",
  rtf_safe = TRUE,
  indent_str = "(*ESC*)R/RTF\"\\li360 \"",
  use_sas_round = FALSE,
  header_blank = FALSE,
  soc_totals = FALSE,
  total_label = "TOTAL SUBJECTS WITH AN EVENT",
  uncoded_position = c("count", "last"),
  bigN_by = NULL,
  print_bigN = FALSE
)

Value

A tibble with columns:

stat
trt* (treatment columns)
sort_ord, sec_ord
by_var, by_sort_var (only when a BY variable is used)

Arguments

indata

AE-like input with at least: subject id, SOC, PT, and the main treatment column. If BY is used, by_var (and by_sort_var if different) must exist in indata.

dmdata

Working denominator dataset (e.g., filtered ADSL) with at least: subject id and the main treatment column. If bigN_by = "YES" and BY is used, dmdata must also contain by_var to compute BY-specific denominators.

pop_data

Master population dataset (e.g., full ADSL) used to define the set/order of treatment arms. If NULL, defaults to dmdata.
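
As a rough illustration of why a separate pop_data can matter, the base R sketch below takes the arm set from a master population and counts working-population subjects per arm. It is not SOCbyPT's internal code: the names pop, dm, arms and big_n are illustrative, and how the function displays an arm with a zero denominator is not stated on this page.

```r
# Master population: defines which arms exist and their order (assumed ascending)
pop <- data.frame(
  USUBJID = c("01", "02", "03", "04", "05"),
  TRTAN   = c(11, 11, 12, 12, 13)
)

# Working denominator data: arm 13 has no subjects after filtering
dm <- data.frame(
  USUBJID = c("01", "02", "03", "04"),
  TRTAN   = c(11, 11, 12, 12)
)

# Arm set comes from pop, counts come from dm; missing arms get a count of 0
arms  <- sort(unique(pop$TRTAN))
big_n <- table(factor(dm$TRTAN, levels = arms))
big_n
```

With dm alone, arm 13 would not appear at all; taking the levels from pop keeps it as a column with a zero count.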

group_vars

Character vector of length 3: c(main_treatment, SOC, PT).

trtan_coln

Treatment level value (e.g., "12" or 12) identifying the arm whose counts drive row sorting (descending count, then alphabetical).
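
The sort rule can be sketched in base R as follows. This is only an assumption about what "descending count, then alpha" means in practice (distinct subjects per PT in the chosen arm, ties broken alphabetically); the data and the names ae, ref, n_ref and sorted are illustrative, not part of SOCbyPT.

```r
# Illustrative AE data; subject "03" reports NAUSEA twice
ae <- data.frame(
  USUBJID = c("01", "01", "02", "03", "03", "04", "05"),
  TRTAN   = c(11, 11, 11, 12, 12, 12, 12),
  AEDECOD = c("NAUSEA", "VOMITING", "HEADACHE",
              "NAUSEA", "NAUSEA", "DIZZINESS", "NAUSEA"),
  stringsAsFactors = FALSE
)

# One row per subject and PT within the sorting arm (12)
ref <- unique(ae[ae$TRTAN == 12, c("USUBJID", "AEDECOD")])

# Subject count per PT in that arm; PTs absent from the arm get 0
terms <- sort(unique(ae$AEDECOD))
n_ref <- vapply(terms, function(t) sum(ref$AEDECOD == t), integer(1))

# Descending count first, then alphabetical
sorted <- terms[order(-n_ref, terms, method = "radix")]
sorted
```

Here NAUSEA comes first (2 subjects in arm 12), then DIZZINESS (1), then HEADACHE and VOMITING (0 each, in alphabetical order).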

by_var

Optional BY column name (quoted or unquoted) from indata used to split the table into groups.

by_sort_var

Optional column (quoted or unquoted) used to order BY groups. Defaults to by_var.

by_sort_numeric

If TRUE, BY groups are ordered by as.numeric(by_sort_var); otherwise they are ordered lexicographically. Default TRUE.
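
The difference between the two orderings shows up as soon as the sort key has more than one digit. A minimal base R sketch, with an illustrative keys vector:

```r
# Sort keys stored as character, as they often are in BY variables
keys <- c("10", "2", "1")

# Lexicographic: compared character by character, so "10" sorts before "2"
lexicographic <- sort(keys, method = "radix")

# Numeric: converted with as.numeric() before ordering
numeric_order <- keys[order(as.numeric(keys))]

lexicographic
numeric_order
```

A non-numeric key such as "M" or "F" becomes NA under as.numeric(), which is presumably why the BY-group example with SEX sets by_sort_numeric = FALSE.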

id_var

Subject identifier column name. Default "USUBJID".

rtf_safe

If TRUE, PT labels are prefixed by indent_str. Default TRUE.

indent_str

Prefix added to PT labels when rtf_safe = TRUE.
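
The default indent_str is easier to read once R's string escapes are resolved. The sketch below simply pastes the prefix onto a label, which is an assumption about how the prefix is applied; pt_label is an illustrative name.

```r
# Default prefix as written in the Usage section (R string escapes included)
indent_str <- "(*ESC*)R/RTF\"\\li360 \""

# Prefix a PT label the way rtf_safe = TRUE is described as doing
pt_label <- paste0(indent_str, "NAUSEA")

# cat() shows the literal characters that end up in the cell
cat(pt_label, "\n")
```

The literal text is `(*ESC*)R/RTF"\li360 "NAUSEA`. In RTF, `\li360` is a left indent of 360 twips, so the prefix indents PT rows under their SOC header when the output is rendered as RTF.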

use_sas_round

If TRUE, use SAS-style rounding (ties away from zero). Default FALSE.
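
Base R's round() sends exact ties to the even digit, so the two rounding modes can give different percentages. The helper below is a hypothetical sketch of ties-away-from-zero rounding, not the implementation used by SOCbyPT; the small fuzz term is an assumption added to absorb floating-point error.

```r
# Hypothetical helper: round half away from zero
sas_round <- function(x, digits = 0) {
  scale <- 10^digits
  # Shift by 0.5 (plus a tiny fuzz) and truncate the magnitude, then restore sign
  sign(x) * floor(abs(x) * scale + 0.5 + sqrt(.Machine$double.eps)) / scale
}

ties <- c(0.5, 2.5, -2.5)

r_default <- round(ties)      # ties go to the even digit
r_sas     <- sas_round(ties)  # ties go away from zero

r_default
r_sas
```

For these inputs round() returns 0, 2, -2 while sas_round() returns 1, 3, -3.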

header_blank

If TRUE, blank treatment cells on SOC header rows (TOTAL rows remain populated). Default FALSE.

soc_totals

If TRUE, SOC header rows are retained and populated, which is also the default behavior; the argument is included for API parity. Default FALSE.

total_label

Label for TOTAL row(s). Default "TOTAL SUBJECTS WITH AN EVENT".

uncoded_position

Where to place UNCODED: "count" (default; UNCODED is positioned by its count like any other term) or "last" (UNCODED is pushed to the bottom).
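
The two placements differ only in whether an extra leading sort key is applied. A minimal sketch with illustrative counts (the counts data frame is made up for the example, and whether the rule applies at SOC level, PT level or both is not stated here):

```r
# Illustrative subject counts in the sorting arm
counts <- data.frame(
  term = c("NAUSEA", "UNCODED", "HEADACHE"),
  n    = c(2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# "count": UNCODED competes on its count like any other term
by_count <- counts$term[order(-counts$n, counts$term, method = "radix")]

# "last": a logical key (FALSE sorts before TRUE) forces UNCODED to the bottom
uncoded_last <- counts$term[
  order(counts$term == "UNCODED", -counts$n, counts$term, method = "radix")
]

by_count
uncoded_last
```

With these counts, "count" puts UNCODED first because it has the highest count, while "last" yields NAUSEA, HEADACHE, UNCODED.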

bigN_by

Flag controlling denominator behavior when BY is used:

NULL / "NO" (default): denominators are by treatment only (not stratified by BY)
"YES": denominators are by BY × treatment (requires by_var in dmdata)

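To make the two denominator choices concrete, the base R sketch below computes both kinds of Big-N from a small subject-level dataset. It only illustrates the arithmetic and is not taken from SOCbyPT; dm, n_trt and n_by_trt are illustrative names.

```r
# Subject-level data with treatment and the BY variable
dm <- data.frame(
  USUBJID = c("01", "02", "03", "04", "05"),
  TRTAN   = c(11, 11, 12, 12, 12),
  SEX     = c("M", "F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Treatment-only denominators (bigN_by NULL or "NO")
n_trt <- table(dm$TRTAN)

# BY x treatment denominators (bigN_by = "YES")
n_by_trt <- table(dm$SEX, dm$TRTAN)

# One female subject in arm 12 with an event, under each denominator
pct_trt_only <- 100 * 1 / n_trt[["12"]]
pct_by_trt   <- 100 * 1 / n_by_trt["F", "12"]

pct_trt_only
pct_by_trt
```

The same single event is about 33.3% of arm 12 overall (1 of 3) but 50% of females in arm 12 (1 of 2), so the choice of denominator changes every percentage in a BY-grouped table.
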
print_bigN

If TRUE, prints denominators (Big-N) used for percent calculations to console/log.

Examples

library(dplyr)

# Example AE data: one row per reported event
adae <- tibble::tribble(
  ~USUBJID, ~TRTAN, ~AEBODSYS,          ~AEDECOD,
  "01",       11,   "GASTROINTESTINAL", "NAUSEA",
  "01",       11,   "GASTROINTESTINAL", "VOMITING",
  "02",       11,   "NERVOUS SYSTEM",   "HEADACHE",
  "03",       12,   "GASTROINTESTINAL", "NAUSEA",
  "04",       12,   "NERVOUS SYSTEM",   "DIZZINESS",
  "05",       12,   "UNCODED",          "UNCODED"
)

# Denominator data: one row per subject
adsl <- tibble::tribble(
  ~USUBJID, ~TRTAN,
  "01",       11,
  "02",       11,
  "03",       12,
  "04",       12,
  "05",       12
)

# Example 1: basic SOC by PT table with default settings
out1 <- SOCbyPT(
  indata     = adae,
  dmdata     = adsl,
  group_vars = c("TRTAN", "AEBODSYS", "AEDECOD"),
  trtan_coln = "12"   # reference arm for sorting
)

out1

# Example 2: no indent prefix on PT labels, blank treatment cells on SOC header rows
out2 <- SOCbyPT(
  indata       = adae,
  dmdata       = adsl,
  group_vars   = c("TRTAN", "AEBODSYS", "AEDECOD"),
  trtan_coln   = "12",
  rtf_safe     = FALSE,
  header_blank = TRUE
)

out2

# Data for the BY-group examples: SEX is present in both AE and subject-level data
adae_sex <- tibble::tribble(
  ~USUBJID, ~TRTAN, ~SEX, ~AEBODSYS,          ~AEDECOD,
  "01",       11,   "M",  "GASTROINTESTINAL", "NAUSEA",
  "02",       11,   "F",  "GASTROINTESTINAL", "VOMITING",
  "03",       12,   "M",  "NERVOUS SYSTEM",   "HEADACHE",
  "04",       12,   "F",  "NERVOUS SYSTEM",   "DIZZINESS",
  "05",       12,   "F",  "UNCODED",          "UNCODED"
)

adsl_sex <- tibble::tribble(
  ~USUBJID, ~TRTAN, ~SEX,
  "01",       11,   "M",
  "02",       11,   "F",
  "03",       12,   "M",
  "04",       12,   "F",
  "05",       12,   "F"
)

# Example 3: split by SEX, BY groups ordered lexicographically, UNCODED last
out3 <- SOCbyPT(
  indata           = adae_sex,
  dmdata           = adsl_sex,
  group_vars       = c("TRTAN", "AEBODSYS", "AEDECOD"),
  trtan_coln       = "12",
  by_var           = "SEX",
  by_sort_var      = "SEX",
  by_sort_numeric  = FALSE,
  uncoded_position = "last"
)

out3

# Example 4: BY-specific denominators (SEX x treatment), with Big-N printed
out4 <- SOCbyPT(
  indata      = adae_sex,
  dmdata      = adsl_sex,
  group_vars  = c("TRTAN", "AEBODSYS", "AEDECOD"),
  trtan_coln  = "12",
  by_var      = "SEX",
  bigN_by     = "YES",
  print_bigN  = TRUE
)

out4

# Same display with treatment-only denominators, for comparison
out4_trtN <- SOCbyPT(
  indata     = adae_sex,
  dmdata     = adsl_sex,
  group_vars = c("TRTAN", "AEBODSYS", "AEDECOD"),
  trtan_coln = "12",
  by_var     = "SEX",
  bigN_by    = "NO",
  print_bigN = TRUE
)

out4_trtN

# Example 5: pop_data defines the set and order of treatment arms (here it includes arm 13)
pop_adsl <- tibble::tribble(
  ~USUBJID, ~TRTAN,
  "01",       11,
  "02",       11,
  "03",       12,
  "04",       12,
  "05",       13
)

out5 <- SOCbyPT(
  indata     = adae,
  dmdata     = adsl,
  pop_data   = pop_adsl,
  group_vars = c("TRTAN", "AEBODSYS", "AEDECOD"),
  trtan_coln = "12"
)

out5
