StatsTFLValR (version 1.0.0)

generate_compare_report: Compare DEV vs VAL datasets (PROC COMPARE-style) with robust file detection

Description

generate_compare_report() compares a developer (DEV) dataset and a validation (VAL) dataset for a given domain and produces outputs similar to SAS PROC COMPARE.

This function is intended for ADaM/SDTM/TFL validation workflows and supports:

  • Directory-driven inputs: DEV and VAL locations are provided via dev_dir and val_dir.

  • Case-insensitive domain matching: domain = "ADAE" will match files like adae.*.

  • VAL prefix flexibility: resolves prefix_val variants such as v_, v-, and v (no separator).

  • Automatic extension detection for DEV and VAL files: .sas7bdat, .xpt, .csv, .rds.

  • Optional filtering using filter_expr prior to comparison.

  • Optional PROC COMPARE-style CSV output with BASE, COMPARE, and DIF triplets.

  • Optional LST-like report using arsenal::comparedf() for summarized differences.

Usage

generate_compare_report(
  domain,
  dev_dir,
  val_dir,
  by_vars = c("STUDYID", "USUBJID"),
  vars_to_check = NULL,
  report_dir = NULL,
  prefix_val = "v_",
  max_print = 50,
  write_csv = FALSE,
  run_comparedf = TRUE,
  filter_expr = NULL,
  study_id = NULL,
  author = NULL
)

Value

Invisibly returns a list with:

  • only_in_dev: rows present only in DEV (set-difference result)

  • only_in_val: rows present only in VAL (set-difference result)

  • comparedf: arsenal::comparedf object (or NULL if run_comparedf = FALSE)
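The "set-difference result" in only_in_dev and only_in_val can be pictured with the base R sketch below. The helper rows_only_in() is hypothetical and is not part of StatsTFLValR; the package may well rely on fsetdiff (listed under See Also) instead. The sketch assumes rows are compared on all columns common to both datasets.

```r
# Hypothetical helper (not from the package): rows of `x` with no exact
# match in `y`, compared on the columns the two data frames share.
rows_only_in <- function(x, y) {
  common <- intersect(names(x), names(y))
  # Build one string key per row; "\x1f" is an unlikely separator character
  key_x <- do.call(paste, c(lapply(x[common], as.character), sep = "\x1f"))
  key_y <- do.call(paste, c(lapply(y[common], as.character), sep = "\x1f"))
  x[!(key_x %in% key_y), , drop = FALSE]
}

dev <- data.frame(
  USUBJID = c("01", "02"),
  AETERM  = c("HEADACHE", "NAUSEA"),
  stringsAsFactors = FALSE
)
val <- dev
val$AETERM[2] <- "VOMITING"

only_in_dev <- rows_only_in(dev, val)  # the row whose AETERM is NAUSEA
only_in_val <- rows_only_in(val, dev)  # the row whose AETERM is VOMITING
```

A row that differs in any compared column therefore appears once in each result, which is why a single changed value shows up in both only_in_dev and only_in_val.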

Arguments

domain

Character scalar domain name (e.g., "adsl", "adae", "rt-ae-sum"). Matching is case-insensitive.

dev_dir

DEV dataset directory path.

val_dir

VAL dataset directory path.

by_vars

Character vector of key variables used to match records (e.g., c("STUDYID","USUBJID") or c("STUDYID","USUBJID","AESEQ")).

vars_to_check

Optional character vector of variables to compare. If NULL, all variables common to DEV and VAL are compared; how the key variables in by_vars are treated in that case is left to the implementation.

report_dir

Output directory for report files. Created if missing.

prefix_val

Character prefix for validation datasets (default "v_"). The resolver also supports variants like v- and v (no separator).

max_print

Maximum number of lines printed in the .lst report for summaries/diffs.

write_csv

Logical; if TRUE, writes PROC COMPARE-style CSV to report_dir as compare_<domain>.csv.

run_comparedf

Logical; if TRUE, uses arsenal::comparedf() to generate a .lst report.

filter_expr

Optional filter expression string evaluated within each dataset (e.g., "SAFFL == 'Y' & TRTEMFL == 'Y'").

study_id

Optional study identifier included in the .lst header.

author

Optional author name included in the .lst header.
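Because filter_expr is passed as a string, it has to be parsed and evaluated against each dataset's columns. The sketch below shows one common base R way to do that. The helper apply_filter() is hypothetical, and the choice to drop rows where the expression evaluates to NA is an assumption, not documented package behavior.

```r
# Hypothetical helper (not from the package): keep the rows of `df` for
# which the expression string evaluates to TRUE.
apply_filter <- function(df, filter_expr = NULL) {
  if (is.null(filter_expr)) return(df)
  # Evaluate the parsed expression with the data frame's columns in scope
  keep <- eval(parse(text = filter_expr), envir = df, enclos = parent.frame())
  if (!is.logical(keep)) stop("filter_expr must evaluate to a logical vector")
  # Assumption: NA results are treated as "do not keep", as subset() does
  df[!is.na(keep) & keep, , drop = FALSE]
}

adae <- data.frame(
  USUBJID = c("01", "02", "03"),
  SAFFL   = c("Y", "Y", "N"),
  TRTEMFL = c("Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# Only subject "01" satisfies both conditions
filtered <- apply_filter(adae, "SAFFL == 'Y' & TRTEMFL == 'Y'")
```

Applying the same string to DEV and VAL before the comparison keeps the two subsets aligned, provided both datasets contain every variable the expression references.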

Details

The function looks for exactly one matching domain file per directory:

  • DEV: <domain>.<ext>

  • VAL: <prefix><domain>.<ext>, where <prefix> is prefix_val or one of its separator variants (underscore, hyphen, or no separator; e.g., v_, v-, v).

Supported extensions (priority order) are: sas7bdat, xpt, csv, rds.

If multiple matches exist for the same domain in a directory (e.g., adae.csv and adae.xpt), the function stops with an ambiguous match error to prevent accidental comparisons.
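The lookup rules above can be sketched in base R as follows. Both helpers, prefix_variants() and resolve_domain_file(), are hypothetical names introduced for illustration. The sketch assumes a match is decided on the lower-cased file stem and extension and that any second match is an error.

```r
# Hypothetical helper: expand "v_" into the variants "v_", "v-", and "v".
prefix_variants <- function(prefix_val = "v_") {
  stem <- sub("[_-]$", "", prefix_val)
  unique(c(prefix_val, paste0(stem, c("_", "-", ""))))
}

# Hypothetical helper: return the single file for `domain` in `dir`,
# matching case-insensitively and stopping on zero or multiple hits.
resolve_domain_file <- function(dir, domain, prefixes = "",
                                exts = c("sas7bdat", "xpt", "csv", "rds")) {
  files     <- list.files(dir)
  stems     <- tolower(tools::file_path_sans_ext(files))
  file_exts <- tolower(tools::file_ext(files))
  wanted    <- tolower(paste0(prefixes, domain))
  hits <- files[stems %in% wanted & file_exts %in% exts]
  if (length(hits) == 0) {
    stop("No file found for domain '", domain, "' in ", dir)
  }
  if (length(hits) > 1) {
    stop("Ambiguous match for domain '", domain, "': ",
         paste(hits, collapse = ", "))
  }
  file.path(dir, hits)
}

# Demo in a fresh temporary directory
demo_dir <- tempfile("resolver_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, "v-adae.csv"))

val_file <- resolve_domain_file(demo_dir, "ADAE",
                                prefixes = prefix_variants("v_"))
```

Failing loudly on a second match, rather than silently picking one file, mirrors the stated goal of preventing accidental comparisons.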

PROC COMPARE-style CSV behavior

When write_csv = TRUE, the output includes:

  • _TYPE_ with values BASE, COMPARE, DIF

  • _OBS_ sequence within each BY key

  • For numeric variables, DIF = DEV - VAL

  • For Date variables, DIF is integer day difference (as.integer(DEV - VAL))

  • For POSIXct variables, DIF is seconds difference (as.numeric(DEV - VAL))

  • For other types, DIF is a character mask (X indicates difference)
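The type-dependent DIF rules listed above can be sketched as a small dispatch function. The helper dif_column() is hypothetical, and two details are assumptions: the character mask is shown as a single "X" per differing value (blank otherwise), and the POSIXct branch pins units = "secs" explicitly, because plain subtraction of POSIXct values lets R choose the difftime units automatically.

```r
# Hypothetical helper (not from the package): DIF values for one variable.
dif_column <- function(dev, val) {
  if (inherits(dev, "Date")) {
    as.integer(dev - val)                           # whole days
  } else if (inherits(dev, "POSIXct")) {
    as.numeric(difftime(dev, val, units = "secs"))  # seconds
  } else if (is.numeric(dev)) {
    dev - val
  } else {
    a <- as.character(dev)
    b <- as.character(val)
    # Two missing values count as equal; one missing value is a difference
    same <- (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
    ifelse(same, "", "X")
  }
}

dev <- data.frame(
  USUBJID = c("01", "01", "02"),
  AVAL    = c(1.5, 2, 3),
  ADT     = as.Date(c("2024-01-10", "2024-01-12", "2024-02-01")),
  AETERM  = c("HEADACHE", "NAUSEA", "RASH"),
  stringsAsFactors = FALSE
)
val <- dev
val$AVAL[2]   <- 1.75
val$ADT[3]    <- as.Date("2024-01-30")
val$AETERM[1] <- "MIGRAINE"

# _OBS_: running sequence number within each BY key (here USUBJID)
obs <- ave(seq_len(nrow(dev)), dev$USUBJID, FUN = seq_along)

dif <- data.frame(
  `_TYPE_` = "DIF",
  `_OBS_`  = obs,
  USUBJID  = dev$USUBJID,
  AVAL     = dif_column(dev$AVAL, val$AVAL),
  ADT      = dif_column(dev$ADT, val$ADT),
  AETERM   = dif_column(dev$AETERM, val$AETERM),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
```

In the full CSV, each DIF row would sit alongside a BASE row (the DEV values) and a COMPARE row (the VAL values) for the same BY key and _OBS_.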

See Also

arsenal::comparedf, data.table::fsetdiff, data.table::fintersect

Examples

# Set up temporary DEV, VAL, and report directories
td <- tempdir()
dev_dir <- file.path(td, "dev")
val_dir <- file.path(td, "val")
rpt_dir <- file.path(td, "rpt")
dir.create(dev_dir, showWarnings = FALSE)
dir.create(val_dir, showWarnings = FALSE)
dir.create(rpt_dir, showWarnings = FALSE)


# Toy DEV dataset; VAL below is a copy with one AETERM value changed
dev <- data.frame(
  STUDYID = "STDY1",
  USUBJID = c("01", "02"),
  AESEQ   = c(1, 1),
  AETERM  = c("HEADACHE", "NAUSEA"),
  stringsAsFactors = FALSE
)
val <- dev
val$AETERM[2] <- "VOMITING"

# DEV is written as adae.csv, VAL as v-adae.csv (hyphen variant of the prefix)
utils::write.csv(dev, file.path(dev_dir, "adae.csv"), row.names = FALSE)
utils::write.csv(val, file.path(val_dir, "v-adae.csv"), row.names = FALSE)


# Write the PROC COMPARE-style CSV; skip the comparedf .lst report
generate_compare_report(
  domain        = "adae",
  dev_dir       = dev_dir,
  val_dir       = val_dir,
  by_vars       = c("STUDYID","USUBJID","AESEQ"),
  report_dir    = rpt_dir,
  write_csv     = TRUE,
  run_comparedf = FALSE
)


# Upper-case domain: matching is case-insensitive, so adae.csv is still found
generate_compare_report(
  domain        = "ADAE",
  dev_dir       = dev_dir,
  val_dir       = val_dir,
  by_vars       = c("STUDYID","USUBJID","AESEQ"),
  report_dir    = rpt_dir,
  write_csv     = FALSE,
  run_comparedf = FALSE
)


# Restrict both datasets to one subject with filter_expr before comparing
generate_compare_report(
  domain        = "adae",
  dev_dir       = dev_dir,
  val_dir       = val_dir,
  by_vars       = c("STUDYID","USUBJID","AESEQ"),
  report_dir    = rpt_dir,
  filter_expr   = "USUBJID == '02'",
  write_csv     = TRUE,
  run_comparedf = FALSE
)
