clinCompare (version 1.0.0)

detect_cdisc_domain: Detect CDISC Domain Type

Description

Detects whether a data frame looks like an SDTM domain or ADaM dataset by comparing column names against known CDISC standards. Calculates a confidence score based on the percentage of expected variables present.

Auto-detection is a convenience for exploratory use. For anything important -- validation reports, regulatory submissions, scripted pipelines -- always pass domain and standard explicitly. Datasets with common columns (STUDYID, USUBJID, etc.) can match multiple domains, and a warning is issued when the top two candidates score within 10 percentage points of each other.
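The scoring idea described above can be illustrated with a minimal base-R sketch. The abbreviated expected-variable lists, the `score_domains` helper, and the exact formula are assumptions made for illustration; they are not clinCompare's internal tables or implementation.

```r
# Hypothetical, abbreviated expected-variable lists (not clinCompare's own tables)
expected_vars <- list(
  DM = c("STUDYID", "USUBJID", "SUBJID", "RACE", "ETHNIC", "ARMCD", "ARM"),
  AE = c("STUDYID", "USUBJID", "AESEQ", "AETERM", "AEDECOD", "AESTDTC")
)

# Score each candidate as the fraction of its expected variables present in df
score_domains <- function(df, expected = expected_vars) {
  cols <- toupper(names(df))
  scores <- vapply(expected, function(v) mean(v %in% cols), numeric(1))
  sort(scores, decreasing = TRUE)
}

df <- data.frame(STUDYID = "S1", USUBJID = "U1", SUBJID = "001",
                 RACE = "WHITE", ARM = "A", stringsAsFactors = FALSE)
scores <- score_domains(df)

# Flag ambiguity when the top two candidates are within 10 percentage points
if (length(scores) > 1 && (scores[[1]] - scores[[2]]) < 0.10) {
  warning("Top candidates are close: ",
          paste(names(scores)[1:2], collapse = ", "))
}
```

Shared identifiers such as STUDYID and USUBJID count toward every candidate, which is why sparse datasets can produce near-ties and why passing the domain explicitly is safer.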

Usage

detect_cdisc_domain(df, name_hint = NULL)

Value

A list containing:

standard

Character: "SDTM", "ADaM", or "Unknown"

domain

Character: domain code (e.g., "DM", "AE") or dataset name (e.g., "ADSL"), or NA

confidence

Numeric between 0 and 1 indicating match quality

message

Character: human-readable explanation
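Because the result is a plain list, a script can check it before acting on the detection. The sketch below uses a hypothetical `accept_detection` helper and a made-up mock result with the documented structure; the 0.8 threshold is an arbitrary assumption, not a package default.

```r
# Hypothetical helper: accept a detection result only when it is confident enough
accept_detection <- function(result, min_confidence = 0.8) {
  !identical(result$standard, "Unknown") &&
    !is.na(result$domain) &&
    result$confidence >= min_confidence
}

# Mock result with the documented fields (values are invented for illustration)
mock <- list(standard = "SDTM", domain = "DM", confidence = 0.71,
             message = "illustrative only")

accept_detection(mock)                        # uses the assumed 0.8 threshold
accept_detection(mock, min_confidence = 0.5)  # a more permissive threshold
```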

Arguments

df

A data frame to analyze.

name_hint

Optional character string with the dataset name (e.g., "DM", "ADLB", or a filename like "adlb.xpt"). If the hint matches a known CDISC domain, that candidate receives a strong confidence boost, which makes detection much more accurate when the filename is available.
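When datasets are read from disk, the hint can be derived from the file path. The following is a small sketch using base R and the bundled tools package; `hint_from_path` is a hypothetical helper, and whether detect_cdisc_domain itself strips directory components from a hint is not stated here, so the path is reduced to a bare name first.

```r
# Hypothetical helper: reduce a file path to an upper-case dataset name
hint_from_path <- function(path) {
  toupper(tools::file_path_sans_ext(basename(path)))
}

hint <- hint_from_path("data/adam/adlb.xpt")

# With clinCompare loaded and `adlb` read from that file, the hint would be
# passed along like this (left commented out because it needs the package):
# result <- detect_cdisc_domain(adlb, name_hint = hint)
```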

Examples

# Create a sample SDTM DM domain
dm <- data.frame(
  STUDYID = "STUDY001",
  USUBJID = "SUBJ001",
  SUBJID = "001",
  DMSEQ = 1,
  RACE = "WHITE",
  ETHNIC = "NOT HISPANIC OR LATINO",
  ARMCD = "ARM01",
  ARM = "Treatment A",
  stringsAsFactors = FALSE
)

result <- detect_cdisc_domain(dm)
print(result)