concordance_check: Perform Concordance Check on Data Based on Defined Rules

Description

This function evaluates a source data frame (`S_data`) against a set of rules defined in a metadata data frame (`M_data`). It checks multiple concordance rules (logical and clinical conditions) on the columns of `S_data`, as specified in the metadata. It supports flexible rule definitions, date handling, and customizable output.

Usage

concordance_check(
  S_data,
  M_data,
  Result = FALSE,
  show_column = NULL,
  date_parser_fun = smart_to_gregorian_vec,
  var_select = "all",
  verbose = FALSE
)

Value

If Result = FALSE: a data.frame summary with columns:

VARIABLE: Name of the variable/rule.
Condition_Met: Number of rows where the rule is TRUE.
Condition_Not_Met: Number of rows where the rule is FALSE.
NA_Count: Number of rows with missing/indeterminate result.
Total_Applicable: Number of rows with a non-NA result (Condition_Met plus Condition_Not_Met).
Total_Rows: Number of total rows.
Percent_Met: Percentage of applicable rows meeting the condition.
Percent_Not_Met: Percentage of applicable rows not meeting the condition.
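Concordance_Error_Type: Error type from metadata (if available).

If Result = TRUE: a data.frame with one logical column (TRUE, FALSE, or NA) per rule/variable, plus any columns from S_data listed in show_column.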

Arguments

S_data: data.frame. The source data in which rules will be evaluated. Each column may be referenced by the rules.
M_data: data.frame. Metadata describing variables and their concordance rules. Must include at least columns VARIABLE and Concordance_Rule. Optionally includes TYPE and Concordance_Error_Type.
Result: logical (default: FALSE). If TRUE, returns row-by-row evaluation results for each rule. If FALSE, returns a summary table for each rule.
show_column: character vector (default: NULL). Names of columns from S_data to include in the result when Result = TRUE. Ignored otherwise.
date_parser_fun: function (default: smart_to_gregorian_vec). Function used to convert date values and date literals to Date class, including conversion of Persian dates to Gregorian. Must accept character vectors and return Date objects.
var_select: character, numeric, or "all" (default: "all"). Subset of variables (rules) to check. Can be a character vector of variable names, numeric vector of row indices in M_data, or "all" to run all rules.
verbose: logical (default: FALSE). If TRUE, prints diagnostic messages during rule processing and evaluation.
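
As an illustration of the contract stated for date_parser_fun (character vector in, Date vector out), here is a minimal sketch of a custom parser. The name `iso_date_parser` is hypothetical and not part of the package, and the sketch only understands ISO "YYYY-MM-DD" strings; unlike the default parser, it makes no attempt to convert Persian dates.

```r
# Hypothetical stand-in for date_parser_fun (not part of the package).
# Assumption: inputs are ISO "YYYY-MM-DD" strings; anything else becomes NA.
iso_date_parser <- function(x) {
  as.Date(as.character(x), format = "%Y-%m-%d")
}

parsed <- iso_date_parser(c("2021-01-10", "not a date", NA))
class(parsed)   # "Date"
is.na(parsed)   # FALSE TRUE TRUE
```

A function of this shape could then be supplied as `date_parser_fun = iso_date_parser` in place of the default.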

Details

The metadata data.frame (M_data) is described by the following columns. VARIABLE and Concordance_Rule are required; TYPE and Concordance_Error_Type are optional.

VARIABLE: The name of the variable in S_data to which the rule applies.
Concordance_Rule: The logical or clinical rule (as a string) to be evaluated for each row.
TYPE: The expected type of the variable (e.g., "numeric", "date", "character").
Concordance_Error_Type: The error type for each rule, which is reported in the summary output. Depending on the importance and severity of the rule, it can be set to "Warning" or "Error".

For each variable described in M_data, the function:

Replaces any instance of the string "val" in the rule with the actual column name of the variable.
Parses and detects any date literals in the rule and substitutes them with placeholders; these placeholders are converted to Date class using the provided date_parser_fun.
Automatically converts any referenced data columns to the appropriate type (numeric, date, or character) based on the TYPE column in the metadata.
Detects which columns from S_data are referenced in each rule and ensures they are available and correctly typed before evaluation.
Evaluates the rule for each row of S_data, using vectorized evaluation for performance where possible, and falling back to row-wise evaluation if necessary (e.g., for rules that are not vectorizable, such as those using ifelse with NA logic).
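
The steps above can be sketched in base R as follows. This is not the package's implementation, only a minimal illustration of the idea: the helper `evaluate_rule`, the toy data, and the choice to match "val" as a whole word are all assumptions made for this example.

```r
# Toy data and rule (hypothetical, for illustration only)
df <- data.frame(
  Systolic_BP1 = c(110, NA, 145, 125),
  Systolic_BP2 = c(125, 150, NA, 110)
)
variable <- "Systolic_BP1"
rule <- "abs(val - Systolic_BP2) >= 15"

# Replace the placeholder "val" with the variable's column name.
# Assumption: "val" is matched as a whole word only.
rule_expanded <- gsub("\\bval\\b", variable, rule, perl = TRUE)

# Detect which columns of the data the rule refers to.
referenced <- intersect(all.vars(parse(text = rule_expanded)), names(df))

# Try vectorized evaluation first; fall back to row-wise evaluation
# when the rule errors or does not return one logical value per row.
evaluate_rule <- function(rule_text, data) {
  expr <- parse(text = rule_text)[[1]]
  n <- nrow(data)
  vectorized <- tryCatch(
    eval(expr, envir = data, enclos = baseenv()),
    error = function(e) NULL
  )
  if (is.logical(vectorized) && length(vectorized) == n) {
    return(vectorized)
  }
  vapply(seq_len(n), function(i) {
    out <- tryCatch(
      eval(expr, envir = data[i, , drop = FALSE], enclos = baseenv()),
      error = function(e) NA
    )
    if (is.logical(out) && length(out) == 1) out else NA
  }, logical(1))
}

vec_result <- evaluate_rule(rule_expanded, df)
# TRUE NA NA TRUE

# A rule using if/else is not vectorizable, so it takes the row-wise path.
row_result <- evaluate_rule("if (Systolic_BP1 > 120) TRUE else FALSE", df)
# FALSE NA TRUE TRUE
```

In this sketch a row whose rule cannot be evaluated (for example, `if` applied to NA) is recorded as NA rather than stopping the whole check.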

The function supports flexible rule definitions, including conditions involving multiple columns, clinical rules, date comparisons, and custom logic written as R expressions.
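
The date-literal handling described in the steps above might look roughly like the sketch below. It assumes that a date literal is written in the rule as a quoted 'YYYY-MM-DD' string and that placeholders are named `DATE_LITERAL_1`, `DATE_LITERAL_2`, and so on; both conventions, as well as the stand-in parser `parse_date`, are hypothetical and chosen only for this example.

```r
# Toy data and rule (hypothetical)
df <- data.frame(
  VisitDate = c("2021-01-10", "2021-03-05", NA),
  stringsAsFactors = FALSE
)
rule <- "VisitDate <= '2021-02-01'"

# Stand-in for date_parser_fun: character vector in, Date vector out
parse_date <- function(x) as.Date(x, format = "%Y-%m-%d")

# Find quoted date literals (assumed format: 'YYYY-MM-DD')
pattern <- "'[0-9]{4}-[0-9]{2}-[0-9]{2}'"
literals <- regmatches(rule, gregexpr(pattern, rule))[[1]]

# Swap each literal for a placeholder bound to a parsed Date value
env <- new.env(parent = baseenv())
rule_ph <- rule
for (i in seq_along(literals)) {
  placeholder <- paste0("DATE_LITERAL_", i)
  assign(placeholder, parse_date(gsub("'", "", literals[i])), envir = env)
  rule_ph <- sub(literals[i], placeholder, rule_ph, fixed = TRUE)
}

# Convert the referenced column to Date, then evaluate the rewritten rule
env$VisitDate <- parse_date(df$VisitDate)
date_result <- eval(parse(text = rule_ph), envir = env)
# rule_ph:     "VisitDate <= DATE_LITERAL_1"
# date_result: TRUE FALSE NA
```

Binding literals to placeholders keeps the comparison between two Date values, so it does not depend on how the dates were originally spelled as text.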

If Result = FALSE, the function returns a summary table for each rule, including counts and percentages of rows that meet or do not meet the condition, as well as the error type from the metadata.
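
To make the summary columns concrete, the sketch below derives them from the per-row result of a single rule. The input vector is made up, and the handling of the case with no applicable rows (percentages set to NA) is an assumption of this example, not documented behavior.

```r
# Hypothetical per-row result of one rule:
# TRUE = condition met, FALSE = not met, NA = indeterminate
rule_result <- c(TRUE, FALSE, NA, TRUE, TRUE, NA)

condition_met     <- sum(rule_result, na.rm = TRUE)    # 3
condition_not_met <- sum(!rule_result, na.rm = TRUE)   # 1
na_count          <- sum(is.na(rule_result))           # 2
total_applicable  <- condition_met + condition_not_met # 4
total_rows        <- length(rule_result)               # 6

# Percentages are taken over applicable (non-NA) rows only.
# Assumption: report NA when no row is applicable.
pct <- function(k) {
  if (total_applicable > 0) 100 * k / total_applicable else NA_real_
}

summary_row <- data.frame(
  VARIABLE          = "Systolic_BP1",
  Condition_Met     = condition_met,
  Condition_Not_Met = condition_not_met,
  NA_Count          = na_count,
  Total_Applicable  = total_applicable,
  Total_Rows        = total_rows,
  Percent_Met       = pct(condition_met),      # 75
  Percent_Not_Met   = pct(condition_not_met),  # 25
  stringsAsFactors  = FALSE
)
print(summary_row)
```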

If Result = TRUE, the function returns a data.frame with one column per rule/variable, each containing logical values (TRUE, FALSE, or NA) for every row, plus any extra columns from S_data listed in show_column.

Examples

# build the long rule in multiple short source lines to avoid >100 char Rd lines
rule_bp <- paste0(
  "(ifelse(is.na(val) | is.na(Systolic_BP2), NA, ",
  "(abs(val - Systolic_BP2) >= 15) & (val > 15 & Prescription_drug == '')))"
)

# Source data
S_data <- data.frame(
  National_code = c("123", "1456", "789", "545", "4454", "554"),
  LastName = c("Aliyar", "Johnson", "Williams", "Brown", "Jones", "Garcia"),
  VisitDate = c("2025-09-23", "2021-01-10", "2021-01-03",
                "1404-06-28", "1404-07-28", NA),
  Test_date = c("1404-07-01", "2021-01-09", "2021-01-14",
                "1404-06-29", "2025-09-19", NA),
  Certificate_validity = c("2025-09-21", "2025-01-12", "2025-02-11",
                           "1403-06-28", "2025-09-19", NA),
  Systolic_BP1 = c(110, NA, 145, 125, 114, NA),
  Systolic_BP2 = c(125, 150, NA, 110, 100, NA),
  Prescription_drug = c("Atorvastatin", "Metformin", "Amlodipine",
                        "Omeprazole", "Aspirin", "Metoprolol"),
  Blood_type = c("A-", "B+", "AB", "A+", "O-", "O+"),
  stringsAsFactors = FALSE
)

# Metadata (uses the rule built above)
M_data <- data.frame(
  VARIABLE = c("National_code", "Certificate_validity", "VisitDate",
               "Test_date","LastName","Systolic_BP1","Systolic_BP2",
               "Prescription_drug","Blood_type"),
  Concordance_Rule = c(
    "", "", "VisitDate<=Test_date", "Test_date-VisitDate < 7", "",
    rule_bp, "", "", ""
  ),
  TYPE = c("numeric", "date", "date", "date", "character",
           "numeric", "numeric", "character", "character"),
  Concordance_Error_Type = c("type1", NA, "type2", "type3", NA, NA, NA, NA, "type4"),
  stringsAsFactors = FALSE
)

# Test call: row-by-row results, keeping National_code for reference
result <- concordance_check(
  S_data = S_data,
  M_data = M_data,
  Result = TRUE,
  show_column = c("National_code")
)
print(result)
