DQA (version 0.1.0)

conformance_check: Perform Conformance Check on Data Based on Defined Rules

Description

This function evaluates a source dataframe (`S_data`) against a set of rules defined in a metadata dataframe (`M_data`). Rule functions come from the default rule definitions shipped with the package unless a custom rule file is supplied through `rule_file`.

Usage

conformance_check(
  S_data,
  M_data,
  rule_file = NULL,
  na_as_error = FALSE,
  var_select = "all"
)

Value

A dataframe containing the results of the conformance check for each rule.

Arguments

S_data

A dataframe containing the source data to be checked.

M_data

A metadata dataframe that specifies the rules. It must contain the columns `VARIABLE`, `Conformance_Rule`, and `Value`.

rule_file

The path to a custom R file where rule functions are defined. If `NULL` (default), the standard rule definitions file included with the `DQA` package is used. Instructions for using this file are available under the name `conformance_rules`.

na_as_error

A logical value. If `TRUE`, `NA` values in the source data are treated as errors (non-conformant). If `FALSE` (default), they are ignored.
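
The effect of this flag can be pictured with a small base R sketch. The helper `flag_nonconformant()` below is hypothetical and is not part of `DQA`; it only illustrates one plausible reading of the description above, in which `NA` values are either counted as violations or left out of the count.

```r
# Hypothetical helper, NOT part of DQA: flag values outside an allowed set.
flag_nonconformant <- function(x, allowed, na_as_error = FALSE) {
  bad <- !(x %in% allowed)      # NA %in% allowed is FALSE, so NA starts as TRUE
  bad[is.na(x)] <- na_as_error  # NA counts as a violation only when requested
  bad
}

gender <- c(1, 2, 1, 3, 2, 1, NA, 2, 1, 2)

n_default <- sum(flag_nonconformant(gender, c(1, 2)))
n_strict  <- sum(flag_nonconformant(gender, c(1, 2), na_as_error = TRUE))

n_default  # 1: only the value 3 is flagged
n_strict   # 2: the value 3 and the NA are flagged
```

How the package itself reports ignored `NA` values is not stated here, so treat the sketch as an illustration of the idea rather than of the actual output.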

var_select

Character or integer vector of variables to check. Accepts variable names, column numbers, or a mix of both. Default is `"all"` (check all variables in `M_data`).

Details

The metadata (`M_data`) for conformance_check must include:

  • **VARIABLE:** The name of the column in `S_data` to which the rule applies.

  • **Conformance_Rule:** The name of the rule function to execute for the VARIABLE (must be defined in the rule file).

  • **Value:** Rule parameters, such as the allowed length of values, allowed category values, or the column names required for computational checks.
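
Before running a check, it can help to confirm that the metadata has the shape described above. The following is a minimal base R sketch; `validate_metadata()` is a hypothetical helper written for illustration and is not a `DQA` function. It assumes only what this page states: the three required column names, and that `VARIABLE` refers to columns of `S_data`.

```r
# Hypothetical pre-flight check, NOT part of DQA.
validate_metadata <- function(S_data, M_data) {
  required <- c("VARIABLE", "Conformance_Rule", "Value")
  missing_cols <- setdiff(required, names(M_data))
  if (length(missing_cols) > 0) {
    stop("M_data is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Every VARIABLE should name an existing column of S_data
  unknown_vars <- setdiff(unique(M_data$VARIABLE), names(S_data))
  if (length(unknown_vars) > 0) {
    stop("VARIABLE not found in S_data: ", paste(unknown_vars, collapse = ", "))
  }
  invisible(TRUE)
}

# Small illustrative inputs
src <- data.frame(gender = c(1, 2, 3), stringsAsFactors = FALSE)
meta_ok <- data.frame(
  VARIABLE = "gender",
  Conformance_Rule = "category_check",
  Value = "1 | 2",
  stringsAsFactors = FALSE
)
meta_bad <- meta_ok[, c("VARIABLE", "Value")]  # Conformance_Rule dropped

ok_result  <- validate_metadata(src, meta_ok)
bad_result <- try(validate_metadata(src, meta_bad), silent = TRUE)
```

Here `ok_result` is `TRUE`, while `bad_result` is a `try-error` naming the missing `Conformance_Rule` column.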

Examples

# 1. Create sample source data (S_data)
S_data <- data.frame(
  id = 1:10,
  national_id = c("1234567890", "0987654321", "123", NA, "1112223334",
                  "1234567890", "5556667778", "9998887770", "12345", "4445556667"),
  gender = c(1, 2, 1, 3, 2, 1, NA, 2, 1, 2), # 1=Male, 2=Female, 3=Error
  age = c(25, 40, 150, 33, -5, 65, 45, 29, 70, 55),
  part_a = c(10, 15, 20, 25, 30, 35, 40, 45, 50, 55),
  part_b = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50),
  total_parts = c(15, 25, 35, 45, 55, 65, 75, 85, 94, 105), # one error in row 9
  stringsAsFactors = FALSE
)

# 2. Create sample metadata (M_data)
M_data <- data.frame(
  VARIABLE = c(
    "national_id",
    "national_id",
    "gender",
    "total_parts"
  ),
  Conformance_Rule = c(
    "length_check",
    "unique_check",
    "category_check",
    "arithmetic_check"
  ),
  Value = c(
    "10",                  # national_id length must be 10
    "",                    # unique
    "1 | 2",               # Allowed values for gender
    "part_a + part_b"      # Computational rule for total_parts
  ),
  stringsAsFactors = FALSE
)

# 3. Run the conformance check using the package's default rules
# The 'DQA' package must be loaded before running
library(DQA)
conformance_results <- conformance_check(S_data = S_data, M_data = M_data)
print(conformance_results)
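
To sanity-check the results, the four rules in the sample metadata can be reproduced by hand in base R. This sketch does not call `DQA` and makes no claim about how the package implements `length_check`, `unique_check`, `category_check`, or `arithmetic_check`; it only computes which rows of the sample data break each rule as described in the comments of the example. In particular, whether the package flags every copy of a duplicated value or only the later ones is an assumption here.

```r
# Same values as the sample S_data above, rebuilt so the block runs on its own
national_id <- c("1234567890", "0987654321", "123", NA, "1112223334",
                 "1234567890", "5556667778", "9998887770", "12345", "4445556667")
gender      <- c(1, 2, 1, 3, 2, 1, NA, 2, 1, 2)
part_a      <- c(10, 15, 20, 25, 30, 35, 40, 45, 50, 55)
part_b      <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50)
total_parts <- c(15, 25, 35, 45, 55, 65, 75, 85, 94, 105)

# Length rule: nchar() of a missing string is NA, and which() drops NA
bad_length <- which(nchar(national_id) != 10)

# Uniqueness rule: flag every row that shares its value with another row
bad_unique <- which(duplicated(national_id) |
                    duplicated(national_id, fromLast = TRUE))

# Category rule: allowed values are 1 and 2; NA is skipped here
bad_category <- which(!is.na(gender) & !(gender %in% c(1, 2)))

# Arithmetic rule: total_parts should equal part_a + part_b
bad_arithmetic <- which(part_a + part_b != total_parts)

bad_length      # 3 9
bad_unique      # 1 6
bad_category    # 4
bad_arithmetic  # 9
```

Rows 4 and 7 contain `NA` in `national_id` and `gender` respectively; they are skipped here, which corresponds to the default `na_as_error = FALSE`.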
