rsdv (version 0.2.0)

quality_report: Generate a quality report comparing real and synthetic data

Description

Aggregates quality metrics into the two-property hierarchy used by SDMetrics: Column Shapes and Column Pair Trends.

Usage

quality_report(real, synthetic, metadata, target_col = NULL)

Value

An rsdv_quality_report object.

Arguments

real

A data frame of real data.

synthetic

A data frame of synthetic data.

metadata

An rsdv_metadata object.

target_col

Optional. Name of a categorical column to use as the target for ML efficacy. When supplied, ML efficacy is reported alongside the overall score but excluded from it.

Details

  • Column Shapes — per-column marginal fidelity: KS similarity for numerical columns and TVD similarity for categorical columns.

  • Column Pair Trends — pairwise dependence: correlation similarity for numerical pairs and contingency similarity for categorical pairs.
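The four similarity measures named above can be sketched in base R. This is a minimal illustration that assumes the SDMetrics-style definitions (one minus the KS statistic, one minus the total variation distance, and one minus half the absolute difference in Pearson correlation). The helper names are hypothetical and are not part of rsdv; the package's internal implementation may differ.

```r
# Hypothetical helpers for illustration only; these are NOT rsdv functions.

# KS similarity (numerical column): 1 minus the largest gap between the
# empirical CDFs of the real and synthetic values.
ks_similarity <- function(real, synthetic) {
  grid <- sort(unique(c(real, synthetic)))
  1 - max(abs(ecdf(real)(grid) - ecdf(synthetic)(grid)))
}

# TVD similarity (categorical column): 1 minus the total variation distance
# between the two category frequency distributions.
tvd_similarity <- function(real, synthetic) {
  lv <- union(as.character(real), as.character(synthetic))
  p <- prop.table(table(factor(as.character(real), levels = lv)))
  q <- prop.table(table(factor(as.character(synthetic), levels = lv)))
  1 - 0.5 * sum(abs(p - q))
}

# Correlation similarity (numerical pair): 1 minus half the absolute
# difference between the real and synthetic Pearson correlations.
correlation_similarity <- function(real_x, real_y, syn_x, syn_y) {
  1 - abs(cor(real_x, real_y) - cor(syn_x, syn_y)) / 2
}

# Contingency similarity (categorical pair): 1 minus the total variation
# distance between the two normalized contingency tables.
contingency_similarity <- function(real_a, real_b, syn_a, syn_b) {
  la <- union(as.character(real_a), as.character(syn_a))
  lb <- union(as.character(real_b), as.character(syn_b))
  p <- prop.table(table(factor(as.character(real_a), levels = la),
                        factor(as.character(real_b), levels = lb)))
  q <- prop.table(table(factor(as.character(syn_a), levels = la),
                        factor(as.character(syn_b), levels = lb)))
  1 - 0.5 * sum(abs(p - q))
}

# Small hand-made inputs: real is half "a", synthetic is three quarters "a".
tvd_example <- tvd_similarity(c("a", "a", "b", "b"), c("a", "a", "a", "b"))
tvd_example  # 0.75
```

Each helper returns a value in [0, 1], where 1 means the real and synthetic inputs agree exactly on that measure.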

The overall score is the unweighted mean of the two property scores; it is not weighted by raw column counts, even for a table with many categorical columns and few numerical ones. ML efficacy, when requested, is reported separately and does not enter the overall score (matching SDMetrics).
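The aggregation can be illustrated with made-up numbers. The sketch below assumes each property score is the mean of its component scores (as in SDMetrics); the score values and names are invented for illustration and do not come from rsdv output.

```r
# Made-up component scores, for illustration only.
column_shapes      <- c(col_a = 0.9, col_b = 0.8, col_c = 0.7)  # three columns
column_pair_trends <- c(pair_1 = 0.6)                           # one pair

# Assumption: each property score is the mean of its components.
property_scores <- c(
  column_shapes      = mean(column_shapes),
  column_pair_trends = mean(column_pair_trends)
)

# Overall score: unweighted mean of the two property scores.
overall <- mean(property_scores)

# For contrast, pooling every component would weight by raw counts.
pooled <- mean(c(column_shapes, column_pair_trends))

overall  # 0.7
pooled   # 0.75
```

Here the three Column Shapes components do not outvote the single Column Pair Trends component: each property contributes half of the overall score.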

Examples

# \donttest{
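# Build metadata and declare the type of two columns
meta  <- metadata(adult_income) |>
  set_column_type("age", "numerical") |>
  set_column_type("occupation", "categorical")
# Fit a Gaussian copula synthesizer and draw 500 synthetic rows
syn   <- gaussian_copula_synthesizer(meta) |> fit(adult_income)
synth <- sample(syn, n = 500)
# Compare the real and synthetic data
qr    <- quality_report(adult_income, synth, meta)
print(qr)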
# }
