rsdv (version 0.2.0)

correlation_similarity: Correlation similarity between real and synthetic numerical column pairs

Description

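For each pair of numerical columns, computes 1 - |corr_real - corr_syn| / 2 (the SDMetrics CorrelationSimilarity score), where corr_real and corr_syn are the Pearson correlations of that pair in the real and synthetic data. Each score lies in [0, 1], with 1 meaning the two correlations are identical. Returns one row per pair plus the mean.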
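The per-pair formula can be sketched in base R as follows. This is only an illustration of the arithmetic, not rsdv's implementation; the helper name pair_score and the toy data frames are hypothetical.

```r
# Hypothetical helper (not part of rsdv): score for one pair of numeric columns.
pair_score <- function(real, synthetic, col_1, col_2) {
  # Pearson correlation of the pair in each data set
  corr_real <- cor(real[[col_1]], real[[col_2]], method = "pearson")
  corr_syn  <- cor(synthetic[[col_1]], synthetic[[col_2]], method = "pearson")
  # |corr_real - corr_syn| is at most 2, so dividing by 2 keeps the score in [0, 1]
  1 - abs(corr_real - corr_syn) / 2
}

# Toy data: y rises with x in `real` and falls with x in `synthetic`
real      <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))
synthetic <- data.frame(x = c(1, 2, 3, 4), y = c(8, 6, 4, 2))

worst <- pair_score(real, synthetic, "x", "y")  # correlations of opposite sign
best  <- pair_score(real, real, "x", "y")       # identical correlations
```

Here the correlations are +1 and -1, the largest possible gap, so worst sits at the bottom of the range, while comparing a data set with itself gives the top of the range.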

Usage

correlation_similarity(real, synthetic, meta)

Value

A list with pairs (a tibble with columns column_1, column_2, and score) and score (the mean of the per-pair scores). score is NA_real_ when there are fewer than two numerical columns: with no pair there is no dependence to measure, so returning NA (rather than 1) avoids overstating fidelity in the aggregated quality report.
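A rough sketch of how such a return value could be assembled is shown below. It rests on assumptions: numeric columns are detected with is.numeric rather than from the rsdv_metadata object, pairs is a plain data frame rather than a tibble, and the function name sketch_correlation_similarity is hypothetical.

```r
# Hypothetical sketch, not rsdv's implementation.
sketch_correlation_similarity <- function(real, synthetic) {
  # Assumption: treat is.numeric columns as the numerical columns
  num_cols <- names(real)[vapply(real, is.numeric, logical(1))]

  # Fewer than two numerical columns: no pair to score, so the mean is NA
  if (length(num_cols) < 2) {
    empty <- data.frame(column_1 = character(0),
                        column_2 = character(0),
                        score    = numeric(0))
    return(list(pairs = empty, score = NA_real_))
  }

  combos <- combn(num_cols, 2)  # 2-row matrix, one column per pair
  scores <- apply(combos, 2, function(p) {
    corr_real <- cor(real[[p[1]]], real[[p[2]]])
    corr_syn  <- cor(synthetic[[p[1]]], synthetic[[p[2]]])
    1 - abs(corr_real - corr_syn) / 2
  })

  pairs <- data.frame(column_1 = combos[1, ],
                      column_2 = combos[2, ],
                      score    = scores)
  list(pairs = pairs, score = mean(scores))
}

# One numeric column: score is NA rather than 1
single <- sketch_correlation_similarity(data.frame(x = c(1, 2, 3)),
                                        data.frame(x = c(3, 2, 1)))

# Three numeric columns: three pairs
toy   <- data.frame(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3), c = c(4, 3, 1, 2))
multi <- sketch_correlation_similarity(toy, toy)
```

Returning NA in the single-column case means a downstream average has to handle it explicitly (for example with na.rm = TRUE) instead of silently counting a perfect score.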

Arguments

real

A data frame of real data.

synthetic

A data frame of synthetic data.

meta

An rsdv_metadata object.

Examples

# \donttest{
syn       <- gaussian_copula_synthesizer(metadata(adult_income)) |> fit(adult_income)
synth_data <- sample(syn, n = 500)
correlation_similarity(adult_income, synth_data, metadata(adult_income))
# }