correlation_similarity: Correlation similarity between real and synthetic numerical column pairs

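For each pair of numerical columns, computes <code>1 - |corr_real - corr_syn| / 2</code>
(the SDMetrics <code>CorrelationSimilarity</code> score), where <code>corr_real</code> and <code>corr_syn</code> are the Pearson
correlations of that pair in the real and synthetic data. Because each correlation lies in [-1, 1], the score lies in [0, 1],
and 1 means the pair has exactly the same correlation in both data sets. Returns one row per column pair, plus the mean score across all pairs.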
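A minimal sketch of the per-pair computation in R follows. The function name <code>corr_similarity_sketch</code> is hypothetical, and this is not the rsdv implementation. It identifies numerical columns with <code>is.numeric()</code> instead of reading an <code>rsdv_metadata</code> object. It also drops missing values pairwise, which is an assumption.

```r
# Hypothetical helper, not the rsdv implementation: scores every pair of
# numeric columns shared by `real` and `synthetic` with the
# CorrelationSimilarity formula 1 - |corr_real - corr_syn| / 2.
corr_similarity_sketch <- function(real, synthetic) {
  # Assumption: numeric columns are detected with is.numeric() rather than
  # taken from an rsdv_metadata object.
  num_cols <- intersect(
    names(real)[vapply(real, is.numeric, logical(1))],
    names(synthetic)[vapply(synthetic, is.numeric, logical(1))]
  )
  if (length(num_cols) < 2) {
    stop("need at least two shared numeric columns")
  }
  # Each column of `pairs` holds the two column names of one pair
  pairs <- utils::combn(num_cols, 2)
  scores <- apply(pairs, 2, function(p) {
    # Pearson correlation of the pair; missing values dropped pairwise
    # (an assumption). A constant column yields NA with a warning.
    r <- stats::cor(real[[p[1]]], real[[p[2]]],
                    method = "pearson", use = "pairwise.complete.obs")
    s <- stats::cor(synthetic[[p[1]]], synthetic[[p[2]]],
                    method = "pearson", use = "pairwise.complete.obs")
    1 - abs(r - s) / 2
  })
  # One row per column pair
  data.frame(
    column_1 = pairs[1, ],
    column_2 = pairs[2, ],
    score = unname(scores),
    stringsAsFactors = FALSE
  )
}

# Usage: y rises with x in the real data but falls with x in the synthetic data
real <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 4, 6, 8, 10))
synthetic <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 8, 6, 4, 2))
res <- corr_similarity_sketch(real, synthetic)
print(res)
print(mean(res$score))  # mean over all pairs (a single pair here)
```

In the usage lines, the real pair is perfectly positively correlated and the synthetic pair is perfectly negatively correlated. The score is therefore 0, up to floating-point rounding, which is the lowest value the formula allows.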

Generates synthetic tabular data from real datasets using
Gaussian copula models, with parametric marginal selection for
numerical columns and a cumulative-frequency embedding that brings
categorical and boolean columns into the same joint copula. Includes
a metadata system with column types and primary keys, declarative
constraints enforced via rejection sampling, conditional sampling,
and quality, validity and privacy reports modeled on those of the
'SDMetrics' library. Inspired by the Python 'SDV' (Synthetic Data
Vault) library by 'DataCebo'; see Patki, Wedge and Veeramachaneni
(2016) "The Synthetic Data Vault" <doi:10.1109/DSAA.2016.49>.

Kailas Venkitasubramanian

rsdv

Synthetic Tabular Data Generation with Gaussian Copulas


Arguments


<dl>

<dt>real</dt>
<dd>A data frame of real data.</dd>


<dt>synthetic</dt>
<dd>A data frame of synthetic data.</dd>


<dt>meta</dt>
<dd>An <code>rsdv_metadata</code> object.</dd>

</dl>
