nndr: Nearest-Neighbor Distance Ratio privacy score

Description

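For each synthetic row, nndr computes the ratio of its distance to the
nearest real row to its distance to the second-nearest real row. Because
the nearest distance can never exceed the second-nearest, the ratio is at
most 1. A high ratio (close to 1) means the synthetic row is not unusually
close to any specific real row, indicating low disclosure risk; a ratio
near 0 means the row sits much closer to one real record than to any
other. The score is the proportion of synthetic rows whose ratio exceeds
0.5, i.e. mean(ratio > 0.5), so higher scores indicate lower risk.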
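The computation described above can be sketched in base R. This is a minimal illustration, not the rsdv implementation: it assumes Euclidean distance (the documentation does not name a metric), inputs that are already numeric with matching columns, and at least two real rows. The function name `nndr_sketch`, its `threshold` argument, and the choice to assign a ratio of 0 when both nearest distances are zero (a copy of a duplicated real record) are all hypothetical.

```r
# Hypothetical sketch of a nearest-neighbor distance ratio (NNDR) score.
# Assumes Euclidean distance and numeric inputs with matching columns;
# this is an illustration, not rsdv's implementation.
nndr_sketch <- function(real, synthetic, threshold = 0.5) {
  real <- as.matrix(real)
  synthetic <- as.matrix(synthetic)
  stopifnot(ncol(real) == ncol(synthetic), nrow(real) >= 2)

  ratios <- apply(synthetic, 1, function(s) {
    # Euclidean distance from this synthetic row to every real row
    d <- sqrt(colSums((t(real) - s)^2))
    d12 <- sort(d)[1:2]  # nearest and second-nearest distances
    # Both distances zero: the row copies a duplicated real record.
    # Treating this as ratio 0 (highest risk) is an assumption.
    if (d12[2] == 0) 0 else unname(d12[1] / d12[2])
  })
  # Share of synthetic rows that are not unusually close to one real row
  mean(ratios > threshold)
}

# Usage: one synthetic row copies a real row (ratio 0), the other sits
# halfway between two real rows (ratio 1), so the score is 0.5.
real <- data.frame(x = c(0, 10), y = c(0, 0))
synthetic <- data.frame(x = c(0, 5), y = c(0, 0))
nndr_sketch(real, synthetic)
#> [1] 0.5
```

This brute-force version evaluates every real/synthetic pair, so its cost grows with the product of the two row counts; a nearest-neighbor index is the usual substitute for large tables.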

Package: rsdv

Title: Synthetic Tabular Data Generation with Gaussian Copulas

Author: Kailas Venkitasubramanian

Description: Generates synthetic tabular data from real datasets using
Gaussian copula models, with parametric marginal selection for
numerical columns and a cumulative-frequency embedding that brings
categorical and boolean columns into the same joint copula. Includes
a metadata system with column types and primary keys, declarative
constraints enforced via rejection sampling, conditional sampling,
and quality, validity and privacy reports modeled on those of the
'SDMetrics' library. Inspired by the Python 'SDV' (Synthetic Data
Vault) library by 'DataCebo'; see Patki, Wedge and Veeramachaneni
(2016) "The Synthetic Data Vault" <doi:10.1109/DSAA.2016.49>.

Arguments

<dl><dt>real, synthetic</dt>
<dd>Data frames; only numerical columns are used.</dd>
<dt>normalize</dt>
<dd>Logical. When <code>TRUE</code> (default), columns are z-scored using
the real-data mean and standard deviation before distance computation.
Constant columns in <code>real</code> are dropped to avoid division by zero.</dd></dl>
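The preprocessing described for `normalize` can be sketched in base R. The helper `standardize_like_real` is hypothetical and not part of rsdv; it assumes columns are matched by name, omits missing-value handling, and uses `sd()` (the n - 1 denominator), which the documentation does not specify.

```r
# Hypothetical helper illustrating the `normalize` preprocessing: keep
# numeric columns, drop columns that are constant in `real`, and z-score
# both tables with the real-data mean and standard deviation.
# Not part of rsdv; NA handling is omitted.
standardize_like_real <- function(real, synthetic) {
  num_cols <- names(real)[vapply(real, is.numeric, logical(1))]
  num_cols <- intersect(num_cols, names(synthetic))

  mu <- vapply(real[num_cols], mean, numeric(1))
  sigma <- vapply(real[num_cols], sd, numeric(1))

  # A constant column has sd 0 and would cause division by zero.
  keep <- num_cols[which(sigma > 0)]

  z <- function(df) {
    m <- as.matrix(df[keep])
    m <- sweep(m, 2, mu[keep], "-")   # center on the real-data mean
    sweep(m, 2, sigma[keep], "/")     # scale by the real-data sd
  }
  list(real = z(real), synthetic = z(synthetic))
}

# Usage: `k` is constant in `real` and `g` is not numeric, so only `x` remains.
real <- data.frame(x = c(1, 2, 3), k = c(5, 5, 5), g = c("a", "b", "c"))
synthetic <- data.frame(x = c(2, 4), k = c(5, 6), g = c("a", "a"))
out <- standardize_like_real(real, synthetic)
as.vector(out$synthetic)
#> [1] 0 2
```

Using the real-data statistics for both tables puts them on the same scale, so a unit of distance means the same thing for real and synthetic rows.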

