generate_fake_with_privacy

Generates a synthetic copy of <code>data</code>, then optionally detects/handles
sensitive columns by name. Detection uses the ORIGINAL column names and
maps to output via <code>attr(fake, "name_map")</code> if present.

Generate privacy-preserving synthetic datasets that mirror structure, types, factor levels, and missingness; export bundles for 'LLM' workflows (data plus 'JSON' schema and guidance); and build fake data directly from 'SQL' database tables without reading real rows. Methods are related to approaches in Nowok, Raab and Dibben (2016) <doi:10.32614/RJ-2016-019> and the foundation-model overview by Bommasani et al. (2021) <doi:10.48550/arXiv.2108.07258>.

Zobaer Ahmed

FakeDataR

Privacy-Preserving Synthetic Data for 'LLM' Workflows

generate_fake_with_privacy function

<dl><dt>data</dt>
<dd>A data.frame (or coercible) to mirror.</dd>
<dt>n</dt>
<dd>Rows to generate (default same as input if NULL).</dd>
<dt>level</dt>
<dd>One of "low","medium","high".</dd>
<dt>seed</dt>
<dd>Optional RNG seed.</dd>
<dt>sensitive</dt>
<dd>Character vector of original column names to treat as sensitive.</dd>
<dt>sensitive_detect</dt>
<dd>Logical; auto-detect common sensitive columns by name.</dd>
<dt>sensitive_strategy</dt>
<dd>One of "fake" or "drop".</dd>
<dt>normalize</dt>
<dd>Logical; lightly normalize inputs.</dd>
<dt>sensitive_patterns</dt>
<dd>Optional named list of patterns to treat as sensitive
(e.g., list(id = "...", email = "...", phone = "...")). Overrides defaults.</dd>
<dt>sensitive_regex</dt>
<dd>Optional fully-combined regex (single string) to detect
sensitive columns by name. If supplied, it is used instead of defaults.</dd></dl>

Arguments

Generate fake data with privacy controls — generate_fake_with_privacy

<dl>

<dt>data</dt>
<dd>A data.frame (or coercible) to mirror.</dd>


<dt>n</dt>
<dd>Rows to generate (default same as input if NULL).</dd>


<dt>level</dt>
<dd>One of "low","medium","high".</dd>


<dt>seed</dt>
<dd>Optional RNG seed.</dd>


<dt>sensitive</dt>
<dd>Character vector of original column names to treat as sensitive.</dd>


<dt>sensitive_detect</dt>
<dd>Logical; auto-detect common sensitive columns by name.</dd>


<dt>sensitive_strategy</dt>
<dd>One of "fake" or "drop".</dd>


<dt>normalize</dt>
<dd>Logical; lightly normalize inputs.</dd>


<dt>sensitive_patterns</dt>
<dd>Optional named list of patterns to treat as sensitive
(e.g., list(id = "...", email = "...", phone = "...")). Overrides defaults.</dd>


<dt>sensitive_regex</dt>
<dd>Optional fully-combined regex (single string) to detect
sensitive columns by name. If supplied, it is used instead of defaults.</dd>

</dl>

generate_fake_with_privacy: Generate fake data with privacy controls

Description

Usage

Value

Arguments

Details