Generate Fake Data from Real Dataset Structure
generate_fake_data(
data,
n = 30,
category_mode = c("preserve", "generic", "custom"),
numeric_mode = c("range", "distribution"),
column_mode = c("keep", "generic", "custom"),
custom_levels = NULL,
custom_names = NULL,
seed = NULL,
verbose = FALSE,
sensitive = NULL,
sensitive_detect = TRUE,
sensitive_strategy = c("fake", "drop"),
normalize = TRUE
)

A data.frame of n rows with attributes:
name_map (named chr: original -> output)
column_mode (chr)
sensitive_columns (chr; original names)
dropped_columns (chr; original names that were dropped)
data: A tabular object; coerced via prepare_input_data().
n: Number of rows to generate (default 30).
category_mode: One of "preserve", "generic", "custom".
  preserve: sample observed categories by empirical frequency (factors stay factors)
  generic: replace categories with "Category A", "Category B", ...
  custom: use custom_levels[[colname]] if provided
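
The three category modes can be pictured with a few lines of base R. This is only a sketch of the behaviour described above, not the package's internal code; the toy factor, the level names under `custom_levels`, and the choice to keep the sampled frequencies when relabelling in the generic case are all assumptions made for illustration.

```r
# Sketch only: base-R illustration of the three category modes.
set.seed(1)
x <- factor(c("red", "red", "red", "blue", "green", "green"))
n <- 10

# preserve: draw observed levels with their empirical frequencies,
# keeping the result a factor with the original levels
freq <- prop.table(table(x))
preserved <- factor(
  sample(names(freq), n, replace = TRUE, prob = as.numeric(freq)),
  levels = levels(x)
)

# generic: relabel levels as "Category A", "Category B", ...
# (reusing the draw above is an assumption of this sketch)
generic_labels <- paste("Category", LETTERS[seq_along(levels(x))])
generic <- factor(generic_labels[as.integer(preserved)], levels = generic_labels)

# custom: draw from user-supplied levels (hypothetical values)
custom_levels <- list(colour = c("cyan", "magenta", "yellow"))
custom <- sample(custom_levels[["colour"]], n, replace = TRUE)
```

In the preserve case, rarer levels can be absent from a small sample, so a fake column may show fewer distinct values than the original even though its factor levels are unchanged.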
numeric_mode: One of "range", "distribution".
  range: draw uniformly between the observed min and max (integer columns stay integer-like)
  distribution: sample observed values with replacement
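
A minimal base-R sketch of the difference between the two numeric modes, assuming a small integer column. The rounding step used to keep values integer-like is an assumption of this sketch; the package may handle integers differently.

```r
# Sketch only: "range" vs "distribution" for one numeric column.
set.seed(42)
x <- c(3L, 7L, 7L, 12L, 20L)
n <- 8

# range: uniform draw between the observed min and max
range_draw <- runif(n, min = min(x), max = max(x))
# keep integer columns integer-like (rounding is an assumption here)
if (is.integer(x)) range_draw <- as.integer(round(range_draw))

# distribution: resample the observed values with replacement
dist_draw <- sample(x, n, replace = TRUE)
```

The range draw can produce values that never occur in the original data, while the distribution draw only ever returns observed values, which matters if individual observations are themselves sensitive.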
column_mode: One of "keep", "generic", "custom".
  keep: keep original column names
  generic: rename columns to var1..varP (mapping stored in attr(, "name_map"))
  custom: rename using the custom_names named vector (old -> new)
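
The name mappings can be sketched as named character vectors in the original -> output form that name_map is described as using. The column names below are made up, and leaving unlisted columns unchanged in the custom case is an assumption of this sketch.

```r
# Sketch only: building original -> output name maps in base R.
old <- c("age", "income", "city")

# generic: var1..varP, named by the original column names
generic_map <- setNames(paste0("var", seq_along(old)), old)

# custom: rename only the columns listed in custom_names (old -> new);
# columns not listed keep their names (assumption of this sketch)
custom_names <- c(age = "years", city = "location")
custom_map <- setNames(old, old)
custom_map[names(custom_names)] <- custom_names
```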
custom_levels: Optional named list of allowed levels per column (for category_mode = "custom").
custom_names: Optional named character vector mapping old -> new names (for column_mode = "custom").
seed: Optional RNG seed for reproducible output.
verbose: Logical; print progress messages.
sensitive: Optional character vector of original column names to treat as sensitive.
sensitive_detect: Logical; auto-detect common sensitive columns by name.
sensitive_strategy: One of "fake", "drop". Only applied if any sensitive columns exist.
normalize: Logical; lightly normalize inputs (trim whitespace, percent strings → numeric, short date-times → POSIXct).
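
A hedged usage sketch that combines several of the arguments above. The toy data frame is invented for illustration, and the call is guarded so the snippet still runs in a session where generate_fake_data() is not available; the printed attribute contents depend on the package and are not shown.

```r
# Toy input; column names and values are made up for illustration.
toy <- data.frame(
  id    = 1:6,
  group = factor(c("a", "b", "a", "c", "b", "a")),
  score = c(1.5, 2.0, 3.25, 4.0, 2.75, 3.5),
  email = paste0("user", 1:6, "@example.com"),
  stringsAsFactors = FALSE
)

# Only runs if generate_fake_data() is available in the session.
if (exists("generate_fake_data", mode = "function")) {
  fake <- generate_fake_data(
    toy,
    n = 10,
    category_mode = "preserve",
    numeric_mode = "range",
    column_mode = "generic",
    sensitive = "email",
    sensitive_strategy = "drop",
    seed = 123
  )
  # Inspect the documented attributes on the result
  print(attr(fake, "name_map"))
  print(attr(fake, "dropped_columns"))
}
```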