Note: a newer version (1.9-2) of this package is available.

synthpop (version 1.8-0)

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

Description

A tool for producing synthetic versions of microdata containing confidential information so that they are safe to release to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones while causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one by one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric models or classification and regression trees (CART). Data are synthesised via the function syn(), which can be largely automated if default settings are used, or run with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016).
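
The workflow described above can be sketched as follows. This is a minimal illustration and not an excerpt from the package documentation. It assumes synthpop is installed and uses the SD2011 data set bundled with the package. The choice of columns, the seed value, and the object names ods and sds are assumptions made for this example.

```r
library(synthpop)

# SD2011 ships with synthpop; this column subset is illustrative only.
vars <- c("sex", "age", "edu", "income")
ods  <- SD2011[, vars]

# Default settings: variables are synthesised one by one in column order.
# The seed value is arbitrary and only makes the random draws repeatable.
sds <- syn(ods, seed = 2022, print.flag = FALSE)

sds$method          # synthesising method used for each variable
sds$visit.sequence  # order in which the variables were synthesised
head(sds$syn)       # the synthetic data frame

# Compare the distribution of one variable in the synthetic and observed data.
compare(sds, ods, vars = "income")
```

To take control of the synthesis, a vector of method names (one per variable) can be passed to syn() through its method argument, and the order of synthesis can be changed through visit.sequence.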

Install

install.packages('synthpop')

Monthly Downloads

2,752

Version

1.8-0

License

GPL-2 | GPL-3

Maintainer

Beata Nowok

Last Published

August 31st, 2022

Functions in synthpop (1.8-0)

Comparison of synthesised and observed data

Fitting multinomial models to synthetic data

Social Diagnosis 2011 - Objective and Subjective Quality of Life in Poland

Synthesis with bagging

syn.ctree, syn.cart

Synthesis with classification and regression trees (CART)

Synthetic data object summaries

Generating synthetic data sets

Importing original data sets from external files

replicated.uniques

Replications in synthetic data

Synthesis by logistic regression

Synthesis by unordered polytomous regression

Tools for statistical disclosure control (sdc)

Synthesis for a variable nested within another variable

summary.fit.synds

Inference from synthetic data

syn.lognorm, syn.sqrtnorm, syn.cubertnorm

Synthesis by linear regression after transformation of a dependent variable

Passive synthesis

Synthesis of a group of categorical variables from a saturated model

Synthesis by normal linear regression preserving the marginal distribution

Synthesis of a group of categorical variables by iterative proportional fitting

Synthesis by predictive mean matching

Synthesis with a fast implementation of random forests

Synthesis by ordered polytomous regression

Synthesis by linear regression

Tables and plots of utility measures

Distributional comparison of synthesised and observed data

Synthesis of survival time by classification and regression trees (CART)

Synthesis from a saturated model based on all combinations of the predictor variables

Synthesis by simple random sampling

synthpop-package

Generating synthetic versions of sensitive microdata for statistical disclosure control

Exporting synthetic data sets to external files

Tabular utility

Synthesis with random forest

Makes a codebook from a data frame

Compare univariate distributions of synthesised and observed data

Group numeric variables before synthesis

compare.fit.synds

Compare model estimates based on synthesised and observed data

glm.synds, lm.synds

Fitting (generalized) linear models to synthetic data

Multivariate comparison of synthesised and observed data

Fitting ordered logistic models to synthetic data