synthpop (version 1.5-1)

compare.synds: Compare univariate distributions of synthesised and observed data

Description

Compares a synthesised data set with the original (observed) data set using percent frequency tables and histograms. When more than one synthetic data set has been generated (object$m > 1), the pooled synthetic data are used for the comparison by default.

Usage

# S3 method for synds
compare(object, data, vars = NULL, msel = NULL, 
  breaks = 20, nrow = 2, ncol = 2, rel.size.x = 1, 
  cols = c("#1A3C5A","#4187BF"), stat = "percents", ...)

# S3 method for compare.synds
print(x, ...)

Arguments

object

an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s).

data

an original (observed) data set.

vars

variables to be compared. If vars is NULL (the default) all synthesised variables are compared.

msel

index or indices of the synthetic data copies for which a comparison is to be made. If NULL (the default), pooled synthetic data copies are compared with the original data.

breaks

the number of cells for the histogram.

nrow

the number of rows for the plotting area.

ncol

the number of columns for the plotting area.

rel.size.x

a number representing the relative size of x-axis labels.

cols

bar colors.

stat

determines whether tables and plots present percentages (stat = "percents", the default) or counts (stat = "counts"). If m > 1 and msel = NULL, average counts for the synthetic data are presented.

...

additional parameters.

x

an object of class compare.synds.

Value

An object of class compare.synds, which is a list including a list of comparative frequency tables (tables) and a ggplot object (plots) with bar charts/histograms. If multiple plots are produced, they and their corresponding frequency tables are stored as a list.
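
As a minimal sketch of working with the returned value, the comparison can be stored and its components inspected separately. The component names tables and plots are taken from the description above; the object name cmp and the seed value are illustrative choices only, and the exact structure of each component may vary with the number of plots produced.

```r
library(synthpop)

# Observed data: a subset of the SD2011 survey data shipped with synthpop
ods <- SD2011[, c("sex", "age", "edu", "marital", "ls", "income")]

# The seed is set only to make this sketch reproducible (value is arbitrary)
s1 <- syn(ods, seed = 2024)

# Store the comparison instead of printing it straight away
cmp <- compare(s1, ods, vars = c("sex", "income"))

class(cmp)    # class of the returned object
names(cmp)    # components of the returned list

cmp$tables    # comparative frequency tables
cmp$plots     # ggplot object(s) with bar charts/histograms
```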

Details

Missing data categories for numeric variables are plotted on the same plot as non-missing values. They are indicated by the miss. suffix.

References

Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi:10.18637/jss.v074.i11.

Examples

# NOT RUN {
ods <- SD2011[ , c("sex","age","edu","marital","ls","income")]
s1  <- syn(ods)
compare(s1, ods, vars = "ls")
compare(s1, ods, vars = "income", stat = "counts", breaks = 10)
# }
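
The examples above use a single synthetic data set. The following sketch, assuming five synthetic copies (m = 5, an arbitrary choice, as is the seed), illustrates how msel and stat interact as described in the Arguments section: pooled comparison by default, selected copies via msel, and average counts when stat = "counts" is used with msel = NULL.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "ls", "income")]

# Generate five synthetic copies; the seed is only for reproducibility
s5 <- syn(ods, m = 5, seed = 2024)

# Default (msel = NULL): pooled synthetic data vs. observed data
compare(s5, ods, vars = "ls")

# Compare only the first and second synthetic copies with the observed data
compare(s5, ods, vars = "ls", msel = 1:2)

# Counts instead of percentages; with m > 1 and msel = NULL these are
# average counts across the synthetic copies
compare(s5, ods, vars = "ls", stat = "counts")
```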
