Compare synthesised data set with the original (observed) data set
using percent frequency tables and histograms. When more than one
synthetic data set has been generated (object$m > 1), by default
pooled synthetic data are used for comparison.
# S3 method for synds
compare(object, data, vars = NULL, msel = NULL,
  breaks = 20, nrow = 2, ncol = 2, rel.size.x = 1,
  cols = c("#1A3C5A","#4187BF"), stat = "percents", ...)

# S3 method for compare.synds
print(x, ...)
object: an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s).
data: an original (observed) data set.
vars: variables to be compared. If vars is NULL (the default), all synthesised variables are compared.
msel: index or indices of synthetic data copies for which a comparison is to be made. If NULL, pooled synthetic data copies are compared with the original data.
breaks: the number of cells for the histogram.
nrow: the number of rows for the plotting area.
ncol: the number of columns for the plotting area.
rel.size.x: a number representing the relative size of x-axis labels.
cols: bar colors.
stat: determines whether tables and plots present percentages (stat = "percents", the default) or counts (stat = "counts"). If m > 1 and msel = NULL, average counts for synthetic data are presented.
...: additional parameters.
x: an object of class compare.synds.
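The effect of msel and stat is easiest to see when more than one synthetic copy exists. The following is a minimal sketch, assuming the synthpop package is installed and using its bundled SD2011 data. The object names (ods, s3, pooled, selected, avg_counts), the choice of m = 3 and the seed value are illustrative assumptions, not part of this page.

```r
library(synthpop)

# Assumption: a small subset of SD2011 keeps the synthesis quick
ods <- SD2011[, c("sex", "age", "edu")]

# Generate three synthetic copies; the seed is an arbitrary illustrative value
s3 <- syn(ods, m = 3, seed = 2016, print.flag = FALSE)

# msel = NULL (the default): pooled synthetic copies are compared
pooled <- compare(s3, ods, vars = "edu")

# msel = 1:2: the comparison is made for copies 1 and 2
selected <- compare(s3, ods, vars = "edu", msel = 1:2)

# stat = "counts" with msel = NULL: average counts for synthetic data
avg_counts <- compare(s3, ods, vars = "edu", stat = "counts")
```

Assigning the result to a name keeps it from being auto-printed; call print() on any of these objects to display the comparison.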
An object of class compare.synds, which is a list including a list
of comparative frequency tables (tables) and a ggplot object
(plots) with bar charts/histograms. If multiple plots are produced,
they and their corresponding frequency tables are stored as a list.
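As a rough illustration of the returned structure, the sketch below stores the result and looks at its two documented components. It assumes the synthpop package is installed; the names ods, s1 and cmp, the chosen variables and the seed value are hypothetical choices made for this example.

```r
library(synthpop)

# Assumption: a small subset of the bundled SD2011 data
ods <- SD2011[, c("sex", "age", "edu")]
s1  <- syn(ods, seed = 2016, print.flag = FALSE)

# Store the comparison instead of printing it straight away
cmp <- compare(s1, ods, vars = c("sex", "edu"))

class(cmp)    # class of the returned object
names(cmp)    # list components, including tables and plots
cmp$tables    # comparative frequency tables
```

The plots component can be inspected in the same way through cmp$plots.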
Missing data categories for numeric variables are plotted on the same plot
as non-missing values. They are indicated by the miss. suffix.
Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi:10.18637/jss.v074.i11.
# NOT RUN {
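library(synthpop)  # provides SD2011, syn() and compare()
ods <- SD2011[ , c("sex","age","edu","marital","ls","income")]
s1 <- syn(ods)
# percent frequency table and plot for a single variable
compare(s1, ods, vars = "ls")
# counts instead of percentages, histogram with 10 cells
compare(s1, ods, vars = "income", stat = "counts", breaks = 10)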
# }