utility.synds(object, data, method = "cart", cp = 0.0001, maxorder = 1, deviance = FALSE, null.utility = FALSE, syn.only = FALSE, all.comb = FALSE, ...)
## S3 method for class 'utility.synds'
print(x, ...)
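object: an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s).
method: the method(s) used to model the propensity scores, chosen from "cart", "logit" and "poly". See details for more.
cp: complexity parameter used with method "cart".
maxorder: maximum order of interactions considered in the "logit" method. For a model without interactions, 0 should be provided.
all.comb: if TRUE, the synthetic data from all m imputations are combined with the original data, and only one utility value is returned rather than one for each synthetic data set.
x: an object of class utility.synds.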
An object of class utility.synds, which is a list that includes the mean and sd of the propensity score utility values (if m > 1) and also the raw utility values for each synthetic set.
The list also contains the chosen method for modeling the propensity scores. If multiple methods were provided, the list will contain a matrix of utility value results for each of the different methods.
If null.utility is set to TRUE, two more elements are given in the output list. nullstats.summary provides both the differences and ratios between observed and null mean utility statistics (MSEs) for each method. nullstats.raw gives utility statistics from each pair of synthetic data sets compared.
If deviance is set to TRUE, the function also returns the deviance statistic. It is calculated as -2[log likelihood (estimated model)] / N, where the estimated model uses the predicted propensity scores and the null model uses the true proportion of records in the synthetic data (i.e. no distinguishability between original and synthetic data).
Propensity scores can be modeled in a variety of ways. Commonly a simple logistic regression is used with all variables in the data set as predictors, implemented here as method "logit". Alternative modeling options available here are classification and regression trees as method "cart" (default) and multivariate adaptive regression splines using the function polyclass from the polspline package as method "poly" (in testing).
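To make the idea of a propensity score utility concrete, here is a minimal base R sketch of the logistic-regression variant. It is not the package's implementation: the helper name `pmse_logit`, the simulated data, and the choice of a main-effects-only model are all assumptions made for illustration.

```r
# Hypothetical helper (not part of synthpop): propensity score MSE for one
# original/synthetic pair, using a main-effects logistic regression.
pmse_logit <- function(orig, synth) {
  stacked <- rbind(orig, synth)
  # Indicator: 0 for original rows, 1 for synthetic rows
  stacked$is_syn <- rep(c(0, 1), times = c(nrow(orig), nrow(synth)))
  fit <- glm(is_syn ~ ., data = stacked, family = binomial)
  score <- fitted(fit)  # predicted propensity scores
  # Share of synthetic records: the score every row would receive if the
  # two sources could not be told apart
  prop_syn <- nrow(synth) / nrow(stacked)
  mean((score - prop_syn)^2)
}

set.seed(1)
orig <- data.frame(x = rnorm(200), y = rnorm(200))
good <- data.frame(x = rnorm(200), y = rnorm(200))            # same distribution
bad  <- data.frame(x = rnorm(200, mean = 2), y = rnorm(200))  # shifted mean of x

pmse_good <- pmse_logit(orig, good)
pmse_bad  <- pmse_logit(orig, bad)
```

A value near zero means the model cannot separate original from synthetic rows, so `pmse_good` should come out well below `pmse_bad`.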
If missing values exist, indicator variables are added and included in the model as recommended by Rosenbaum and Rubin (1984). The missing cell is then imputed as either zero or the mean. For categorical variables, NA is treated as a new category. Thanks to https://github.com/markmfredrickson/optmatch/blob/master/R/fill.NAs.R for useful code chunks on flagging and filling.
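The flag-and-fill approach described above can be sketched in base R as follows. This is an illustration only, not the code used by synthpop or optmatch; the helper name `fill_missing`, its `fill` argument, and the `_NA` suffix for indicator columns are assumptions.

```r
# Hypothetical helper: flag missing numeric cells with an indicator column,
# fill them with zero or the column mean, and turn NA into an extra level
# for categorical variables.
fill_missing <- function(df, fill = c("zero", "mean")) {
  fill <- match.arg(fill)
  out <- df
  for (v in names(df)) {
    miss <- is.na(df[[v]])
    if (!any(miss)) next
    if (is.numeric(df[[v]])) {
      out[[paste0(v, "_NA")]] <- as.integer(miss)  # missingness indicator
      out[[v]][miss] <- if (fill == "zero") 0 else mean(df[[v]], na.rm = TRUE)
    } else {
      out[[v]] <- addNA(factor(df[[v]]))           # NA becomes its own level
    }
  }
  out
}

df <- data.frame(age = c(30, NA, 50), sex = factor(c("F", NA, "M")))
filled <- fill_missing(df, fill = "mean")
```

The indicator column lets the propensity model pick up differences in missingness patterns, which would otherwise be hidden once the cell is filled.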
Null propensity score MSEs can be estimated and returned for each selected method. These values are estimated as proposed in Snoke et al. (forthcoming) and give an estimate of the expected values under a correct synthesis model. They are used to produce a difference or ratio between the observed and null MSEs, which improves the accuracy of the utility estimate.
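As a rough illustration of how an observed MSE can be set against a null MSE, the sketch below compares every pair of synthetic data sets with each other and treats the average as the null value. This is one possible reading of "each pair of synthetic data sets compared", not necessarily the estimator of Snoke et al.; the helper `pmse_logit`, the simulated data, and the object names are assumptions.

```r
# Hypothetical helper, not synthpop code: pMSE from a main-effects logit model
pmse_logit <- function(a, b) {
  stacked <- rbind(a, b)
  stacked$is_b <- rep(c(0, 1), times = c(nrow(a), nrow(b)))
  score <- fitted(glm(is_b ~ ., data = stacked, family = binomial))
  mean((score - nrow(b) / nrow(stacked))^2)
}

set.seed(42)
n <- 300
orig <- data.frame(x = rnorm(n), y = rnorm(n))
# Stand-in for m = 4 synthetic data sets from a deliberately poor synthesiser
# (it gets the mean of x wrong)
syn_sets <- lapply(1:4, function(i) {
  data.frame(x = rnorm(n, mean = 0.5), y = rnorm(n))
})

# Observed utility: original data against each synthetic set
observed <- sapply(syn_sets, function(s) pmse_logit(orig, s))

# Null utility: every pair of synthetic sets compared with each other
pairs <- combn(length(syn_sets), 2)
null_raw <- apply(pairs, 2, function(ij) {
  pmse_logit(syn_sets[[ij[1]]], syn_sets[[ij[2]]])
})

null_summary <- c(
  difference = mean(observed) - mean(null_raw),
  ratio      = mean(observed) / mean(null_raw)
)
```

A ratio well above 1 (or a clearly positive difference) suggests the synthetic data are easier to tell apart from the original than they are from each other.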
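library(synthpop)  # provides syn(), utility.synds() and the SD2011 data
# Use the first 1000 rows and five variables of SD2011 as the original data
ods <- SD2011[1:1000, c("age", "bmi", "depress", "alcabuse", "englang")]
# Create m = 5 synthetic data sets, then compare them with the original
s1 <- syn(ods, m = 5)
utility.synds(s1, ods)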