This function calculates summary statistics for simulation results, including descriptive statistics (e.g. measures of center or spread) and inferential statistics (e.g. bias or confidence interval coverage). All summary statistics are calculated over simulation replicates within a single simulation level.
summarize(sim, ..., mc_se = FALSE)A data frame containing the result of each specified summary function
as a column, for each of the simulation levels. The column n_reps
returns the number of successful simulation replicates within each level.
A simulation object of class sim_obj, usually created by
new_sim
One or more lists, separated by commas, specifying desired
summaries of the sim simulation object. See examples. Each list
must have a stat item, which specifies the type of summary
statistic to be calculated. The na.rm item indicates whether to
exclude NA values when performing the calculation (with default
being FALSE). For stat options where the name item
is optional, if it is not provided, a name will be formed from the type
of summary and the column on which the summary is performed. Additional
required items are detailed below for each stat type.
list(stat="mean", x="col_1", name="mean_col", na.rm=F)
computes the mean of column sim$results$col_1 for each level
combination and creates a summary column named "mean_col". Other
single-column summary statistics (see the next few items) work
analogously. name is optional.
list(stat="median", ...) computes the median.
list(stat="var", ...) computes the variance.
list(stat="sd", ...) computes the standard deviation.
list(stat="mad", ...) computes the mean absolute deviation.
list(stat="iqr", ...) computes the interquartile range.
list(stat="min", ...) computes the minimum.
list(stat="max", ...) computes the maximum.
list(stat="is_na", ...) computes the number of NA values.
list(stat="correlation", x="col_1", y="col_2",
name="cor_12") computes the (Pearson's) correlation coefficient between
sim$results$col_1 and sim$results$col_2 for each level
combination and creates a summary column named "cor_12".
list(stat="covariance", x="col_1", y="col_2",
name="cov_12") computes the covariance between sim$results$col_1
and sim$results$col_2 for each level combination and creates a
summary column named "cov_12".
list(stat="quantile", x="col_1", prob=0.8, name="q_col_1")
computes the 0.8 quantile of column sim$results$col_1 and creates
a summary column named "q_col_1". prob can be any number in
[0,1].
list(stat="bias", estimate="est", truth=5,
name="bias_est") computes the absolute bias of the estimator
corresponding to column "sim$results$est", relative to the true
value given in truth, and creates a summary column named
"bias_est". name is optional. See Details.
list(stat="bias_pct", estimate="est", truth=5,
name="bias_est") computes the percent bias of the estimator
corresponding to column "sim$results$est", relative to the true
value given in truth, and creates a summary column named
"bias_pct_est". name is optional. See Details.
list(stat="mse", estimate="est", truth=5,
name="mse_est") computes the mean squared error of the estimator
corresponding to column "sim$results$est", relative to the true
value given in truth, and creates a summary column named
"mse_est". name is optional. See Details.
list(stat="mae", estimate="est", truth=5,
name="mae_est") computes the mean absolute error of the estimator
corresponding to column "sim$results$est", relative to the true
value given in truth, and creates a summary column named
"mae_est". name is optional. See Details.
list(stat="coverage", estimate="est", se="se_est",
truth=5, name="cov_est") or
list(stat="coverage", lower="est_l", upper="est_u",
truth=5, name="cov_est") computes confidence interval coverage. With the
first form, estimate gives the name of the variable in
sim$results corresponding to the estimator of interest and
se gives the name of the variable containing the standard error of
the estimator of interest. With the second form, lower gives the
name of the variable containing the confidence interval lower bound and
upper gives the name of the confidence interval upper bound. In
both cases, truth is the true value (see Details), and a
summary column named "cov_est" is created.
A logical argument indicating whether to compute Monte Carlo
standard error and associated confidence interval for inferential summary
statistics. This applies only to the bias, bias_pct,
mse, mae, and coverage summary statistics.
For all inferential summaries there are three ways to specify
truth: (1) a single number, meaning the estimand is the same
across all simulation replicates and levels, (2) a numeric vector of the
same length as the number of rows in sim$results, or (3) the name
of a variable in sim$results containing the estimand of interest.
There are two ways to specify the confidence interval bounds for
coverage. The first is to provide an estimate and its
associated se (standard error). These should both be variables in
sim$results. The function constructs a 95% Wald-type confidence
interval of the form (estimate-1.96*se, estimate+1.96*se). The
alternative is to provide lower and upper bounds, which
should also be variables in sim$results. In this case, the
confidence interval is (lower, upper). The coverage is the
proportion of simulation replicates for a given level combination in
which truth lies within the interval.
sim <- new_sim()
create_data <- function(n) { rpois(n, lambda=5) }
est_mean <- function(dat, type) {
if (type=="M") { return(mean(dat)) }
if (type=="V") { return(var(dat)) }
}
sim %<>% set_levels(n=c(10,100,1000), est=c("M","V"))
sim %<>% set_config(num_sim=5)
sim %<>% set_script(function() {
dat <- create_data(L$n)
lambda_hat <- est_mean(dat=dat, type=L$est)
return (list("lambda_hat"=lambda_hat))
})
sim %<>% run()
sim %>% summarize(
list(stat = "mean", name="mean_lambda_hat", x="lambda_hat"),
list(stat = "mse", name="lambda_mse", estimate="lambda_hat", truth=5)
)
Run the code above in your browser using DataLab