
Last chance! 50% off unlimited learning
Sale ends in
This function runs a Monte Carlo simulation study given a set of predefined simulation functions,
design conditions, and number of replications. Results can be saved as temporary files in case of interruptions
and may be restored by re-running runSimulation
, provided that the respective temp
file can be found in the working directory. runSimulation
supports parallel
and cluster computing, global and local debugging, error handling (including fail-safe
stopping when functions fail too often, even across nodes), provides bootstrap estimates of the
sampling variability (optional), and tracking of error and warning messages.
For convenience, all functions available in the R workspace are exported across all computational nodes
so that they are more easily accessible (however, other R objects are not, and therefore
must be passed to the fixed_objects
input to become available across nodes).
For a didactic presentation of the package refer to Sigal and Chalmers
(2016; 10.1080/10691898.2016.1246953), and see the associated
wiki on Github (https://github.com/philchalmers/SimDesign/wiki)
for other tutorial material, examples, and applications of SimDesign
to real-world simulations.
runSimulation(
design,
replications,
generate,
analyse,
summarise,
fixed_objects = NULL,
packages = NULL,
bootSE = FALSE,
boot_draws = 1000L,
filename = "SimDesign-results",
seed = rint(nrow(design), min = 1L, max = 2147483647L),
save = FALSE,
save_results = FALSE,
store_results = FALSE,
warnings_as_errors = FALSE,
save_seeds = FALSE,
load_seed = NULL,
parallel = FALSE,
ncores = parallel::detectCores(),
cl = NULL,
MPI = FALSE,
max_errors = 50L,
save_details = list(),
debug = "none",
progress = TRUE,
allow_na = FALSE,
allow_nan = FALSE,
stop_on_fatal = FALSE,
edit = "none",
verbose = TRUE
)# S3 method for SimDesign
summary(object, ...)
# S3 method for SimDesign
print(x, list2char = TRUE, ...)
a tibble
or data.frame
object containing the Monte Carlo simulation
conditions to be studied, where each row represents a unique condition and each column a factor
to be varied. Note that if a tibble
is supplied this object will be coerced to a
data.frame
internally. See createDesign
for the standard approach
to create this simulation design object
number of replication to perform per condition (i.e., each row in design
).
Must be greater than 0
user-defined data and parameter generating function.
See Generate
for details. Note that this argument may be omitted by the
user if they wish to generate the data with the analyse
step, but for real-world
simulations this is generally not recommended
optional (but highly recommended) user-defined summary function from Summarise
to be used after all the replications have completed within each design
condition. Omitting this function
will return a list of data.frame
s (or a single data.frame
, if only one row in
design
is supplied) or, for more general objects (such as list
s), a list
containing the results returned form Analyse
.
Alternatively, the value NA
can be passed to let the generate-analyse-summarise process to run as usual,
where the summarise components are instead included only as a placeholder.
Ommiting this input is only recommended for didactic purposes because it leaves out a large amount of
information (e.g., try-errors, warning messages, saving files, etc), can witness memory related issues,
and generally is not as flexible internally. If users do not wish to supply a summarise function then it
is is recommended to pass the values NA
to indicate this function is deliberately omitted, but that
the save_results
option should be used to save the results during the simulation. This provides a
more RAM friendly alternative to storing all the Generate-Analyse results in the working environment, where
the Analysis results can be summarised at a later time
(optional) an object (usually a named list
)
containing additional user-defined objects
that should remain fixed across conditions. This is useful when including
long fixed vectors/matrices of population parameters, data
that should be used across all conditions and replications (e.g., including a fixed design matrix
for linear regression), or simply control constant global elements such as sample size
a character vector of external packages to be used during the simulation (e.g.,
c('MASS', 'extraDistr', 'simsem')
). Use this input when parallel = TRUE
or
MPI = TRUE
to use non-standard functions from additional packages,
otherwise the functions must be made available by using explicit
library
or require
calls within the provided simulation functions.
Alternatively, functions can be called explicitly without attaching the package with the ::
operator
(e.g., extraDistr::rgumbel()
)
logical; perform a non-parametric bootstrap to compute bootstrap standard error
estimates for the respective meta-statistics computed by the Summarise
function?
When TRUE
, bootstrap samples for each row in Design
will be obtained after
the generate-analyse steps have obtain the simulation results to be summarised so that
standard errors for each statistic can be computed. To compute large-sample confidence
intervals given the bootstrap SE estimates see SimBoot
.
This option is useful to approximate how accurate the resulting meta-statistic estimates
were, particularly if the number of replications
was relatively low (e.g., less than
5000). If users prefer to obtain alternative bootstrap estimates then consider passing
save_results = TRUE
, reading the generate-analyse data into R via
SimResults
, and performing the bootstrap manually with function found in the
external boot
package
number of non-parametric bootstrap draws to sample for the summarise
function after the generate-analyse replications are collected. Default is 1000
(optional) the name of the .rds
file to save the final simulation results to
when save = TRUE
. If the same file name already exists in the working
directly at the time of saving then a new
file will be generated instead and a warning will be thrown. This helps to avoid accidentally overwriting
existing files. Default is 'SimDesign-results'
a vector of integers to be used for reproducibility.
The length of the vector must be equal the number of rows in design
.
This argument calls set.seed
or
clusterSetRNGStream
for each condition, respectively,
but will not be run when MPI = TRUE
.
Default randomly generates seeds within the range 1 to 2147483647 for each condition.
logical; save the simulation state and final results to the hard-drive? This is useful
for simulations which require an extended amount of time. When TRUE
, a temp file
will be created in the working directory which allows the simulation state to be saved
and recovered (in case of power outages, crashes, etc). As well, triggering this flag will
save any fatal .Random.seed
states when conditions unexpectedly crash (where each seed
is stored row-wise in an external .rds file), which provides a much easier mechanism
to debug the issue (see load_seed
for details).
Additionally, to recover your simulation at the last known
location simply re-run the code you used to initially define the simulation and the external file
will automatically be detected and read-in. Upon completion, the final results will
be saved to the working directory, and the temp file will be removed. Default is FALSE
logical; save the results returned from Analyse
to external
.rds
files located in the defined save_results_dirname
directory/folder?
Use this if you would like to keep track of the individual parameters returned from the analyses.
Each saved object will contain a list of three elements containing the condition (row from design
),
results (as a list
or matrix
), and try-errors. When TRUE
, a temp file will be used to track the simulation
state (in case of power outages, crashes, etc). When TRUE
, temporary files will also be saved
to the working directory (in the same way as when save = TRUE
).
See SimResults
for an example of how to read these .rds
files back into R
after the simulation is complete. Default is FALSE
.
WARNING: saving results to your hard-drive can fill up space very quickly for larger simulations. Be sure to
test this option using a smaller number of replications before the full Monte Carlo simulation is performed.
See also reSummarise
for applying summarise functions from saved
simulation results
logical; store the complete tables of simulation results
in the returned object? This is FALSE
by default to help avoid RAM
issues (see save_results
as a more suitable alternative). To extract these results
pass the returned object to SimExtract(..., what = 'results')
, which will return a named list
of all the simulation results for each condition
logical; treat warning messages as errors during the simulation? Default is FALSE, therefore warnings are only collected and not used to restart the data generation step
logical; save the .Random.seed
states prior to performing each replication into
plain text files located in the defined save_seeds_dirname
directory/folder?
Use this if you would like to keep track of the simulation state within each replication and design
condition. Primarily, this is useful for completely replicating any cell in the simulation if need be,
especially when tracking down hard-to-find errors and bugs. As well, see the load_seed
input
to load a given .Random.seed
to exactly replicate the generated data and analysis state (mostly useful
for debugging). When TRUE
, temporary files will also be saved
to the working directory (in the same way as when save = TRUE
). Default is FALSE
Note, however, that this option is not typically necessary since the .Random.seed
states for simulation
replications that threw errors during the execution are automatically stored within the final simulation
object, and can be extracted and investigated using SimExtract
. Hence, this option is only of
interest when all of the replications must be reproducible, otherwise the defaults to runSimulation
are likely sufficient for most simulation studies.
a character object indicating which file to load from when the .Random.seed
s have
be saved (after a call with save_seeds = TRUE
), or an integer vector indicating the actual
.Random.seed
values. E.g., load_seed = 'design-row-2/seed-1'
will load the first seed in the second row of the design
input, or explicitly passing the 626 long
elements from .Random.seed
(see SimExtract
to extract the seeds associated explicitly
with errors during the simulation, where each column represents a unique seed).
If the input is a character vector then it is important NOT
to modify the design
input object, otherwise the path may not point to the correct saved location, while
if the input is an integer vector then it WILL be important to modify the design
input in order to load this
exact seed for the corresponding design row. Default is NULL
logical; use parallel processing from the parallel
package over each
unique condition?
number of cores to be used in parallel execution. Default uses all available
cluster object defined by makeCluster
used to run code in parallel.
If NULL
and parallel = TRUE
, a local cluster object will be defined which
selects the maximum number cores available
and will be stop the cluster when the simulation is complete. Note that supplying a cl
object will automatically set the parallel
argument to TRUE
logical; use the foreach
package in a form usable by MPI to run simulation
in parallel on a cluster? Default is FALSE
the simulation will terminate when more than this number of consecutive errors are thrown in any given condition. The purpose of this is to indicate that something fatally problematic is likely going wrong in the generate-analyse phases and should be inspected. Default is 50
a list pertaining to information regarding how and where files should be saved
when the save
or save_results
flags are triggered.
safe
logical; trigger whether safe-saving should be performed. When TRUE
files
will never be overwritten accidentally, and where appropriate the program will either stop or generate
new files with unique names. Default is TRUE
compname
name of the computer running the simulation. Normally this doesn't need
to be modified, but in the event that a manual node breaks down while running a simulation the
results from the temp files may be resumed on another computer by changing the name of the
node to match the broken computer. Default is the result of evaluating unname(Sys.info()['nodename'])
out_rootdir
root directory to save all files to. Default uses the current working directory
tmpfilename
the name of the temporary .rds
file when any of the save
flags are used.
This file will be read-in if it is in the working directory and the simulation will continue
at the last point this file was saved
(useful in case of power outages or broken nodes). Finally, this file will be deleted when the
simulation is complete. Default is the system name (compname
) appended
to 'SIMDESIGN-TEMPFILE_'
save_results_dirname
a string indicating the name of the folder to save
result objects to when save_results = TRUE
. If a directory/folder does not exist
in the current working directory then a unique one will be created automatically. Default is
'SimDesign-results_'
with the associated compname
appended
save_seeds_dirname
a string indicating the name of the folder to save
.Random.seed
objects to when save_seeds = TRUE
. If a directory/folder does not exist
in the current working directory then one will be created automatically. Default is
'SimDesign-seeds_'
with the associated compname
appended
a string indicating where to initiate a browser()
call for editing and debugging.
General options are 'none'
(default; no debugging), 'error'
, which starts the debugger
when any error in the code is detected in one of three generate-analyse-summarise functions,
and 'all'
, which debugs all the user defined functions regardless of whether an error was thrown
or not. Specific options include: 'generate'
to debug the data simulation function, 'analyse'
to debug the computational function, and
'summarise'
to debug the aggregation function.
Alternatively, users may place browser
calls within the respective functions for
debugging at specific lines, which is useful when debugging based on conditional evaluations (e.g.,
if(this == 'problem') browser()
). Note that parallel computation flags
will automatically be disabled when a browser()
is detected
logical; display a progress bar for each simulation condition?
This is useful when simulations conditions take a long time to run.
Uses the pbapply
package to display the progress. Default is FALSE
logical; should NA
s be allowed in the analyse step as a valid result from the simulation
analysis? Default is FALSE
logical; should NaN
s be allowed in the analyse step as a valid result from the simulation
analysis? Default is FALSE
logical; should the simulation be terminated immediately when
the maximum number of consecutive errors (max_errors
) is reached? If FALSE
,
the simulation will continue as though errors did not occur, however a column
FATAL_TERMINATION
will be included in the resulting object indicating the final
error message observed, and NA
placeholders will be placed in all other row-elements. Default is
FALSE
this argument has been deprecated. Please use debug
instead
logical; print messages to the R console? Default is TRUE
SimDesign object returned from runSimulation
additional arguments
SimDesign object returned from runSimulation
logical; for tibble
object re-evaluate list elements
as character vectors for better printing of the levels? Note that this
does not change the original classes of the object, just how they are printed.
Default is TRUE
a tibble
from the dplyr
package (also of class 'SimDesign'
)
with the original design
conditions in the left-most columns,
simulation results and ERROR/WARNING's (if applicable) in the middle columns,
and additional information (such as REPLICATIONS, SIM_TIME, COMPLETED, and SEED) in the right-most
columns.
To conserve RAM, temporary objects (such as data generated across conditions and replications)
are discarded; however, these can be saved to the hard-disk by passing the appropriate flags.
For longer simulations it is recommended to use save = TRUE
to temporarily save the
simulation state, and to use the save_results
flag to write the analysis results
the to hard-disc.
The use of the save_seeds
option can be evoked to save R's .Random.seed
state to allow
for complete reproducibility of each replication within each condition. These
individual .Random.seed
terms can then be read in with the
load_seed
input to reproduce the exact simulation state at any given replication. Most often though,
save_seeds
is less useful since problematic seeds are automatically stored in the final
simulation object to allow for easier replicability of potentially problematic errors. Finally,
providing a vector of seeds
is also possible to ensure
that each simulation condition is completely reproducible under the single/multi-core method selected.
The load_seed
input will also accept an integer vector corresponding to the exact
.Random.seed
state. This is helpful because SimDesign also tracks these seeds for simulation
conditions that threw errors, where these values can be extracted via SimExtract(..., what='error_seeds')
function. The column names indicate the respective design row (first number), the order in which
the errors were thrown (second number), and finally the error message string (coerced to a proper
data.frame column name). After this data.frame object is extracted, individual columns can be passed to load_seed
to replicate the exact error issue that appeared (note that the design
object must be indexed
manually to ensure that the correct design conditions is paired with this exact .Random.seed
state).
Finally, when the Monte Carlo simulation is complete
it is recommended to write the results to a hard-drive for safe keeping, particularly with the
save
and filename
arguments provided (for reasons that are more obvious in the parallel computation
descriptions below). Using the filename
argument (along with save = TRUE
)
supplied is much safer than using something
like saveRDS
directly because files will never accidentally be overwritten,
and instead a new file name will be created when a conflict arises; this type of safety
is prevalent in many aspects of the package and helps to avoid many unrecoverable (yet surprisingly common)
mistakes.
In the event of a computer crash, power outage, etc, if save = TRUE
was used
then the original code used to execute runSimulation()
need only be re-run to resume the simulation.
The saved temp file will be read into the function automatically, and the simulation will continue
one the condition where it left off before the simulation state was terminated.
When running simulations in parallel (either with parallel = TRUE
or MPI = TRUE
)
R objects defined in the global environment will generally not be visible across nodes.
Hence, you may see errors such as Error: object 'something' not found
if you try to use an object
that is defined in the workspace but is not passed to runSimulation
.
To avoid this type or error, simply pass additional objects to the
fixed_objects
input (usually it's convenient to supply a named list of these objects).
Fortunately, however, custom functions defined in the global environment are exported across
nodes automatically. This makes it convenient when writing code because custom functions will
always be available across nodes if they are visible in the R workspace. As well, note the
packages
input to declare packages which must be loaded via library()
in order to make
specific non-standard R functions available across nodes.
The strategy for organizing the Monte Carlo simulation work-flow is to
Define a suitable Design
object (a tibble
or data.frame
)
containing fixed conditional
information about the Monte Carlo simulations. This is often expedited by using the
createDesign
function, and if necessary the argument subset
can be used to remove redundant or non-applicable rows
Define the three step functions to generate the data (Generate
; see also
https://CRAN.R-project.org/view=Distributions for a list of distributions in R),
analyse the generated data by computing the respective parameter estimates, detection rates,
etc (Analyse
), and finally summarise the results across the total
number of replications (Summarise
).
Pass the above objects to the runSimulation
function, and declare the
number of replications to perform with the replications
input. This function will accept
a design
data.frame object and will return a suitable data.frame object with the
simulation results
Analyze the output from runSimulation
, possibly using ANOVA techniques
(SimAnova
) and generating suitable plots and tables
Expressing the above more succinctly, the functions to be called have the following form, with the exact inputs listed
Design <- createDesign(...)
Generate <- function(condition, fixed_objects = NULL) {...}
Analyse <- function(condition, dat, fixed_objects = NULL) {...}
Summarise <- function(condition, results, fixed_objects = NULL) {...}
res <- runSimulation(design=Design, replications, generate=Generate,
analyse=Analyse, summarise=Summarise)
The condition
object above represents a single row from the design
object, indicating
a unique Monte Carlo simulation condition. The condition
object also contains two
additional elements to help track the simulation's state: an ID
variable, indicating
the respective row number in the design
object, and a REPLICATION
element
indicating the replication iteration number. Mainly, these are included to help with debugging,
where users can easily locate the r
th replication (e.g., REPLICATION == 500
)
within the j
th row in the simulation design (e.g., ID == 2
). The
REPLICATION
input is also useful when temporarily saving files to the hard-drive
when calling external command line utilities (see examples on the wiki).
For a skeleton version of the work-flow, which is often useful when initially defining a simulation,
use SimFunctions
. This function will write template simulation code
to one/two files so that modifying the required functions and objects can begin immediately
with minimal error. This means that you can focus on your Monte Carlo simulation immediately rather
than worrying about the administrative code-work required to organize the simulation work-flow.
Additional information for each condition are also contained in the object returned by
runSimulation
: REPLICATIONS
to indicate the number of Monte Carlo replications,
SIM_TIME
to indicate how long (in seconds) it took to complete
all the Monte Carlo replications for each respective design condition,
COMPLETED
to indicate the date in which the given simulation condition completed,
SEED
for the integer values in the seed
argument, and, if applicable,
ERRORS
and WARNINGS
which contain counts for the number of error or warning
messages that were caught (if no errors/warnings were observed these columns will be omitted).
Note that to extract the specific error and warnings messages see
SimExtract
. Finally,
if bootSE = TRUE
was included then the final right-most columns will contain the labels
BOOT_SE.
followed by the name of the associated meta-statistic defined in summarise()
.
Additional examples, presentation files, and tutorials can be found on the package wiki located at https://github.com/philchalmers/SimDesign/wiki.
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
10.1080/10691898.2016.1246953
SimFunctions
, createDesign
,
Generate
, Analyse
, Summarise
,
SimExtract
,
reSummarise
, SimClean
, SimAnova
, SimResults
,
SimBoot
, aggregate_simulations
, Attach
,
SimShiny
# NOT RUN {
#-------------------------------------------------------------------------------
# Example 1: Sampling distribution of mean
# This example demonstrate some of the simpler uses of SimDesign,
# particularly for classroom settings. The only factor varied in this simulation
# is sample size.
# skeleton functions to be saved and edited
SimFunctions()
#### Step 1 --- Define your conditions under study and create design data.frame
Design <- data.frame(N = c(10, 20, 30))
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions
# help(Generate)
Generate <- function(condition, fixed_objects = NULL){
dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5)
dat
}
# help(Analyse)
Analyse <- function(condition, dat, fixed_objects = NULL){
ret <- mean(dat) # mean of the sample data vector
ret
}
# help(Summarise)
Summarise <- function(condition, results, fixed_objects = NULL){
ret <- c(mu=mean(results), SE=sd(results)) # mean and SD summary of the sample means
ret
}
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Collect results by looping over the rows in design
# run the simulation
Final <- runSimulation(design=Design, replications=1000,
generate=Generate, analyse=Analyse, summarise=Summarise)
Final
# reproduce exact simulation
Final_rep <- runSimulation(design=Design, replications=1000, seed=Final$SEED,
generate=Generate, analyse=Analyse, summarise=Summarise)
Final_rep
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Extras
# compare SEs estimates to the true SEs from the formula sigma/sqrt(N)
5 / sqrt(Design$N)
# To store the results from the analyse function either
# a) omit a definition of of summarise(), or
# b) pass save_results = TRUE to runSimulation() and read the results in with SimResults()
# e.g., the a) approach
res <- runSimulation(design=Design, replications=1000,
generate=Generate, analyse=Analyse)
str(res)
head(res[[1]])
# or b) approach
Final <- runSimulation(design=Design, replications=1000, save_results=TRUE,
generate=Generate, analyse=Analyse, summarise=Summarise)
res <- SimResults(Final)
str(res)
head(res[[1]]$results)
# remove the saved results from the hard-drive if you no longer want them
SimClean(results = TRUE)
#-------------------------------------------------------------------------------
# Example 2: t-test and Welch test when varying sample size, group sizes, and SDs
# skeleton functions to be saved and edited
SimFunctions()
# }
# NOT RUN {
# in real-world simulations it's often better/easier to save
# these functions directly to your hard-drive with
SimFunctions('my-simulation')
# }
# NOT RUN {
#### Step 1 --- Define your conditions under study and create design data.frame
Design <- createDesign(sample_size = c(30, 60, 90, 120),
group_size_ratio = c(1, 4, 8),
standard_deviation_ratio = c(.5, 1, 2))
Design
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions
Generate <- function(condition, fixed_objects = NULL){
N <- condition$sample_size # alternatively, could use Attach() to make objects available
grs <- condition$group_size_ratio
sd <- condition$standard_deviation_ratio
if(grs < 1){
N2 <- N / (1/grs + 1)
N1 <- N - N2
} else {
N1 <- N / (grs + 1)
N2 <- N - N1
}
group1 <- rnorm(N1)
group2 <- rnorm(N2, sd=sd)
dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)), DV = c(group1, group2))
dat
}
Analyse <- function(condition, dat, fixed_objects = NULL){
welch <- t.test(DV ~ group, dat)
ind <- t.test(DV ~ group, dat, var.equal=TRUE)
# In this function the p values for the t-tests are returned,
# and make sure to name each element, for future reference
ret <- c(welch = welch$p.value, independent = ind$p.value)
ret
}
Summarise <- function(condition, results, fixed_objects = NULL){
#find results of interest here (e.g., alpha < .1, .05, .01)
ret <- EDR(results, alpha = .05)
ret
}
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Collect results by looping over the rows in design
# first, test to see if it works
res <- runSimulation(design=Design, replications=5, store_results=TRUE,
generate=Generate, analyse=Analyse, summarise=Summarise)
res
# }
# NOT RUN {
# complete run with 1000 replications per condition
res <- runSimulation(design=Design, replications=1000, parallel=TRUE,
generate=Generate, analyse=Analyse, summarise=Summarise)
res
View(res)
## save final results to a file upon completion (not run)
runSimulation(design=Design, replications=1000, parallel=TRUE, save=TRUE, filename = 'mysim',
generate=Generate, analyse=Analyse, summarise=Summarise)
## Debug the generate function. See ?browser for help on debugging
## Type help to see available commands (e.g., n, c, where, ...),
## ls() to see what has been defined, and type Q to quit the debugger
runSimulation(design=Design, replications=1000,
generate=Generate, analyse=Analyse, summarise=Summarise,
parallel=TRUE, debug='generate')
## Alternatively, place a browser() within the desired function line to
## jump to a specific location
Summarise <- function(condition, results, fixed_objects = NULL){
#find results of interest here (e.g., alpha < .1, .05, .01)
browser()
ret <- EDR(results[,nms], alpha = .05)
ret
}
runSimulation(design=Design, replications=1000,
generate=Generate, analyse=Analyse, summarise=Summarise,
parallel=TRUE)
## EXTRA: To run the simulation on a MPI cluster, use the following setup on each node (not run)
# library(doMPI)
# cl <- startMPIcluster()
# registerDoMPI(cl)
# Final <- runSimulation(design=Design, replications=1000, MPI=TRUE, save=TRUE,
# generate=Generate, analyse=Analyse, summarise=Summarise)
# saveRDS(Final, 'mysim.rds')
# closeCluster(cl)
# mpi.quit()
## Similarly, run simulation on a network linked via ssh
## (two way ssh key-paired connection must be possible between master and slave nodes)
##
## define IP addresses, including primary IP
# primary <- '192.168.2.20'
# IPs <- list(
# list(host=primary, user='phil', ncore=8),
# list(host='192.168.2.17', user='phil', ncore=8)
# )
# spec <- lapply(IPs, function(IP)
# rep(list(list(host=IP$host, user=IP$user)), IP$ncore))
# spec <- unlist(spec, recursive=FALSE)
#
# cl <- parallel::makeCluster(type='PSOCK', master=primary, spec=spec)
# res <- runSimulation(design=Design, replications=1000, parallel = TRUE, save=TRUE,
# generate=Generate, analyse=Analyse, summarise=Summarise, cl=cl)
#~~~~~~~~~~~~~~~~~~~~~~~~
###### Post-analysis: Analyze the results via functions like lm() or SimAnova(), and create
###### tables(dplyr) or plots (ggplot2) to help visualize the results.
###### This is where you get to be a data analyst!
library(dplyr)
res %>% summarise(mean(welch), mean(independent))
res %>% group_by(standard_deviation_ratio, group_size_ratio) %>%
summarise(mean(welch), mean(independent))
# quick ANOVA analysis method with all two-way interactions
SimAnova( ~ (sample_size + group_size_ratio + standard_deviation_ratio)^2, res,
rates = TRUE)
# or more specific ANOVAs
SimAnova(independent ~ (group_size_ratio + standard_deviation_ratio)^2,
res, rates = TRUE)
# make some plots
library(ggplot2)
library(tidyr)
dd <- res %>%
select(group_size_ratio, standard_deviation_ratio, welch, independent) %>%
pivot_longer(cols=c('welch', 'independent'), names_to = 'stats')
dd
ggplot(dd, aes(factor(group_size_ratio), value)) + geom_boxplot() +
geom_abline(intercept=0.05, slope=0, col = 'red') +
geom_abline(intercept=0.075, slope=0, col = 'red', linetype='dotted') +
geom_abline(intercept=0.025, slope=0, col = 'red', linetype='dotted') +
facet_wrap(~stats)
ggplot(dd, aes(factor(group_size_ratio), value, fill = factor(standard_deviation_ratio))) +
geom_boxplot() + geom_abline(intercept=0.05, slope=0, col = 'red') +
geom_abline(intercept=0.075, slope=0, col = 'red', linetype='dotted') +
geom_abline(intercept=0.025, slope=0, col = 'red', linetype='dotted') +
facet_grid(stats~standard_deviation_ratio) +
theme(legend.position = 'none')
# }
# NOT RUN {
# }
Run the code above in your browser using DataLab