add_reftable
creates or augments a reference table of simulations, and formats the results appropriately for further use. The user does not have to think about this return format. Instead, they only have to think about the very simple return format of the function given as its Simulate
argument. The primary role of this function is to wrap the call(s) to the function specified by Simulate
. Depending on the arguments, parallel or serial computation is performed.
When parallelization is implied, it is performed by default using a “socket” cluster, which is available on all operating systems. Special care is then needed to ensure that all required packages are loaded in the child processes, and that all required variables and functions are passed to them: check the packages
and env
arguments. For socket clusters, foreach
or pbapply
is called depending on whether the doSNOW
package is attached (doSNOW
allows more efficient load balancing than pbapply
).
Alternatively, if the simulation function cannot be called directly by the R code, simulated samples can be added using the newsimuls
argument. Finally, a generic data frame of simulated samples can be reformatted as a reference table by using only the reftable
argument.
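The most common case, where the simulator is an R function, can be sketched as follows. This is only an illustrative sketch modelled on the toy example of the package's main documentation page: `myrnorm` is a made-up simulator, and the block assumes that the Infusion package is installed and that `init_reftable` is used to sample parameter points.

```r
# Toy simulator (hypothetical): one argument per parameter,
# returns a named vector of summary statistics.
myrnorm <- function(mu, s2, sample.size) {
  s <- rnorm(n = sample.size, mean = mu, sd = sqrt(s2))
  c(mean = mean(s), var = var(s))
}

if (requireNamespace("Infusion", quietly = TRUE)) {
  # Parameter points sampled within bounds; sample.size is held fixed.
  parsp <- Infusion::init_reftable(
    lower = c(mu = 2.8, s2 = 0.4, sample.size = 40),
    upper = c(mu = 5.2, s2 = 2.4, sample.size = 40)
  )
  # Serial run: myrnorm is called once per row of parsp.
  simuls <- Infusion::add_reftable(Simulate = myrnorm, parsTable = parsp)
  # Inspect the bounds attached to the returned reference table.
  str(attributes(simuls)[c("LOWER", "UPPER")])
}
```

Here the simulator is passed as a function object; passing its name as a character string is the other documented option.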
add_simulation
is a wrapper for add_reftable
, suitable when nRealizations
>1. It is now documented separately: the distinctive features of add_simulation
were conceived for the first workflow implemented in Infusion
but are somewhat obsolete now.
add_reftable(reftable=NULL, Simulate, parsTable=par.grid, par.grid=NULL,
nRealizations = 1L, newsimuls = NULL,
verbose = interactive(), nb_cores = NULL, packages = NULL, env = NULL,
control.Simulate=NULL, cluster_args=list(), cl_seed=NULL, ...)
A data.frame (with additional attributes) is returned.
The value has the following attributes: LOWER
and UPPER
which are vectors of per-parameter minima and maxima, respectively, deduced from any newsimuls
argument, and optionally any of the arguments Simulate, control.Simulate, packages, env, parsTable
and reftable
(all corresponding to input arguments when provided, except that the actual Simulate
function is returned even if it was input as a name).
Data frame: a reference table. Each row contains the parameter values of a simulated realization of the data-generating process, and the simulated summary statistics.
As parameters should be told apart from statistics by Infusion functions, information about parameter names should be attached to the reftable
*if* it is not available otherwise. Thus if no parsTable
is provided, the reftable
should have an attribute "LOWER"
(a named vector giving lower bounds for the parameters which will vary in the analysis, as in the return value of the function).
An *R* function, or the name (as a character string) of an *R* function used to generate summary statistics for samples from a data-generating process. When an external simulation program is called, Simulate
must therefore be an R function wrapping the call to the external program. Two function APIs are handled:
*
If the function has a parsTable
argument, it must return a data frame of summary statistics, each line of which contains the vector of summary statistics for one realization of the data-generating process. The parsTable
argument of add_reftable
will be passed to Simulate
and the rows of the output data frame must be in the same order as in the input parsTable
as these two data frames will be bound together.
*
Otherwise, the Simulate
function must have one argument for each element of the parameter vector (i.e. of each row of parsTable
). It must return a named vector of summary statistics.
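The two function APIs can be illustrated with a pair of toy simulators. This is a minimal sketch in base R only: the names `sim_table` and `sim_row`, the parameters `mu` and `s2`, and the default `sample.size` are all hypothetical choices made for the example.

```r
# API 1: the function has a parsTable argument and returns a data frame
# with one row of summary statistics per row of parsTable, in the same order.
sim_table <- function(parsTable, sample.size = 40) {
  stats <- lapply(seq_len(nrow(parsTable)), function(i) {
    s <- rnorm(sample.size, mean = parsTable$mu[i], sd = sqrt(parsTable$s2[i]))
    data.frame(mean = mean(s), var = var(s))
  })
  do.call(rbind, stats)
}

# API 2: one argument per element of the parameter vector;
# returns a named vector of summary statistics for a single realization.
sim_row <- function(mu, s2, sample.size = 40) {
  s <- rnorm(sample.size, mean = mu, sd = sqrt(s2))
  c(mean = mean(s), var = var(s))
}

pars <- data.frame(mu = c(3, 4, 5), s2 = c(0.5, 1, 2))
set.seed(123)
res_table <- sim_table(pars)                     # called once on the whole table
res_row <- do.call(sim_row, as.list(pars[1, ]))  # called on a single row
```

With the first form the simulator controls how the rows are processed; with the second form the rows are independent calls, which is the case the nb_cores argument applies to.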
A data frame of which each line is the vector of parameters needed by Simulate
for each simulation of the data-generating process. par.grid
is an alias for parsTable
; the latter argument may be preferred in order not to suggest that the parameter values should form a regular grid.
The number of simulated samples of summary statistics, for each parameter vector (each row of parsTable
). If not 1, the old workflow is assumed and add_simulation
is called.
If the function used to generate empirical distributions cannot be called by R, then newsimuls
can be used to provide these distributions. See Details for the structure of this argument.
Number of cores for parallel simulation; NULL
or integer value, acting as a shortcut for cluster_args$spec
. This is effective only if the simulation function is called separately for each row of parsTable
. Otherwise, if the simulation function is called once on the whole parsTable
, parallelisation can be controlled only through that function's own arguments.
A list of arguments, passed to makeCluster
. May contain a non-null spec
element, in which case the distinct nb_cores
argument and the global Infusion option nb_cores
are ignored. A typical usage would thus be cluster_args=list(spec=<number of 'children'>)
. An additional element outfile="log.txt"
may be useful to collect output from the nodes, and type="FORK"
may be used to force a fork cluster on Linux and similar systems (otherwise a socket cluster is set up, as this is the default behavior of parallel::makeCluster
). Do *not* use a structured list with an add_reftable
element as is possible for refine
(see Details of refine
documentation).
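As an illustration of what such a list contains, the sketch below builds one and hands it directly to `parallel::makeCluster`. How add_reftable forwards the list internally is not shown here, so the `do.call` line is an assumption made for the example, as is the log file location.

```r
library(parallel)

# A flat list of makeCluster arguments: two child processes, with node
# output collected in a log file (path chosen here for illustration only).
cluster_args <- list(
  spec = 2L,
  outfile = file.path(tempdir(), "log.txt")
)

# Assumed equivalent of what happens when the list is passed on:
cl <- do.call(makeCluster, cluster_args)
n_nodes <- length(cl)        # number of child processes started
cluster_type <- class(cl)[1] # class of the cluster object that was created
stopCluster(cl)
```

Adding `type = "FORK"` to the list would request a fork cluster instead, which is only possible on Unix-like systems.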
Whether to print some information or not.
Additional arguments passed to Simulate
, beyond the parameter vector. These arguments should be constant through all the simulation workflow.
A list, used as an exclusive alternative to “...” to pass additional arguments to Simulate
, beyond the parameter vector. The list must contain the same elements as would otherwise go in the “...” (if control.Simulate
is left NULL, a default value is constructed from the ...).
For parallel evaluation: names of additional packages to be loaded on the cores, necessary for Simulate
evaluation.
For parallel evaluation: an environment containing additional objects to be exported to the cores, necessary for Simulate
evaluation.
(all parallel contexts:) Integer, or NULL. If an integer, it is used to initialize the "L'Ecuyer-CMRG"
random-number generator. If cl_seed
is NULL
, the default generator is selected on each node, where its seed is not controlled. Providing the seed allows repeatable results for given parallelization settings, but may not allow identical results across different settings.
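The repeatability that a seed provides can be illustrated with the parallel package alone. This sketch uses `clusterSetRNGStream`, the standard way to set up "L'Ecuyer-CMRG" streams on a cluster; whether add_reftable uses exactly this call is an assumption, and `draw_with_seed` is a hypothetical helper.

```r
library(parallel)

# Start a socket cluster, seed its "L'Ecuyer-CMRG" streams,
# and draw one uniform random number on each node.
draw_with_seed <- function(seed, n_nodes = 2L) {
  cl <- makeCluster(n_nodes)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  unlist(clusterEvalQ(cl, runif(1)))
}

run1 <- draw_with_seed(123L)
run2 <- draw_with_seed(123L)
identical(run1, run2)  # same seed and same cluster settings
```

The comparison only covers identical settings; once the number of nodes or the way tasks are distributed changes, each task may draw from a different stream.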
The newsimuls
argument should have the same structure as the return value of the function itself, except that newsimuls
may include only a subset of the attributes returned by the function. It is thus a data frame; its required attributes are LOWER
and UPPER
which are named vectors giving bounds for the parameters which are variable in the whole analysis (note that the names identify these parameters in case this information is not available otherwise from the arguments). The values in these vectors may be incorrect in the sense of failing to bound the parameters in the newsimuls
, as the actual bounds are then corrected using parameter values in newsimuls
and attributes from reftable
.
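A data frame with this structure can be assembled by hand when the simulations come from an external program. In this sketch the parameter names, bounds, and the randomly generated "statistics" are placeholders standing in for real external output.

```r
set.seed(1)
n <- 5

# Parameter points at which the external program was (hypothetically) run.
pars <- data.frame(mu = runif(n, 2.8, 5.2), s2 = runif(n, 0.4, 2.4))

# Placeholder summary statistics standing in for the external output.
stats <- data.frame(
  mean = rnorm(n, mean = pars$mu, sd = sqrt(pars$s2 / 40)),
  var  = pars$s2 * rchisq(n, df = 39) / 39
)

# One row per realization: parameters first, then statistics.
newsimuls <- cbind(pars, stats)

# Required attributes: named bounds, whose names identify the parameters.
attr(newsimuls, "LOWER") <- c(mu = 2.8, s2 = 0.4)
attr(newsimuls, "UPPER") <- c(mu = 5.2, s2 = 2.4)
```

The resulting object would then be supplied as the newsimuls argument; columns not named in LOWER and UPPER are the ones left to be treated as statistics.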
## see main documentation page for the package for other typical usage