Generate a simulated dataset based on a transformation of an underlying base distribution, while checking that certain conditions are met.
simulate_data_conditional(
generator,
n_obs = 1,
reject = function(x) TRUE,
reject_max_iter = 10,
on_reject = "ignore",
return_tries = FALSE,
seed = NULL,
...
)
Data.frame or matrix with n_obs rows for the simulated dataset X if all conditions are met within the iteration limit. Otherwise NULL.
If return_tries is TRUE, then the output is a list with the first entry being the data.frame or matrix as described above, and the second entry (n_tries) giving a numeric with the number of tries necessary to find the returned dataset.
Function which generates data from the underlying base distribution. It is assumed to take the number of simulated observations n_obs as its first argument, as all random generation functions in the stats and extraDistr packages do. Furthermore, it is expected to return a two-dimensional array as output (matrix or data.frame). See details.
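As an illustration of the expected interface, here is a minimal base R sketch of a generator function. The name gen_normal and its n_vars argument are hypothetical and not part of the package; the only requirements taken from the description above are that n_obs comes first and that the result is two-dimensional.

```r
# Hypothetical generator: n_obs is the first argument, as in rnorm() or runif().
# Returns a matrix with n_obs rows and n_vars columns of standard normal draws.
gen_normal <- function(n_obs, n_vars = 3) {
  matrix(rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)
}

set.seed(1)
z <- gen_normal(10)
dim(z)  # 10 3
```

Passing nrow and ncol explicitly keeps the output a matrix even when n_obs is 1.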
Number of simulated observations.
Function which takes a matrix or data.frame X as its single input and outputs TRUE or FALSE. Specifies when a simulated final datamatrix X should be rejected. The function must output TRUE if the conditions are NOT met (X is rejected) and FALSE if the conditions are met and the matrix can be accepted. Intended to be used with function_list. See details.
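A hand-written rejection function following this convention might look like the sketch below. It uses base R only; the name reject_degenerate and the two specific checks are assumptions chosen for illustration, not functions provided by the package.

```r
# Hypothetical rejection function: TRUE means "reject this dataset".
reject_degenerate <- function(x) {
  x <- as.matrix(x)
  # A column is constant if it holds a single distinct value.
  has_constant <- any(apply(x, 2, function(col) length(unique(col)) == 1))
  # A rank below the number of columns indicates linear dependence.
  rank_deficient <- qr(x)$rank < ncol(x)
  has_constant || rank_deficient
}

ok  <- cbind(c(1, 2, 3, 4), c(2, 1, 4, 3))
bad <- cbind(c(1, 2, 3, 4), c(2, 4, 6, 8))  # second column is 2 * first
reject_degenerate(ok)   # FALSE
reject_degenerate(bad)  # TRUE
```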
Integer > 0. In case of rejection, how many times should a new datamatrix be simulated until the conditions in reject are met?
If "stop", an error is raised if no suitable datamatrix X could be found after reject_max_iter tries. If "current", the current datamatrix is returned, regardless of the conditions in reject. Otherwise, NULL is returned. In each case a warning is reported.
If TRUE, then the function also outputs the number of tries necessary to find a dataset fulfilling the condition. Useful to record to assess the possible bias of the simulated datasets. See Value.
Set random seed to ensure reproducibility of results. See Note below.
All further parameters are passed to simulate_data.
Examples for restrictions include variance restrictions (e.g. no constant columns, which could happen due to extreme transformations of the initial Gaussian distribution Z), ensuring a sufficient number of observations in a given class (e.g. certain binary variables should have at least x%), or multicollinearity (e.g. X must have full column rank). If reject evaluates to TRUE, the current datamatrix X is rejected.
In case of rejection, new datasets are simulated until the conditions are met or the maximum iteration limit (reject_max_iter) is hit, after which the latest datamatrix is returned, NULL is returned, or an error is reported, depending on on_reject.
The reject function should take a single input (a data.frame or matrix) and output TRUE if the dataset is to be rejected or FALSE if it is to be accepted.
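The rejection logic described here can be sketched in base R as follows. This is an illustration of the documented behaviour only, not the package's implementation; the function simulate_until_accepted, its argument names, and the returned list layout are assumptions.

```r
# Illustrative only: NOT the package implementation.
simulate_until_accepted <- function(generator, n_obs, reject,
                                    max_iter = 10, on_reject = "ignore") {
  for (i in seq_len(max_iter)) {
    x <- generator(n_obs)
    # Accept the first dataset for which reject() is not TRUE.
    if (!isTRUE(reject(x))) {
      return(list(x = x, n_tries = i))
    }
  }
  # No acceptable dataset was found within max_iter draws.
  if (on_reject == "stop") {
    stop("No suitable dataset found after ", max_iter, " tries.")
  }
  warning("No suitable dataset found after ", max_iter, " tries.")
  list(x = if (on_reject == "current") x else NULL, n_tries = max_iter)
}

set.seed(42)
# Two binary columns; small samples occasionally produce a constant column.
gen <- function(n_obs) matrix(rbinom(n_obs * 2, 1, 0.5), nrow = n_obs)
has_constant_col <- function(x) {
  any(apply(x, 2, function(col) length(unique(col)) == 1))
}
res <- simulate_until_accepted(gen, 5, has_constant_col, max_iter = 50)
```

Recording n_tries alongside the accepted dataset mirrors the purpose of return_tries: a large number of tries suggests that the accepted datasets are a heavily selected subset of what the generator produces.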
This package provides the function_list convenience function, which makes it easy to create a rejection function that assesses several conditions on the input dataset by simply passing individual test functions to function_list. Templates for such test functions are found in is_collinear and contains_constant.
See the example below.
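The idea of combining several test functions into a single rejection function can be sketched in base R. The helper reject_if_any below is hypothetical and is not the package's function_list, whose return value may differ; the two checks are likewise stand-ins written for this illustration.

```r
# Hypothetical helper (not the package's function_list): returns a single
# rejection function that is TRUE as soon as any supplied check is TRUE.
reject_if_any <- function(...) {
  checks <- list(...)
  function(x) {
    any(vapply(checks, function(check) isTRUE(check(x)), logical(1)))
  }
}

# Stand-in checks written for this sketch.
has_constant_col <- function(x) {
  any(apply(as.matrix(x), 2, function(col) length(unique(col)) == 1))
}
is_rank_deficient <- function(x) {
  x <- as.matrix(x)
  qr(x)$rank < ncol(x)
}

reject <- reject_if_any(has_constant_col, is_rank_deficient)
reject(cbind(1:4, c(2, 1, 4, 3)))  # FALSE: both checks pass
reject(cbind(1:4, rep(1, 4)))      # TRUE: second column is constant
```

Keeping each condition in its own small function makes it straightforward to add or drop a restriction without touching the others.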
For details on generating, transforming and post-processing datasets, see simulate_data. This function simulates data conditional on certain requirements that must be met by the final datamatrix X. This checking is conducted on the output of simulate_data (i.e. it also includes possible post-processing steps).
simdesign, simulate_data, function_list, is_collinear, contains_constant
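library(simdata)
# Design built from a 5 x 5 identity matrix
dsgn <- simdesign_mvtnorm(diag(5))
# Draw 10 observations, rejecting collinear data or constant columns
simulate_data_conditional(dsgn, 10,
  reject = function_list(is_collinear, contains_constant),
  seed = 18)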