Generate a simulated dataset based on a transformation of an underlying base distribution.
simulate_data(generator, ...)
# S3 method for default
simulate_data(
generator = function(n) matrix(rnorm(n)),
n_obs = 1,
transform_initial = base::identity,
names_final = NULL,
prefix_final = NULL,
process_final = list(),
seed = NULL,
...
)
# S3 method for simdesign
simulate_data(
generator,
n_obs = 1,
seed = NULL,
apply_transformation = TRUE,
apply_processing = TRUE,
...
)
Returns a data.frame or matrix with n_obs rows for the simulated dataset X.
generator: Function which generates data from the underlying base distribution. It is assumed to take the number of simulated observations n_obs as its first argument, as all random generation functions in the stats and extraDistr packages do. Furthermore, it is expected to return a two-dimensional array (matrix or data.frame). Alternatively, an R object derived from the simdata::simdesign class. See details.
...: Further arguments passed to the generator function.
n_obs: Number of simulated observations.
transform_initial: Function which specifies the transformation of the underlying dataset Z to the final dataset X. See details.
names_final: NULL or character vector with variable names for the final dataset X. Its length must equal the number of columns of X. Overrides other naming options. See details.
prefix_final: NULL or prefix attached to variables in the final dataset X. Overridden by the names_final argument. Set to NULL if no prefixes should be added. See details.
process_final: List of lists specifying post-processing functions applied to the final data matrix X before returning it. See do_processing.
seed: Random seed, set to ensure reproducibility of results.
apply_transformation: This argument can be set to FALSE to override the information stored in the passed simdesign object and neither transform nor process the data. Thus, the raw data from the design generator is returned. This can be useful for debugging purposes.
apply_processing: This argument can be set to FALSE to override the information stored in the passed simdesign object and skip post-processing after the initial data is transformed. This can be useful for debugging purposes.
simulate_data(default): Method used if generator is not a simdesign S3 object.
simulate_data(simdesign): Method used if generator is an object of S3 class simdesign.
The generator function, which is either passed directly or via a simdata::simdesign object, is assumed to provide the same interface as the random generation functions in the R stats and extraDistr packages. Specifically, it takes the number of observations as its first argument. All further arguments can be set by passing them as named arguments to simulate_data (via ...). It is expected to return a two-dimensional array (matrix or data.frame) for which the number of columns can be determined; otherwise, the check_and_infer step will fail.
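To illustrate the expected interface, the sketch below defines a hypothetical generator, gen_two_normals (not part of simdata or stats), whose first argument is the number of observations and whose further parameters are named arguments. It returns a matrix, so its column count can be determined. A bare call to rnorm, by contrast, returns a plain vector without a column dimension, which is presumably why the default generator wraps it in matrix().

```r
# Hypothetical generator (not part of simdata): the number of observations
# comes first, further parameters are passed as named arguments.
gen_two_normals <- function(n, mean = 0, sd = 1) {
  # Two independent normal columns, returned as an n x 2 matrix
  matrix(rnorm(2 * n, mean = mean, sd = sd), nrow = n, ncol = 2)
}

set.seed(1)
z <- gen_two_normals(5, mean = 10)
dim(z)         # 5 rows, 2 columns: the column count can be determined
ncol(rnorm(5)) # NULL: a plain vector has no column dimension
```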
Transformations should be applicable to the output of the generator function (i.e. take a data.frame or matrix as input) and output another data.frame or matrix. This package provides the convenience function function_list to specify transformations as a list of functions, each of which takes the whole data matrix Z as its single argument and can be used to apply specific transformations to the columns of that matrix. See the documentation for function_list for details.
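A minimal base-R sketch of the "list of functions" idea follows. The helper make_transform is hypothetical and is not simdata::function_list, whose exact behaviour may differ: it takes named functions, applies each one to the whole matrix Z, and binds the results column-wise. The returned function therefore maps a matrix to a matrix, which fits the transform_initial requirement described above.

```r
# Hypothetical helper (not simdata::function_list): combine several
# functions, each taking the whole matrix Z, into one transformation.
make_transform <- function(...) {
  fns <- list(...)
  function(z) {
    # Apply every function to the full matrix; the list names become
    # column names when the results are bound with cbind()
    do.call(cbind, lapply(fns, function(f) f(z)))
  }
}

transform_example <- make_transform(
  v1 = function(z) z[, 1],                 # keep first column unchanged
  v2 = function(z) as.numeric(z[, 2] > 0), # dichotomise second column
  v3 = function(z) exp(z[, 1])             # log-normal-type column from the first
)

z <- matrix(c(-1, 0, 1, 2, -2, 0.5), ncol = 2)
x <- transform_example(z)
x
```

Each function sees all of Z, so a new column may depend on several columns of the underlying data, not only on one.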
Post-processing of the data matrix is based on do_processing.
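The exact format expected by do_processing is documented there. The sketch below only assumes one plausible reading of "list of lists", in which each step is a hypothetical list(fun = ..., args = ...) pair applied in order. The helpers truncate_columns and apply_processing_list are illustrative and not part of simdata; the example step truncates values to limit outliers.

```r
# Hypothetical processing function: clamp every value into [lower, upper].
# pmin()/pmax() keep the dim attribute of their first argument.
truncate_columns <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

# Hypothetical driver (not simdata::do_processing). Assumed format: a list of
# steps, each list(fun = <function>, args = <list of extra arguments>).
apply_processing_list <- function(x, process) {
  for (step in process) {
    x <- do.call(step$fun, c(list(x), step$args))
  }
  x
}

x <- matrix(c(-5, 0, 5, 1, 2, 30), ncol = 2)
out <- apply_processing_list(
  x,
  list(list(fun = truncate_columns, args = list(lower = -3, upper = 10)))
)
out
```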
Variables are named by names_final if it is not NULL and of correct length. Otherwise, if prefix_final is not NULL, it is used as a prefix for the variable numbers. Otherwise, variable names remain as returned by the generator function.
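The naming precedence can be written out as a small helper. name_columns is a hypothetical sketch of the rules stated above, not simdata's implementation; in particular, it silently falls back to the prefix when names_final has the wrong length, whereas the package may warn or stop instead.

```r
# Hypothetical sketch of the naming rule (not simdata code)
name_columns <- function(x, names_final = NULL, prefix_final = NULL) {
  if (!is.null(names_final) && length(names_final) == ncol(x)) {
    colnames(x) <- names_final                            # explicit names win
  } else if (!is.null(prefix_final)) {
    colnames(x) <- paste0(prefix_final, seq_len(ncol(x))) # prefix + number
  }
  x                                                       # otherwise unchanged
}

x <- matrix(1:6, ncol = 3)
colnames(name_columns(x, names_final = c("age", "bmi", "sex"))) # "age" "bmi" "sex"
colnames(name_columns(x, names_final = c("age", "bmi"), prefix_final = "v")) # "v1" "v2" "v3"
colnames(name_columns(x))                                       # NULL
```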
Data is generated using the following procedure:
1. An underlying dataset Z is sampled from some distribution. This is done by a call to the generator function.
2. Z is then transformed into the final dataset X by applying the transform_initial function to Z.
3. X is post-processed if specified (e.g. truncation to avoid outliers).
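The three steps, together with the seed argument and the apply_transformation / apply_processing switches of the simdesign method, can be sketched in base R as follows. simulate_sketch is a hypothetical, simplified stand-in for simulate_data, not the package's code: it skips the input checks and the naming step, and it uses the assumed list(fun = ..., args = ...) format for post-processing steps.

```r
# Hypothetical base-R sketch of the three-step procedure (not simdata code)
simulate_sketch <- function(generator, n_obs = 1,
                            transform_initial = identity,
                            process_final = list(),
                            seed = NULL,
                            apply_transformation = TRUE,
                            apply_processing = TRUE, ...) {
  if (!is.null(seed)) set.seed(seed)
  # Step 1: sample the underlying dataset Z from the base distribution
  z <- generator(n_obs, ...)
  if (!apply_transformation) return(z)  # raw generator output
  # Step 2: transform Z into the final dataset X
  x <- transform_initial(z)
  # Step 3: optional post-processing, applied step by step
  if (apply_processing) {
    for (step in process_final) x <- do.call(step$fun, c(list(x), step$args))
  }
  x
}

gen <- function(n) matrix(rnorm(2 * n), ncol = 2)
to_x <- function(z) cbind(z[, 1], exp(z[, 2]))  # second column log-normal
clamp <- list(list(fun = function(x, upper) pmin(x, upper),
                   args = list(upper = 5)))

x <- simulate_sketch(gen, n_obs = 100, transform_initial = to_x,
                     process_final = clamp, seed = 24)
z <- simulate_sketch(gen, n_obs = 100, transform_initial = to_x,
                     process_final = clamp, seed = 24,
                     apply_transformation = FALSE)
```

Because the same seed is used, z is exactly the raw Z behind x, which is what makes switching off the later steps handy for debugging.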
See also: simdesign, simdesign_mvtnorm, simulate_data_conditional, do_processing
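# Generator returning an n x 1 matrix of standard normal draws (requires mvtnorm)
generator <- function(n) mvtnorm::rmvnorm(n, mean = 0)
# Simulate 10 observations with a fixed seed for reproducibility
simulate_data(generator, 10, seed = 24)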