the family of distributions from which to
generate data
...
optional arguments that are passed to the
data-generating function
Value
A named list containing:
x:
A matrix whose rows are the generated observations and whose
columns are the $p$ features (variables).
y:
A vector denoting the population from which the
observation in each row was generated.
Details
For each data-generating model, we generate $n_m$
observations from the $m$th of $M$ multivariate
distributions $(m = 1, \ldots, M)$. The population
centroids are placed so that each lies at the same Euclidean
distance from the origin, and this common distance is scaled
by $\Delta \ge 0$. For each model, the argument delta
controls this separation.
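The centroid construction above can be illustrated with a minimal sketch. It assumes spherical Gaussian populations and places the $m$th centroid at $\Delta$ times the $m$th standard basis vector. This guarantees that every centroid is at distance exactly $\Delta$ from the origin, but it is not necessarily how this package places its centroids. The function name `simulate_centroid_data` and its `sd` and `seed` parameters are hypothetical. The function returns a list with the same `x`/`y` layout described under Value.

```python
import random


def simulate_centroid_data(n_per_pop, p, delta, sd=1.0, seed=None):
    """Hypothetical sketch: M spherical Gaussian populations whose centroids
    all lie at Euclidean distance `delta` from the origin.

    Assumption (not the package's documented method): centroid m is
    delta * e_m, the m-th standard basis vector scaled by delta, so this
    construction requires M <= p.
    """
    M = len(n_per_pop)
    if M > p:
        raise ValueError("this construction needs M <= p")
    if delta < 0:
        raise ValueError("delta must be non-negative")

    rng = random.Random(seed)
    x, y = [], []
    for m, n_m in enumerate(n_per_pop):
        # Centroid for population m: delta in coordinate m, zero elsewhere.
        centroid = [delta if j == m else 0.0 for j in range(p)]
        for _ in range(n_m):
            # Add independent N(0, sd^2) noise to each coordinate.
            x.append([c + rng.gauss(0.0, sd) for c in centroid])
            # Population labels are 1-based here (an assumption).
            y.append(m + 1)
    return {"x": x, "y": y}


# Usage: three populations of 50 observations each in p = 5 dimensions.
data = simulate_centroid_data([50, 50, 50], p=5, delta=3.0, seed=1)
```

With this choice, any two centroids are $\Delta\sqrt{2}$ apart. Increasing `delta` therefore spreads the populations apart uniformly, which is the kind of controlled separation the Details section describes.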
This wrapper function is useful for simulation studies
in which the efficacy of supervised and unsupervised
learning methods and algorithms is evaluated as the
population separation increases.