sim_data: Wrapper function to generate data from a variety of data-generating models.

Description

We provide a wrapper function to generate data from one of three data-generating models:

sim_unif: Five multivariate uniform distributions
sim_normal: Multivariate normal distributions with intraclass covariance matrices
sim_student: Multivariate Student's t distributions each with a common covariance matrix

Usage

sim_data(family = c("uniform", "normal", "student"), ...)

Arguments

family

the family of distributions from which to generate data

...

optional arguments that are passed to the data-generating function

Value

named list containing:

x: A matrix whose rows are the generated observations and whose columns are the p features (variables).
y: A vector denoting the population from which the observation in each row was generated.

Details

For each data-generating model, we generate $n_m$ observations $(m = 1, \ldots, M)$ from each of $M$ multivariate distributions so that the Euclidean distance between each of the population centroids and the origin is equal and scaled by $\Delta \ge 0$. For each model, the argument delta controls this separation.
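The centroid placement described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper `sim_sketch`, its arguments `n` and `p`, the choice of spherical normal noise, and the placement of centroid $m$ at $\Delta e_m$ (the $m$-th standard basis vector scaled by $\Delta$) are all assumptions made for illustration. The only point it shows is that every centroid ends up at the same Euclidean distance $\Delta$ from the origin.

```r
# Hypothetical helper, NOT part of the package: draws n[m] observations from
# each of M spherical normal populations whose centroids sit at delta * e_m.
sim_sketch <- function(n = c(10, 10, 10), p = 4, delta = 1) {
  M <- length(n)
  stopifnot(M <= p, delta >= 0)

  # Row m is the centroid of population m: delta times the m-th basis vector.
  centroids <- delta * diag(p)[seq_len(M), , drop = FALSE]

  # Standard normal noise shifted by the population centroid, stacked by rows.
  x <- do.call(rbind, lapply(seq_len(M), function(m) {
    noise <- matrix(rnorm(n[m] * p), nrow = n[m], ncol = p)
    sweep(noise, 2, centroids[m, ], "+")
  }))

  # Population label for each row of x.
  y <- factor(rep(seq_len(M), times = n))

  list(x = x, y = y, centroids = centroids)
}

set.seed(42)
sketch <- sim_sketch(n = c(5, 5, 5), p = 4, delta = 2)

# Distance of each centroid from the origin; every entry equals delta.
centroid_dist <- sqrt(rowSums(sketch$centroids^2))
centroid_dist
```

Setting `delta = 0` in this sketch collapses all centroids onto the origin, so the populations overlap completely; larger values pull them apart.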

This wrapper function is useful for simulation studies in which the efficacy of supervised and unsupervised learning methods and algorithms is evaluated as the population separation is increased.

Examples


set.seed(42)
uniform_data <- sim_data(family = "uniform")
normal_data <- sim_data(family = "normal", delta = 2)
student_data <- sim_data(family = "student", delta = 1, df = 1:5)
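As a follow-up, the returned list can be inspected using the `x` and `y` components described under Value. The sketch below assumes that `sim_data` is exported by a package named sortinghat; that package name is an assumption not stated on this page, so the call is guarded and skipped if the package is not installed.

```r
# Assumption: sim_data() is exported by a package called "sortinghat".
# If it is not installed, the block does nothing and checks stays NULL.
checks <- NULL
if (requireNamespace("sortinghat", quietly = TRUE)) {
  set.seed(42)
  normal_data <- sortinghat::sim_data(family = "normal", delta = 2)

  checks <- list(
    dims   = dim(normal_data$x),    # observations by features
    counts = table(normal_data$y)   # observations per population
  )
  print(checks)
}
```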
