clusteval (version 0.1)

sim_data: Wrapper function to generate data from a variety of data-generating models.

Description

We provide a wrapper function to generate data from one of three data-generating models:
sim_unif
Five multivariate uniform distributions

sim_normal
Multivariate normal distributions with intraclass covariance matrices

sim_student
Multivariate Student's t distributions each with a common covariance matrix

Usage

sim_data(family = c("uniform", "normal", "student"), ...)

Arguments

family
the family of distributions from which to generate data
...
optional arguments that are passed to the data-generating function

Value

A named list containing:
x:
A matrix whose rows are the observations generated and whose columns are the p features (variables)
y:
A vector denoting the population from which the observation in each row was generated.

Details

For each data-generating model, we generate $n_m$ observations from the $m$th of $M$ multivariate distributions $(m = 1, \ldots, M)$. The population centroids are placed so that each lies at the same Euclidean distance from the origin, and that distance is scaled by $\Delta \ge 0$. For each model, the argument delta controls this separation.
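The centroid geometry described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the function name `sim_sketch`, the choice of placing centroid $m$ at $\Delta e_m$ (a scaled standard basis vector), and the identity covariance are all assumptions made for illustration. The sketch only shows that every centroid ends up at the same distance from the origin and that the result has the documented `x`/`y` shape.

```r
# Hypothetical sketch, not clusteval's code: M normal populations whose
# centroids all lie at Euclidean distance `delta` from the origin.
sim_sketch <- function(n = rep(25, 3), p = 3, delta = 0) {
  M <- length(n)
  stopifnot(M <= p, delta >= 0)

  # Assumption: centroid m is delta times the m-th standard basis vector.
  centroids <- delta * diag(p)[seq_len(M), , drop = FALSE]

  # Draw n[m] observations around centroid m (identity covariance assumed).
  x <- do.call(rbind, lapply(seq_len(M), function(m) {
    noise <- matrix(rnorm(n[m] * p), nrow = n[m], ncol = p)
    sweep(noise, 2, centroids[m, ], "+")
  }))

  # Same shape as the documented return value: x (matrix) and y (labels).
  list(x = x, y = rep(seq_len(M), times = n), centroids = centroids)
}

set.seed(42)
dat <- sim_sketch(n = c(10, 20, 30), p = 4, delta = 2)
dim(dat$x)                        # rows are observations, columns are features
table(dat$y)                      # number of observations per population
sqrt(rowSums(dat$centroids^2))    # distance of each centroid from the origin
```

With `delta = 0` all centroids coincide at the origin, so the populations overlap completely; larger values push them apart.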

This wrapper function is useful for simulation studies in which the efficacy of supervised and unsupervised learning methods and algorithms is evaluated as the population separation is increased.
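A simulation study of this kind might look roughly like the following sketch, which tracks how well k-means recovers the true populations as the separation grows. The helper names `purity`, `toy_gen`, and `separation_study` are hypothetical, and `toy_gen` is a small stand-in generator so that the block runs without the package; purity is just one of several possible agreement measures.

```r
# Purity: share of observations that carry the majority label of their cluster.
purity <- function(cluster, y) {
  tab <- table(cluster, y)
  sum(apply(tab, 1, max)) / length(y)
}

# Hypothetical stand-in generator: two bivariate normal populations with
# centroids at (delta, 0) and (0, delta). Returns a list with x and y.
toy_gen <- function(delta, n = 50) {
  x <- rbind(
    cbind(rnorm(n, mean = delta), rnorm(n)),
    cbind(rnorm(n), rnorm(n, mean = delta))
  )
  list(x = x, y = rep(1:2, each = n))
}

# For each separation value, generate data, cluster it, and score the result.
separation_study <- function(gen, deltas) {
  sapply(deltas, function(d) {
    dat <- gen(d)
    km <- kmeans(dat$x, centers = length(unique(dat$y)), nstart = 10)
    purity(km$cluster, dat$y)
  })
}

set.seed(42)
deltas <- c(0, 2, 8)
result <- separation_study(toy_gen, deltas)
names(result) <- deltas
print(round(result, 2))
```

Because `separation_study` only needs a function that maps a separation value to a list with `x` and `y`, a generator built on this page's wrapper, for example `function(d) sim_data(family = "normal", delta = d)`, could presumably be passed in place of `toy_gen` when clusteval is installed.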

Examples

set.seed(42)
uniform_data <- sim_data(family = "uniform")
normal_data <- sim_data(family = "normal", delta = 2)
student_data <- sim_data(family = "student", delta = 1, df = 1:5)