clusteval (version 0.1)

sim_unif: Generates random variates from five multivariate uniform populations.

Description

We generate n observations from each of five four-dimensional uniform distributions, where the separation between the populations is controlled by a fixed constant, delta >= 0.

Usage

sim_unif(n = rep(25, 5), delta = 0, seed = NULL)

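Arguments

n: a vector of length 5 giving the sample size for each population
delta: the constant $\Delta$ that controls how far apart the populations are shifted
seed: optional seed for random number generation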

Value

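named list containing: x, a matrix whose rows are the generated observations and whose columns are the 4 features; y, a vector giving the population from which each row of x was generated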

Details

To define the populations, let $x = (X_1, \ldots, X_p)'$ be a multivariate uniformly distributed random vector such that $X_j \sim U(a_j, b_j)$ is an independently distributed uniform random variable with $a_j < b_j$ for $j = 1, \ldots, p$. Let $\Pi_m$ denote the $m$th population $(m = 1, \ldots, 5)$. Then, we have the five populations:

$\Pi_1 = U(-1/2, 1/2) \times U(\Delta - 1/2, \Delta + 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2)$,
$\Pi_2 = U(\Delta - 1/2, \Delta + 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2)$,
$\Pi_3 = U(-1/2, 1/2) \times U(-\Delta - 1/2, -\Delta + 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2)$,
$\Pi_4 = U(-1/2, 1/2) \times U(-1/2, 1/2) \times U(\Delta - 1/2, \Delta + 1/2) \times U(-1/2, 1/2)$,
$\Pi_5 = U(-1/2, 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2) \times U(\Delta - 1/2, \Delta + 1/2)$.

We generate $n_m$ observations from population $\Pi_m$.
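
The sampling scheme can be sketched in base R without the package. This is only an illustration, not the clusteval implementation: the helper name `sim_unif_sketch` is hypothetical, and the choice of shifted coordinate for each population (in particular the negative shift for the third population) is an assumed reading of the definitions.

```r
# Hypothetical base-R sketch; not the clusteval source.
sim_unif_sketch <- function(n = rep(25, 5), delta = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Row m marks which coordinate of population m is shifted, and in which
  # direction. ASSUMPTION: population 3 is shifted by -delta in feature 2.
  shift <- rbind(c(0,  1, 0, 0),
                 c(1,  0, 0, 0),
                 c(0, -1, 0, 0),
                 c(0,  0, 1, 0),
                 c(0,  0, 0, 1))

  # Lower bounds a_j of each population; every upper bound is a_j + 1.
  lower <- delta * shift - 1/2

  x <- do.call(rbind, lapply(seq_along(n), function(m) {
    # Repeat each lower bound n[m] times so the matrix fills column by column.
    a <- rep(lower[m, ], each = n[m])
    matrix(runif(length(a), min = a, max = a + 1), nrow = n[m], ncol = 4)
  }))

  list(x = x, y = factor(rep(seq_along(n), times = n)))
}

sketch <- sim_unif_sketch(n = c(10, 20, 30, 40, 50), delta = 2, seed = 42)
dim(sketch$x)    # 150 rows, 4 features
table(sketch$y)  # sample size per population
```

Storing the lower bounds in a 5 x 4 matrix keeps the definition of each population visible in one place, so the shifted coordinates are easy to change if the assumed layout differs from the package's.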

For $\Delta = 0$, the five populations are equal.

Notice that the support of each population is a unit hypercube with 4 features. Moreover, for $\Delta \ge 1$, the supports of the populations do not overlap, so the populations are entirely separated.
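
The separation claim can be checked numerically. Two axis-aligned unit hypercubes overlap in at most a boundary of probability zero when their lower bounds differ by at least 1 in some coordinate. The sketch below applies that test to every pair of populations; the helper name `supports_disjoint` is hypothetical, and the layout of shifted coordinates (including the negative shift for the third population) is an assumption, not taken from the package source.

```r
# Hypothetical check; the shift layout below is an assumption.
supports_disjoint <- function(delta) {
  shift <- rbind(c(0,  1, 0, 0),
                 c(1,  0, 0, 0),
                 c(0, -1, 0, 0),
                 c(0,  0, 1, 0),
                 c(0,  0, 0, 1))
  lower <- delta * shift - 1/2  # lower bounds; upper bounds are lower + 1

  # Each column of `pairs` is one pair of population indices.
  pairs <- combn(nrow(lower), 2)

  # A pair is separated if some coordinate's lower bounds differ by >= 1.
  all(apply(pairs, 2, function(p) {
    any(abs(lower[p[1], ] - lower[p[2], ]) >= 1)
  }))
}

supports_disjoint(1)    # TRUE
supports_disjoint(0.5)  # FALSE
```

With `delta = 0.5`, for example, populations 1 and 2 differ by only 0.5 in their first two coordinates, so their hypercubes still overlap.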

Examples

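library(clusteval)

# Default call: 25 observations from each of the 5 populations, delta = 0
data_generated <- sim_unif(seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Unequal sample sizes (10, 20, ..., 50) with well-separated populations
data_generated2 <- sim_unif(n = 10 * seq_len(5), delta = 1.5)
table(data_generated2$y)

# Per-population sample means of the features
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                            colMeans(x[i, , drop = FALSE])
                     }))
(sample_means <- do.call(rbind, sample_means))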
