simdata_uniform: Generates random variates from multivariate uniform populations.

Description

We generate $n$ observations from each of $K_0$ multivariate uniform distributions such that the Euclidean distance between each of the populations and the origin is equal and scaled by $\Delta \ge 0$.

Usage

simdata_uniform(n = rep(25, 5), delta = 0, seed = NULL)

Arguments

n

a vector (of length $K_0$) of the sample sizes for each population

delta

the fixed distance between each population and the origin

seed

seed for random number generation. (If NULL, does not set seed)

Value

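named list containing:

x

a matrix of the generated observations, with one row per observation and one column per feature

y

a vector giving the population label of each row of x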

Details

To define the populations, let $x = (X_1, \ldots, X_p)'$ be a multivariate uniformly distributed random vector such that $X_j \sim U(a_j^{(k)}, b_j^{(k)})$ is an independently distributed uniform random variable with $a_j^{(k)} < b_j^{(k)}$ for $j = 1, \ldots, p$.

For each population, we set the mean of the distribution along one feature to $\Delta$, while the remaining features have mean 0. The objective is to have unit hypercubes with $p = K_0$ where the population centroids separate from each other in orthogonal directions as $\Delta$ increases, with no overlap for $\Delta \ge 1$.

Hence, let $(a_k^{(k)}, b_k^{(k)}) = (\Delta - 1/2, \Delta + 1/2)$. For the remaining ordered pairs, let $(a_j^{(k)}, b_j^{(k)}) = (-1/2, 1/2)$ for $j \ne k$.
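
As a quick check of the geometry, the following follows directly from the stated bounds, since a $U(a, b)$ variable has mean $(a + b)/2$. Here $\mu_k$ denotes the centroid of the $k$th population and $e_k$ the $k$th standard basis vector; both symbols are introduced only for this illustration.

$$
\mu_k = \Delta e_k, \qquad \lVert \mu_k \rVert = \Delta, \qquad \lVert \mu_k - \mu_{k'} \rVert = \sqrt{2}\,\Delta \quad (k \ne k').
$$

Every centroid therefore lies at distance $\Delta$ from the origin, and any two centroids are the same distance apart.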

We generate $n_k$ observations from the $k$th population.
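
The construction described in this section can be sketched in a few lines of base R. This is an illustrative re-implementation under the bounds stated in Details, not the package's source: the name `sketch_simdata_uniform` is hypothetical, and returning `y` as an integer vector is an assumption.

```r
# Hypothetical sketch of the data-generating scheme; not the package source.
sketch_simdata_uniform <- function(n = rep(25, 5), delta = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  K0 <- length(n)  # number of populations, which is also the number of features p

  x <- lapply(seq_len(K0), function(k) {
    # Every feature starts with bounds (-1/2, 1/2) ...
    a <- rep(-0.5, K0)
    b <- rep(0.5, K0)
    # ... except feature k, whose interval is centered at delta.
    a[k] <- delta - 0.5
    b[k] <- delta + 0.5
    # runif() recycles min/max, so column j is drawn from U(a[j], b[j]).
    matrix(runif(n[k] * K0,
                 min = rep(a, each = n[k]),
                 max = rep(b, each = n[k])),
           nrow = n[k], ncol = K0)
  })

  # Stack the populations and record which one each row came from.
  list(x = do.call(rbind, x),
       y = rep(seq_len(K0), times = n))
}

sim <- sketch_simdata_uniform(n = c(3, 4, 5), delta = 2, seed = 1)
dim(sim$x)
table(sim$y)
```

Drawing each population as one matrix keeps the bounds for feature $k$ next to the shift by `delta`, which mirrors the definition of $(a_j^{(k)}, b_j^{(k)})$.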

For $\Delta = 0$, all $K_0$ populations are identically distributed ($K_0 = 5$ with the default n).

Notice that the support of each population is a unit hypercube with $p = K_0$ features. Moreover, for $\Delta \ge 1$, the populations are mutually exclusive and entirely separated.
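
The separation claim can be checked numerically. The sketch below does not call the package; it draws data inline from the bounds given in Details (the variable names are illustrative), then verifies that with $\Delta = 1.5$ the shifted feature is the largest coordinate of every observation, so the population label can be read off exactly.

```r
set.seed(42)
K0 <- 5       # number of populations and features
n_k <- 10     # observations per population (equal sizes, for simplicity)
delta <- 1.5  # any value >= 1 gives non-overlapping supports

# Start with every feature drawn from U(-1/2, 1/2).
x <- matrix(runif(K0 * n_k * K0, min = -0.5, max = 0.5), ncol = K0)
y <- rep(seq_len(K0), each = n_k)

# Shift feature k of population k so that it is U(delta - 1/2, delta + 1/2).
idx <- cbind(seq_along(y), y)
x[idx] <- x[idx] + delta

# The shifted feature is at least delta - 1/2 = 1, every other feature is at
# most 1/2, so the column of the row maximum identifies the population.
recovered <- max.col(x, ties.method = "first")
all(recovered == y)
```

For $\Delta < 1$ the intervals $(\Delta - 1/2, \Delta + 1/2)$ and $(-1/2, 1/2)$ intersect, so this check is no longer guaranteed to succeed.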

Examples

data_generated <- simdata_uniform(seed = 42)
dim(data_generated$x)
table(data_generated$y)

data_generated2 <- simdata_uniform(n = 10 * seq_len(5), delta = 1.5)
table(data_generated2$y)
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                            colMeans(x[i,])
                     }))
(sample_means <- do.call(rbind, sample_means))
