sortinghat (version 0.1)

simdata_uniform: Generates random variates from multivariate uniform populations.

Description

We generate $n_k$ observations from each of $K_0$ multivariate uniform distributions, constructed so that every population centroid lies at the same Euclidean distance, $\Delta \ge 0$, from the origin.

Usage

simdata_uniform(n = rep(25, 5), delta = 0, seed = NULL)

Arguments

n
a vector (of length $K_0$) of the sample sizes for each population
delta
the fixed distance between each population and the origin
seed
seed for random number generation (if NULL, no seed is set)

Value

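  • named list containing: x, a matrix whose rows are the generated observations and whose columns are the $p = K_0$ features; and y, a vector giving the population from which each row of x was generated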

Details

To define the populations, let $x = (X_1, \ldots, X_p)'$ be a multivariate uniformly distributed random vector such that $X_j \sim U(a_j^{(k)}, b_j^{(k)})$ is an independently distributed uniform random variable with $a_j^{(k)} < b_j^{(k)}$ for $j = 1, \ldots, p$.

For each population, we set the mean of the distribution along one feature to $\Delta$, while the remaining features have mean 0. The objective is to have unit hypercubes with $p = K_0$ where the population centroids separate from each other in orthogonal directions as $\Delta$ increases, with no overlap for $\Delta \ge 1$.

Hence, let $(a_k^{(k)}, b_k^{(k)}) = (\Delta - 1/2, \Delta + 1/2)$. For the remaining ordered pairs, let $(a_j^{(k)}, b_j^{(k)}) = (-1/2, 1/2)$.

We generate $n_k$ observations from the $k$th population.
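
The construction above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the sortinghat source: the function name sim_uniform_sketch is hypothetical, and returning the labels as a factor is an assumption.

```r
# Hypothetical helper illustrating the construction; NOT the package's code.
sim_uniform_sketch <- function(n = rep(25, 5), delta = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  K <- length(n)  # number of populations, which is also the number of features
  x <- lapply(seq_len(K), function(k) {
    # Every feature is U(-1/2, 1/2) except feature k, which is shifted by delta.
    lower <- rep(-1/2, K)
    upper <- rep(1/2, K)
    lower[k] <- delta - 1/2
    upper[k] <- delta + 1/2
    # The matrix is filled column by column, so repeat each bound n[k] times.
    matrix(runif(n[k] * K,
                 min = rep(lower, each = n[k]),
                 max = rep(upper, each = n[k])),
           nrow = n[k], ncol = K)
  })
  # Assumption: labels are returned as a factor with levels 1, ..., K.
  list(x = do.call(rbind, x),
       y = factor(rep(seq_len(K), times = n)))
}

out <- sim_uniform_sketch(n = c(3, 4, 5), delta = 2, seed = 1)
```

Here out$x has one row per observation and one column per feature, and out$y records which population each row came from.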

For $\Delta = 0$, the $K_0$ populations are identically distributed (with the default n, $K_0 = 5$).

Notice that the support of each population is a unit hypercube with $p = K_0$ features. Moreover, for $\Delta \ge 1$, the populations are mutually exclusive and entirely separated.
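
One way to see the separation for $\Delta \ge 1$: an observation from population $k$ has its $k$th feature in $[\Delta - 1/2, \Delta + 1/2]$, so that feature is at least $1/2$, while all of its other features are at most $1/2$. The largest coordinate therefore identifies the population. The base R sketch below checks this on simulated data; the values of K, n, and delta are illustrative choices, not package defaults, and the data are generated directly with runif rather than with simdata_uniform.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
K <- 3; n <- 50; delta <- 1       # illustrative values (assumptions)
y <- rep(seq_len(K), each = n)    # population label for each row
# Start with every feature drawn from U(-1/2, 1/2).
x <- matrix(runif(n * K * K, -1/2, 1/2), ncol = K)
# Shift feature k of each row from population k:
# U(-1/2, 1/2) + delta is U(delta - 1/2, delta + 1/2).
idx <- cbind(seq_along(y), y)     # (row, column) pairs to shift
x[idx] <- x[idx] + delta
# For delta >= 1, the index of the largest feature recovers the label.
largest <- apply(x, 1, which.max)
separated <- all(largest == y)
```

With delta below 1 the supports overlap, so the same check can fail for some rows.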

Examples

library(sortinghat)

# Default setup: 5 populations with 25 observations each and delta = 0
data_generated <- simdata_uniform(seed = 42)
dim(data_generated$x)
table(data_generated$y)

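# Unequal sample sizes (10, 20, ..., 50) with well-separated populations
data_generated2 <- simdata_uniform(n = 10 * seq_len(5), delta = 1.5)
table(data_generated2$y)
# Per-population feature means: one vector of column means per label
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                            colMeans(x[i, ])
                     }))
# Stack into a matrix with one row per population and print it
(sample_means <- do.call(rbind, sample_means))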
