Simulates an \(n \times p\) data frame from a mixture of \(q\) Gaussian distributions with \(q < p\), different location parameters \(\mu_1, \dots, \mu_q\), and the identity matrix as the common covariance matrix.
mixture_sim(pct_clusters = c(0.5, 0.5), n = 500, p = 10, delta = 10)
A data frame of n observations and p+1 variables; the first variable gives the cluster assignment as a character string.
pct_clusters: a vector of marginal probabilities for each group, i.e., the mixture weights. The default is two balanced clusters.
n: integer. The number of observations.
p: integer. The number of variables.
delta: integer. The location shift \(\delta\) used to define the cluster locations.
Aurore Archimbaud
Let \(X\) be a \(p\)-variate real random vector distributed according to a mixture of \(q\) Gaussian distributions with \(q < p\), different location parameters \(\mu_1, \dots, \mu_q\), and the same positive definite covariance matrix \(I_p\): $$X \sim \sum_{h=1}^{q} \epsilon_h \, \mathcal{N}(\mu_h, I_p),$$ where \(\epsilon_{1}, \dots, \epsilon_{q}\) are mixture weights with \(\epsilon_1 + \cdots + \epsilon_q = 1\), \(\mu_1 = 0_p\), and \(\mu_{h+1} = \delta e_h\) for \(h = 1, \dots, q-1\). Here \(e_h\) denotes the \(p\)-dimensional vector with one in the \(h\)-th coordinate and zero elsewhere.
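The model above can be sketched in base R as follows. This is an illustrative sketch only, not the package's implementation: the helper name `simulate_mixture_sketch`, the column name `cluster`, the labels `group1`, `group2`, ..., and the variable names `X1`, ..., `Xp` are all assumptions made for the example. It also assumes that cluster labels are drawn at random with probabilities \(\epsilon_h\), whereas the actual function may instead fix the cluster sizes.

```r
# Hypothetical helper (not part of the package): draws n observations from
# the Gaussian mixture described above, using base R only.
simulate_mixture_sketch <- function(weights = c(0.5, 0.5), n = 500,
                                    p = 10, delta = 10) {
  q <- length(weights)
  stopifnot(q < p, abs(sum(weights) - 1) < 1e-8)

  # Draw a component label h in 1..q for each observation
  # with probability epsilon_h (assumption: random cluster sizes).
  h <- sample.int(q, size = n, replace = TRUE, prob = weights)

  # Location matrix: row 1 is 0_p, row h + 1 is delta * e_h.
  mu <- matrix(0, nrow = q, ncol = p)
  for (k in seq_len(q - 1)) mu[k + 1, k] <- delta

  # Identity covariance: independent standard normal noise around each mean.
  X <- mu[h, , drop = FALSE] + matrix(rnorm(n * p), nrow = n, ncol = p)
  colnames(X) <- paste0("X", seq_len(p))

  # First column holds the cluster assignment as a character string.
  data.frame(cluster = paste0("group", h), X, stringsAsFactors = FALSE)
}

set.seed(1)
sim <- simulate_mixture_sketch(weights = c(0.2, 0.3, 0.5), n = 500, p = 10, delta = 10)
dim(sim)            # n rows, p + 1 columns
table(sim$cluster)  # empirical cluster sizes
```

With three clusters, the second and third groups are shifted by `delta` along the first and second coordinates respectively, while the first group stays centered at the origin.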
Alfons, A., Archimbaud, A., Nordhausen, K., & Ruiz-Gazen, A. (2024). Tandem clustering with invariant coordinate selection. Econometrics and Statistics. doi:10.1016/j.ecosta.2024.03.002.