ICSClust (version 0.1.1)

mixture_sim: Simulation of a mixture of Gaussian distributions

Description

Simulates a data set of \(n\) observations on \(p\) variables from a mixture of \(q\) Gaussian distributions with \(q < p\), different location parameters \(\mu_1, \dots, \mu_q\), and the identity matrix as the common covariance matrix.

Usage

mixture_sim(pct_clusters = c(0.5, 0.5), n = 500, p = 10, delta = 10)

Value

A data frame with n observations and p + 1 variables; the first variable gives the cluster assignment as a character string.

Arguments

pct_clusters

a vector of marginal probabilities for each group, i.e., the mixture weights. The default is two balanced clusters.

n

integer. The number of observations.

p

integer. The number of variables.

delta

integer. The location shift \(\delta\) applied to the cluster means (see Details).

Author

Aurore Archimbaud

Details

Let \(X\) be a \(p\)-variate real random vector distributed according to a mixture of \(q\) Gaussian distributions with \(q < p\), different location parameters \(\mu_1, \dots, \mu_q\), and the same positive definite covariance matrix \(I_p\): $$X \sim \sum_{h=1}^{q} \epsilon_h \, {\cal N}(\mu_h,I_p),$$ where \(\epsilon_{1}, \dots, \epsilon_{q}\) are mixture weights with \(\epsilon_1 + \cdots + \epsilon_q = 1\), \(\mu_1 = 0_p\), and \(\mu_{h+1} = \delta e_h\) for \(h = 1, \dots, q-1\). Here \(e_h\) denotes the \(p\)-dimensional vector with a one in the \(h\)-th coordinate and zeros elsewhere, so each cluster after the first is shifted by \(\delta\) along its own coordinate axis.
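
The model above can be sketched in base R as follows. This is only an illustration of the formula, not the source code of mixture_sim: the helper name sim_mixture_sketch, the "Group" labels, and the choice to draw cluster labels at random (rather than fixing the cluster sizes) are all assumptions made for this sketch.

```r
# Hypothetical helper (NOT the ICSClust implementation): draw n observations
# from a mixture of q Gaussians with identity covariance, as in Details.
sim_mixture_sketch <- function(weights = c(0.5, 0.5), n = 500, p = 10, delta = 10) {
  q <- length(weights)
  stopifnot(q < p, abs(sum(weights) - 1) < 1e-8)

  # Component label of each observation (assumption: sampled at random
  # with probabilities given by the mixture weights).
  h <- sample.int(q, size = n, replace = TRUE, prob = weights)

  # Location matrix: row 1 is 0_p, row h + 1 is delta * e_h.
  mu <- matrix(0, nrow = q, ncol = p)
  for (k in seq_len(q - 1)) mu[k + 1, k] <- delta

  # Standard normal noise plus the location of each observation's component.
  x <- matrix(rnorm(n * p), nrow = n, ncol = p) + mu[h, , drop = FALSE]

  # Cluster label first, then the p simulated variables.
  data.frame(cluster = paste0("Group", h), x, stringsAsFactors = FALSE)
}

set.seed(1)
sim <- sim_mixture_sketch(weights = c(0.2, 0.3, 0.5), n = 300, p = 5, delta = 10)
dim(sim)
```

With three weights, the second and third clusters are shifted by delta along the first and second coordinates respectively, while the remaining coordinates carry only noise.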

References

Alfons, A., Archimbaud, A., Nordhausen, K., & Ruiz-Gazen, A. (2024). Tandem clustering with invariant coordinate selection. Econometrics and Statistics. doi:10.1016/j.ecosta.2024.03.002.

Examples

X <- mixture_sim()
summary(X)
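
A call with non-default arguments might look like the sketch below, which uses only the arguments listed under Usage. The object name X3, the seed, and the particular weights are arbitrary choices for illustration; the requireNamespace() guard simply skips the example when ICSClust is not installed.

```r
# Illustrative call with three unbalanced clusters (q = 3 < p = 5).
if (requireNamespace("ICSClust", quietly = TRUE)) {
  set.seed(123)
  X3 <- ICSClust::mixture_sim(pct_clusters = c(0.2, 0.3, 0.5),
                              n = 300, p = 5, delta = 10)
  print(dim(X3))         # n rows, p + 1 columns
  print(table(X3[[1]]))  # first column holds the cluster assignment
}
```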