Simulates several 2D datasets together with fuzzy cluster memberships \(U\). The memberships are defined by analytic density/curve proximity rules (detailed below) so they can be used as "ground truth" for fuzzy clustering and visualization (see plot.fuzzy).
gen.fuzzy(n = 500,
dataset = c("gaussian", "hyperbolas", "spirals",
"wedges", "rings", "worms", "random"),
k = NULL,
noise = 0.1,
covType = c("spherical", "diagonal", "rotated", "correlated"),
seed = NULL)
A list with components:
An \(n \times 2\) numeric matrix of observations.
An \(n \times k\) matrix of probabilistic/fuzzy cluster memberships.
A vector length \(n\) of integers corresponding to hard cluster labels.
Number of clusters.
Returned only for dataset="random": the centres, cluster sizes, and common covariance used.
Arguments:
n: Total number of observations.
dataset: Which data generator to use, with options "gaussian", "hyperbolas", "spirals", "wedges", "rings", "worms", or "random".
k: Number of clusters for dataset="random", ignored otherwise; if NULL, defaults to k = 20.
noise: Additive noise or curve-thickness parameter for applicable generators (see Details).
covType: Covariance structure for dataset="random"; one of "spherical", "diagonal", "rotated", "correlated".
seed: Optional seed for reproducibility.
The noise argument is used by "gaussian", "hyperbolas", "spirals", "rings", and "worms"; it is ignored by "wedges".
Let \(X \in \mathbb{R}^{n \times 2}\) be the simulated observations and \(U \in \mathbb{R}^{n \times k}\) the fuzzy memberships. For each dataset, memberships are defined below and row-normalized to sum to 1.
gaussian (k = 3). Three Gaussian components with means \((-2,0)\), \((2,0)\), \((0,3)\) and covariances \(([1,0.3];[0.3,1])\), \(([1,-0.3];[-0.3,1])\), and \(([0.8,0];[0,0.8])\). If component sizes are \(\pi_j\), then \(U_{ij} \propto \pi_j \phi_2(x_i | \mu_j, \Sigma_j)\).
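The membership rule above can be sketched in base R. This is an illustration rather than the package's code: the helper names `dmvn2` and `gaussian_memberships`, the equal weights `pis`, and the four test points are assumptions.

```r
# Bivariate normal density phi_2(x | mu, Sigma), evaluated at each row of X.
dmvn2 <- function(X, mu, Sigma) {
  Xc <- sweep(X, 2, mu)                      # centre the observations
  Q  <- rowSums((Xc %*% solve(Sigma)) * Xc)  # squared Mahalanobis distances
  exp(-0.5 * Q) / (2 * pi * sqrt(det(Sigma)))
}

# Means and covariances quoted in the text above.
mus <- list(c(-2, 0), c(2, 0), c(0, 3))
Sigmas <- list(
  matrix(c(1, 0.3, 0.3, 1), 2, 2),
  matrix(c(1, -0.3, -0.3, 1), 2, 2),
  matrix(c(0.8, 0, 0, 0.8), 2, 2)
)
pis <- rep(1 / 3, 3)  # assumption: equal component weights

gaussian_memberships <- function(X) {
  W <- sapply(1:3, function(j) pis[j] * dmvn2(X, mus[[j]], Sigmas[[j]]))
  W <- matrix(W, nrow = nrow(X))  # keep matrix shape even for a single row
  W / rowSums(W)                  # row-normalize so each row sums to 1
}

# Three points at the component means plus one point between components 1 and 2.
X <- rbind(c(-2, 0), c(2, 0), c(0, 3), c(0, 1))
U <- gaussian_memberships(X)
round(U, 3)
```

Points at a component mean load almost entirely on that component, while the point \((0,1)\) splits evenly between components 1 and 2 because their covariances are mirror images of each other.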
hyperbolas (k = 5). One Gaussian near \((0,0)\) and four hyperbola branches \(\{(x,y): (x\pm a)^2/b^2 - y^2/a^2=1\}\) and their rotated or flipped analogues, sampled along \(t \in [-2,2]\) with noise. For observation \(x_i\), \(w_{\text{ball}} = 50 \cdot \phi_2(x_i | (0,0), 0.2I_2)\), and \(w_{\text{hyp},\ell} = \exp \big(-d^2(x_i,\mathcal{C}_\ell)/\sigma^2\big)\), where \(d(x_i,\mathcal{C}_\ell)\) is the minimum distance from \(x_i\) to branch \(\mathcal{C}_\ell\). We set \(U_{i\cdot} \propto w\), where \(w\) collects \(w_{\text{ball}}\) and the four \(w_{\text{hyp},\ell}\).
spirals (k = 3). Three spirals \(\gamma_s(t) = ((0.5+0.8t)\cos(\theta_s+t),(0.5+0.8t)\sin(\theta_s+t))\) with shifts \(\theta_s \in \{0,2\pi/3,4\pi/3\}\), sampled with additive noise. For each spiral \(s\), \(d_s = \min_{t \in [0,\pi]} \|x_i - \gamma_s(t)\|\) is the distance from \(x_i\) to the curve, and \(U_{is} \propto \exp\big(-d_s^2/\sigma^2\big)\). In addition, if \(\|x_i\| < 1\), set \(U_{i\cdot} \leftarrow (1-\alpha)U_{i\cdot} + \alpha(1,1,1)/3\) with \(\alpha = 0.5e^{-\|x_i\|}\), then renormalize.
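A rough sketch of the spiral rule follows, approximating the minimum over \(t\) by a search on a fine grid. The function names, the grid size `n_grid`, and the value `sigma = 0.3` are assumptions for illustration; the text does not state how \(\sigma\) is derived from noise, and the package may compute the distance differently.

```r
# Points on spiral s for parameter values t (formula from the text above).
spiral_curve <- function(t, theta_s) {
  r <- 0.5 + 0.8 * t
  cbind(r * cos(theta_s + t), r * sin(theta_s + t))
}

spiral_memberships <- function(X, sigma = 0.3, n_grid = 500) {
  t_grid <- seq(0, pi, length.out = n_grid)  # grid approximation of min over t
  shifts <- c(0, 2 * pi / 3, 4 * pi / 3)
  D <- sapply(shifts, function(th) {
    G <- spiral_curve(t_grid, th)
    apply(X, 1, function(x) sqrt(min((G[, 1] - x[1])^2 + (G[, 2] - x[2])^2)))
  })
  D <- matrix(D, nrow = nrow(X))             # n x 3 matrix of distances d_s
  U <- exp(-D^2 / sigma^2)
  U <- U / rowSums(U)
  # Blend towards the uniform vector near the origin, then renormalize.
  r <- sqrt(rowSums(X^2))
  alpha <- ifelse(r < 1, 0.5 * exp(-r), 0)
  U <- (1 - alpha) * U + alpha / 3
  U / rowSums(U)
}

# One point on each spiral (t = 2) and the origin.
X <- rbind(spiral_curve(2, 0), spiral_curve(2, 2 * pi / 3),
           spiral_curve(2, 4 * pi / 3), c(0, 0))
U <- spiral_memberships(X)
round(U, 3)
```

Points far from all three curves can make every weight underflow to zero; a production version would guard against that before dividing.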
wedges (k = 8). Eight angular wedges with inner/outer radii \(1\) and \(4\), respectively, with small gaps between wedges. For observation \(x_i\) with radius \(r\) and angle \(\theta\), membership to wedge \(j\) is \(U_{ij} \propto \exp \big(-\delta(\theta,\theta_j)^2/\sigma^2\big)\), where \(\delta\) is a wrapped angular distance to the wedge centre angle \(\theta_j\).
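The wrapped angular distance can be illustrated as below. The evenly spaced centre angles, the value `sigma = 0.4`, and the function names are assumptions; the text does not list the actual centre angles or \(\sigma\).

```r
# Wrapped angular distance in [0, pi]: angles near 0 and near 2*pi count as close.
wrapped_dist <- function(theta, theta0) {
  abs(atan2(sin(theta - theta0), cos(theta - theta0)))
}

wedge_memberships <- function(X, k = 8, sigma = 0.4) {
  centres <- (seq_len(k) - 0.5) * 2 * pi / k  # assumption: evenly spaced centres
  theta <- atan2(X[, 2], X[, 1])              # angle of each observation
  W <- exp(-outer(theta, centres, wrapped_dist)^2 / sigma^2)
  W / rowSums(W)                              # row-normalize
}

# A point at the centre of wedge 1 and a point on the 0 / 2*pi boundary.
X <- rbind(2 * c(cos(pi / 8), sin(pi / 8)), c(3, 0))
U <- wedge_memberships(X)
round(U, 3)
```

The second point lies at angle 0, so under these assumed centres it is equidistant from wedge 1 and wedge 8 and receives equal membership in both, which an unwrapped difference of angles would miss.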
rings (k = 3). For \(x_i \in \mathbb{R}^2\) and \(r_i = \|x_i\|_2\), there are three concentric rings with radii \(R_j \in \{1, 2.5, 4\}\) with widths \(W_j \in \{0.3, 0.4, 0.5\}\) for \(j=1,2,3\). Let \(w_{ij} = \exp\left(-(r_i - R_j)^2/W_j^2\right)\), then \( U_{ij} = w_{ij}/\sum_{\ell=1}^{3} w_{i\ell}\).
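The ring formula translates almost directly into base R. The function name `ring_memberships` and the test points are illustrative assumptions; the radii and widths are those quoted above.

```r
ring_memberships <- function(X, R = c(1, 2.5, 4), W = c(0.3, 0.4, 0.5)) {
  r <- sqrt(rowSums(X^2))                  # radius r_i of each observation
  Z <- sweep(outer(r, R, "-"), 2, W, "/")  # (r_i - R_j) / W_j
  w <- exp(-Z^2)
  w / rowSums(w)                           # U_ij = w_ij / sum_l w_il
}

# One point on each ring, plus a point midway between rings 1 and 2.
X <- rbind(c(1, 0), c(0, 2.5), c(-4, 0), c(1.75, 0))
U <- ring_memberships(X)
round(U, 3)
```

The last point is equally far (0.75) from the first two rings, yet it leans towards ring 2 because that ring has the larger width.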
worms (k = 4). Each worm \(j\) is a sinusoidal curve parameterized on \(t \in [0,2\pi]\) by \( \gamma_j(t) = \big(x(t), y_j(t)\big)\) with \(x(t)=2(t-\pi)\), \( y_j(t) = A_j \sin(f_j t + \phi_j) + y^{\mathrm{off}}_j\), with amplitudes \(A_j\), frequencies \(f_j\), phases \(\phi_j\), and vertical offsets \(y^{\mathrm{off}}_j\). For observation \(x_i \in \mathbb{R}^2\), the distance to worm \(j\) is \(d_j(x_i) = \min_{t \in [0,2\pi]} \|x_i - \gamma_j(t)\|_2\). Then \(w_{ij} = \exp\left( -d_j(x_i)^2/\sigma^2\right)\), and \( U_{ij} = w_{ij}/\sum_{\ell=1}^{4} w_{i\ell}\).
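As a sketch of the worm rule, the function below takes the curve parameters as arguments, since the text does not give their values. The amplitudes, frequencies, phases, and offsets used in the example are hypothetical, as are `sigma`, `n_grid`, and the function name.

```r
worm_memberships <- function(X, A, f, phi, yoff, sigma = 0.3, n_grid = 401) {
  t_grid <- seq(0, 2 * pi, length.out = n_grid)  # grid approximation of min over t
  gx <- 2 * (t_grid - pi)                        # x(t) = 2 (t - pi)
  D <- sapply(seq_along(A), function(j) {
    gy <- A[j] * sin(f[j] * t_grid + phi[j]) + yoff[j]
    apply(X, 1, function(x) sqrt(min((gx - x[1])^2 + (gy - x[2])^2)))
  })
  D <- matrix(D, nrow = nrow(X))                 # n x k matrix of distances d_j
  w <- exp(-D^2 / sigma^2)
  w / rowSums(w)
}

# Hypothetical worm parameters, chosen only for illustration.
A <- c(1, 1, 1, 1)
f <- c(1, 1, 2, 2)
phi <- c(0, pi, 0, pi)
yoff <- c(-6, -2, 2, 6)

# With these values each worm passes through (0, yoff_j) at t = pi.
X <- cbind(0, yoff)
U <- worm_memberships(X, A, f, phi, yoff)
round(U, 3)
```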
random (k is user-specified).
Mixture of \(k\) Gaussians with common covariance determined by covType with random centres in \([0,30]^2\) and random cluster sizes. With mixture weights \(\pi_j\),
\(U_{ij} \propto \pi_j \phi_2(x_i | \mu_j, \Sigma).\)
plot.fuzzy
set.seed(1)
g <- gen.fuzzy(n = 600, dataset = "gaussian", seed = 1)
plot.fuzzy(g, plotFuzzy = TRUE, colorCluster = TRUE)
s <- gen.fuzzy(n = 450, dataset = "spirals", noise = 0.2, seed = 1)
plot.fuzzy(s, plotFuzzy = TRUE, colorCluster = FALSE)
r <- gen.fuzzy(n = 800, dataset = "random", k = 15, covType = "rotated", seed = 1)
plot.fuzzy(r, plotFuzzy = TRUE, colorCluster = TRUE)