FuzzySpec (version 1.0.0)

gen.fuzzy: Generate 2D synthetic datasets with known fuzzy memberships

Description

Simulates several 2D datasets together with fuzzy cluster memberships \(U\). The memberships are defined by analytic density/curve proximity rules (detailed below) so they can be used as "ground truth" for fuzzy clustering and visualization (see plot.fuzzy).

Usage

gen.fuzzy(n = 500,
          dataset = c("gaussian", "hyperbolas", "spirals",
                      "wedges", "rings", "worms", "random"),
          k = NULL,
          noise = 0.1,
          covType = c("spherical", "diagonal", "rotated", "correlated"),
          seed = NULL)

Value

A list with components:

X

An \(n \times 2\) numeric matrix of observations.

U

An \(n \times k\) matrix of probabilistic/fuzzy cluster memberships.

y

An integer vector of length \(n\) giving the hard cluster labels.

k

Number of clusters.

centres, clusSz, covMatrix

Returned only for dataset="random": the centres, cluster sizes, and common covariance used.

Arguments

n

Total number of observations.

dataset

Which data generator to use, with options "gaussian", "hyperbolas", "spirals", "wedges", "rings", "worms", or "random".

k

Number of clusters for dataset="random", ignored otherwise; if NULL, defaults to k = 20.

noise

Additive noise or curve-thickness parameter for applicable generators (see Details).

covType

Covariance structure for dataset="random"; one of "spherical", "diagonal", "rotated", "correlated".

seed

Optional seed for reproducibility.

Notes

The noise argument is used by "gaussian", "hyperbolas", "spirals", "rings", and "worms"; it is ignored by "wedges".

Details

Let \(X \in \mathbb{R}^{n \times 2}\) be the simulated observations and \(U \in \mathbb{R}^{n \times k}\) the fuzzy memberships. For each dataset, memberships are defined below and row-normalized to sum to 1.

gaussian (k = 3). Three Gaussian components with means \((-2,0)\), \((2,0)\), \((0,3)\) and covariances \(\begin{pmatrix}1 & 0.3\\ 0.3 & 1\end{pmatrix}\), \(\begin{pmatrix}1 & -0.3\\ -0.3 & 1\end{pmatrix}\) and \(\begin{pmatrix}0.8 & 0\\ 0 & 0.8\end{pmatrix}\). With mixing proportions \(\pi_j\) (the relative component sizes), \(U_{ij} \propto \pi_j \phi_2(x_i | \mu_j, \Sigma_j)\).
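The membership rule for the gaussian dataset can be reproduced by hand in base R. The following is a minimal sketch, not the package's own code: the helper dmvn2 is a hypothetical name, and equal mixing proportions \(\pi_j = 1/3\) are an assumption, since the component sizes used by gen.fuzzy are not stated here.

```r
# Bivariate normal density phi_2(x | mu, Sigma), written out in base R.
# dmvn2 is a hypothetical helper, not part of FuzzySpec.
dmvn2 <- function(X, mu, Sigma) {
  D <- sweep(X, 2, mu)                       # subtract the mean from every row
  q <- rowSums((D %*% solve(Sigma)) * D)     # squared Mahalanobis distances
  exp(-q / 2) / (2 * pi * sqrt(det(Sigma)))
}

# Means and covariances as listed in Details.
mu <- list(c(-2, 0), c(2, 0), c(0, 3))
Sigma <- list(matrix(c(1, 0.3, 0.3, 1), 2),
              matrix(c(1, -0.3, -0.3, 1), 2),
              diag(0.8, 2))
pi_j <- c(1, 1, 1) / 3                       # assumption: equal proportions

# Three points at the component means, plus the origin.
X <- rbind(c(-2, 0), c(2, 0), c(0, 3), c(0, 0))

# W[i, j] = pi_j * phi_2(x_i | mu_j, Sigma_j), then row-normalize.
W <- sapply(1:3, function(j) pi_j[j] * dmvn2(X, mu[[j]], Sigma[[j]]))
U <- W / rowSums(W)
round(U, 3)
```

Each point placed at a component mean is assigned almost entirely to that component, while the origin splits evenly between the first two components because their means and covariances mirror each other about the vertical axis.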

hyperbolas (k = 5). One Gaussian near \((0,0)\) and four hyperbola branches \(\{(x,y): (x\pm a)^2/b^2 - y^2/a^2=1\}\) and their rotated or flipped analogues, sampled along \(t \in [-2,2]\) with noise. For observation \(x_i\), \(w_{\text{ball}} = 50 \cdot \phi_2(x_i | (0,0), 0.2I_2)\) and \(w_{\text{hyp},\ell} = \exp \big(-d^2(x_i,\mathcal{C}_\ell)/\sigma^2\big)\), where \(d(\cdot,\mathcal{C}_\ell)\) is the minimum distance to branch \(\mathcal{C}_\ell\). We set \(U_{i\cdot} \propto (w_{\text{ball}}, w_{\text{hyp},1}, \dots, w_{\text{hyp},4})\).

spirals (k = 3). Three spirals parameterized by \(t\) as \(\gamma_s(t) = ((0.5+0.8t)\cos(\theta_s+t),(0.5+0.8t)\sin(\theta_s+t))\), with shifts \(\theta_s \in \{0,2\pi/3,4\pi/3\}\) and additive noise. For each spiral \(s\), \(d_s = \min_{t \in [0,\pi]} \|x_i - \gamma_s(t)\|\) and \(U_{is} \propto \exp \big(-d_s^2/\sigma^2\big)\). If \(\|x_i\| < 1\), the row is blended toward uniform, \(U_{i\cdot} \leftarrow (1-\alpha)U_{i\cdot} + \alpha(1,1,1)/3\) with \(\alpha = 0.5e^{-\|x_i\|}\), and then renormalized.
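The curve-proximity rule for the spirals can be approximated by evaluating each spiral on a fine grid of \(t\) values and taking the smallest distance. The sketch below is illustrative only: spiral_memberships is a hypothetical function, the grid search approximates the exact minimum, and sigma = 0.3 is an assumed value because \(\sigma\) is not specified here.

```r
# Sketch: spiral memberships via a grid approximation of d_s.
# sigma and n_grid are assumptions, not values taken from FuzzySpec.
spiral_memberships <- function(X, sigma = 0.3, n_grid = 2000) {
  t <- seq(0, pi, length.out = n_grid)
  shifts <- c(0, 2 * pi / 3, 4 * pi / 3)
  d <- sapply(shifts, function(th) {
    gx <- (0.5 + 0.8 * t) * cos(th + t)
    gy <- (0.5 + 0.8 * t) * sin(th + t)
    # smallest distance from each row of X to the gridded curve
    sqrt(apply(X, 1, function(p) min((p[1] - gx)^2 + (p[2] - gy)^2)))
  })
  d <- matrix(d, nrow = nrow(X))             # keep matrix shape for one row
  U <- exp(-d^2 / sigma^2)
  U <- U / rowSums(U)
  # blend toward uniform memberships inside the unit disc
  r <- sqrt(rowSums(X^2))
  alpha <- ifelse(r < 1, 0.5 * exp(-r), 0)
  U <- (1 - alpha) * U + alpha / 3
  U / rowSums(U)
}

# One point on spiral 1, one on spiral 2 (both at t = 2), and the origin.
P <- rbind(2.1 * c(cos(2), sin(2)),
           2.1 * c(cos(2 * pi / 3 + 2), sin(2 * pi / 3 + 2)),
           c(0, 0))
U <- spiral_memberships(P)
round(U, 3)
```

The two on-curve points are assigned to their own spiral, and the origin, which is equidistant from all three spirals, receives uniform memberships.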

wedges (k = 8). Eight angular wedges with inner/outer radii \(1\) and \(4\), respectively, with small gaps between wedges. For observation \(x_i\) with radius \(r\) and angle \(\theta\), membership to wedge \(j\) is \(U_{ij} \propto \exp \big(-\delta(\theta,\theta_j)^2/\sigma^2\big)\), where \(\delta\) is a wrapped angular distance to the wedge centre angle \(\theta_j\).
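A wrapped angular distance can be computed with atan2, which maps any angle difference back into \([-\pi,\pi]\). The sketch below makes two assumptions that are not stated in this page: the wedge centres are evenly spaced at \((j - 0.5) \cdot 2\pi/8\), and sigma = 0.4. The function name wedge_memberships is hypothetical.

```r
# Sketch: wedge memberships from a wrapped angular distance.
# Centre angles and sigma are assumptions, not FuzzySpec's actual values.
wedge_memberships <- function(X, k = 8, sigma = 0.4) {
  theta <- atan2(X[, 2], X[, 1])               # angle of each observation
  centres <- (seq_len(k) - 0.5) * 2 * pi / k   # assumed wedge centre angles
  diff <- outer(theta, centres, "-")           # n x k raw angle differences
  delta <- abs(atan2(sin(diff), cos(diff)))    # wrapped into [0, pi]
  w <- exp(-delta^2 / sigma^2)
  w / rowSums(w)                               # row-normalize
}

# Radius-2 points at angles pi/8, -pi/8, and 0.
ang <- c(pi / 8, -pi / 8, 0)
X <- 2 * cbind(cos(ang), sin(ang))
U <- wedge_memberships(X)
round(U, 3)
```

Under these assumptions the point at angle \(-\pi/8\) is assigned to wedge 8 rather than wedge 1, and the point at angle 0 splits evenly between wedges 1 and 8, which is the behaviour the wrapping is meant to provide.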

rings (k = 3). Three concentric rings with radii \(R_j \in \{1, 2.5, 4\}\) and widths \(W_j \in \{0.3, 0.4, 0.5\}\) for \(j=1,2,3\). For observation \(x_i \in \mathbb{R}^2\) with radius \(r_i = \|x_i\|_2\), let \(w_{ij} = \exp\left(-(r_i - R_j)^2/W_j^2\right)\); then \( U_{ij} = w_{ij}/\sum_{\ell=1}^{3} w_{i\ell}\).
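Because the ring rule is fully specified by the radii and widths above, it can be evaluated directly. The following is a small sketch of the formula in base R; ring_memberships is a hypothetical name and this is not the package's source.

```r
# Sketch: ring memberships from the stated radii R_j and widths W_j.
ring_memberships <- function(X, R = c(1, 2.5, 4), W = c(0.3, 0.4, 0.5)) {
  r <- sqrt(rowSums(X^2))                      # r_i = ||x_i||_2
  # w[i, j] = exp(-(r_i - R_j)^2 / W_j^2)
  w <- exp(-sweep(outer(r, R, "-")^2, 2, W^2, "/"))
  w / rowSums(w)                               # row-normalize
}

# One point on each ring, plus one midway between rings 1 and 2.
X <- rbind(c(1, 0), c(0, 2.5), c(-4, 0), c(1.75, 0))
U <- ring_memberships(X)
round(U, 3)
```

The midway point at radius 1.75 leans toward ring 2 because that ring has the larger width, so the same radial gap is penalized less.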

worms (k = 4). Each worm \(j\) is a sinusoidal curve parameterized on \(t \in [0,2\pi]\) by \( \gamma_j(t) = \big(x(t), y_j(t)\big)\) with \(x(t)=2(t-\pi)\) and \( y_j(t) = A_j \sin(f_j t + \phi_j) + y^{\mathrm{off}}_j\), with amplitudes \(A_j\), frequencies \(f_j\), phases \(\phi_j\), and vertical offsets \(y^{\mathrm{off}}_j\). For observation \(x_i \in \mathbb{R}^2\), the distance to worm \(j\) is \(d_j(x_i) = \min_{t \in [0,2\pi]} \|x_i - \gamma_j(t)\|_2\). Then \(w_{ij} = \exp\left( -d_j(x_i)^2/\sigma^2\right)\), and \( U_{ij} = w_{ij}/\sum_{\ell=1}^{4} w_{i\ell}\).

random (k is user-specified). Mixture of \(k\) Gaussians with a common covariance determined by covType, random centres in \([0,30]^2\), and random cluster sizes. With mixture weights \(\pi_j\), \(U_{ij} \propto \pi_j \phi_2(x_i | \mu_j, \Sigma)\).

See Also

plot.fuzzy

Examples

set.seed(1)

g <- gen.fuzzy(n = 600, dataset = "gaussian", seed = 1)
plot.fuzzy(g, plotFuzzy = TRUE, colorCluster = TRUE)

s <- gen.fuzzy(n = 450, dataset = "spirals", noise = 0.2, seed = 1)
plot.fuzzy(s, plotFuzzy = TRUE, colorCluster = FALSE)

r <- gen.fuzzy(n = 800, dataset = "random", k = 15, covType = "rotated", seed = 1)
plot.fuzzy(r, plotFuzzy = TRUE, colorCluster = TRUE)
