# fanny

##### Fuzzy Analysis Clustering

Computes a fuzzy clustering of the data into `k` clusters.

- Keywords
- cluster

##### Usage

```
fanny(x, k, diss = inherits(x, "dist"), memb.exp = 2,
      metric = c("euclidean", "manhattan", "SqEuclidean"),
      stand = FALSE, iniMem.p = NULL, cluster.only = FALSE,
      keep.diss = !diss && !cluster.only && n < 100,
      keep.data = !diss && !cluster.only,
      maxit = 500, tol = 1e-15, trace.lev = 0)
```

##### Arguments

- x
data matrix or data frame, or dissimilarity matrix, depending on the value of the `diss` argument.

In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed.

In case of a dissimilarity matrix, `x` is typically the output of `daisy` or `dist`. Also a vector of length \(n(n-1)/2\) is allowed (where \(n\) is the number of observations), and will be interpreted in the same way as the output of the above-mentioned functions. Missing values (NAs) are not allowed.

- k
integer giving the desired number of clusters. It is required that \(0 < k < n/2\) where \(n\) is the number of observations.

- diss
logical flag: if TRUE (default for `dist` or `dissimilarity` objects), then `x` is assumed to be a dissimilarity matrix. If FALSE, then `x` is treated as a matrix of observations by variables.

- memb.exp
number \(r\) strictly larger than 1 specifying the *membership exponent* used in the fit criterion; see the ‘Details’ below. Default: `2`, which used to be hardwired inside FANNY.

- metric
character string specifying the metric to be used for calculating dissimilarities between observations. Options are `"euclidean"` (default), `"manhattan"`, and `"SqEuclidean"`. Euclidean distances are the root sum-of-squares of differences, manhattan distances are the sum of absolute differences, and `"SqEuclidean"`, the *squared* euclidean distances, are the sum-of-squares of differences. Using this last option is equivalent (but somewhat slower) to computing so-called “fuzzy C-means”. If `x` is already a dissimilarity matrix, then this argument will be ignored.

- stand
logical; if true, the measurements in `x` are standardized before calculating the dissimilarities. Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If `x` is already a dissimilarity matrix, then this argument will be ignored.

- iniMem.p
numeric \(n \times k\) matrix or `NULL` (by default); can be used to specify a starting `membership` matrix, i.e., a matrix of non-negative numbers, each row summing to one.

- cluster.only
logical; if true, no silhouette information will be computed and returned, see details.

- keep.diss, keep.data
logicals indicating if the dissimilarities and/or input data `x` should be kept in the result. Setting these to `FALSE` can give smaller results and hence also save memory allocation *time*.

- maxit, tol
maximal number of iterations and default tolerance for convergence (relative convergence of the fit criterion) for the FANNY algorithm. The defaults `maxit = 500` and `tol = 1e-15` used to be hardwired inside the algorithm.

- trace.lev
integer specifying a trace level for printing diagnostics during the C-internal algorithm. Default `0` does not print anything; higher values print increasingly more.
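The standardization described for `stand` can be sketched in base R. This is only an illustration of the description above, not the package's internal code: the helper name `mad_standardize` is hypothetical, the use of `na.rm = TRUE` for missing values is an assumption, and constant columns (zero mean absolute deviation) are not handled.

```r
# Hypothetical helper: standardize each column by subtracting its mean and
# dividing by its mean absolute deviation, as described for `stand = TRUE`.
mad_standardize <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) {
    centre <- mean(col, na.rm = TRUE)                 # the variable's mean
    spread <- mean(abs(col - centre), na.rm = TRUE)   # mean absolute deviation
    (col - centre) / spread
  })
}

# Two variables on very different scales.
x <- cbind(a = c(1, 2, 3, 4, 10), b = c(100, 200, 300, 400, 500))
z <- mad_standardize(x)

colMeans(z)        # each column has mean 0 (up to rounding)
colMeans(abs(z))   # and mean absolute deviation 1
```

After this transformation both variables contribute on a comparable scale to any dissimilarity computed from `z`.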

##### Details

In a fuzzy clustering, each observation is “spread out” over the various clusters. Denote by \(u_{iv}\) the membership of observation \(i\) to cluster \(v\).

The memberships are nonnegative, and for a fixed observation i they sum to 1.
The particular method `fanny` stems from chapter 4 of Kaufman and Rousseeuw (1990) (see the references in `daisy`) and has been extended by Martin Maechler to allow user specified `memb.exp`, `iniMem.p`, `maxit`, `tol`, etc.

Fanny aims to minimize the objective function
$$\sum_{v=1}^k
\frac{\sum_{i=1}^n\sum_{j=1}^n u_{iv}^r u_{jv}^r d(i,j)}{
2 \sum_{j=1}^n u_{jv}^r}$$
where \(n\) is the number of observations, \(k\) is the number of clusters, \(r\) is the membership exponent `memb.exp` and \(d(i,j)\) is the dissimilarity between observations \(i\) and \(j\).
Note that \(r \to 1\) gives increasingly crisper clusterings whereas \(r \to \infty\) leads to complete fuzziness. Kaufman and Rousseeuw (1990, p. 191) note that values too close to 1 can lead to slow convergence. Further note that even the default, \(r = 2\), can lead to complete fuzziness, i.e., memberships \(u_{iv} \equiv 1/k\). In that case a warning is signalled and the user is advised to choose a smaller `memb.exp` (\(=r\)).
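To make the objective function concrete, the following base R sketch evaluates it directly from its definition for a given membership matrix. The helper name `fanny_objective` is hypothetical and is not part of the cluster package; it is a plain transcription of the formula, not the optimizer used by `fanny`.

```r
# Hypothetical helper: evaluate the objective for a membership matrix `u`
# (n x k, rows summing to 1), dissimilarities `d` (a "dist" object or a full
# n x n matrix) and membership exponent `r`.
fanny_objective <- function(u, d, r = 2) {
  d  <- as.matrix(d)
  ur <- u^r
  per_cluster <- vapply(seq_len(ncol(u)), function(v) {
    num <- sum(outer(ur[, v], ur[, v]) * d)   # sum_i sum_j u_iv^r u_jv^r d(i,j)
    num / (2 * sum(ur[, v]))                  # divided by 2 * sum_j u_jv^r
  }, numeric(1))
  sum(per_cluster)
}

# Four points on a line forming two tight pairs.
d <- dist(c(0, 1, 10, 11))

crisp <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1))   # the two natural clusters
fuzzy <- matrix(1 / 2, nrow = 4, ncol = 2)           # complete fuzziness, u = 1/k

fanny_objective(crisp, d)   # 1
fanny_objective(fuzzy, d)   # 5.25
```

On this toy data the crisp assignment into the two natural pairs gives a much smaller value than the completely fuzzy memberships \(u_{iv} = 1/k\).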

Compared to other fuzzy clustering methods, `fanny` has the following features: (a) it also accepts a dissimilarity matrix; (b) it is more robust to the ‘spherical cluster’ assumption; (c) it provides a novel graphical display, the silhouette plot (see `plot.partition`).
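As a small sketch of feature (a), dissimilarities can be computed outside `fanny` and passed in directly. The simulated data, the seed, and the choice of the manhattan metric are assumptions made only for illustration; the components `membership` and `clustering` are those described in `fanny.object`.

```r
library(cluster)

set.seed(1)  # assumed seed, only to make the sketch reproducible
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.5), ncol = 2))

d  <- dist(x, method = "manhattan")   # dissimilarities computed outside fanny
fd <- fanny(d, k = 2)                 # diss defaults to TRUE for "dist" objects

head(fd$membership)      # n x k matrix of memberships
rowSums(fd$membership)   # memberships of each observation sum to 1
table(fd$clustering)     # nearest crisp clustering
```

Because `x` here is a dissimilarity object, the `metric` and `stand` arguments are ignored, as noted under Arguments.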

##### Value

An object of class `"fanny"` representing the clustering. See `fanny.object` for details.

##### See Also

`agnes` for background and references; `fanny.object`, `partition.object`, `plot.partition`, `daisy`, `dist`.

##### Examples

```
library(cluster)

## generate 10+15 objects in two clusters, plus 3 objects lying
## between those clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm( 3,3.2,0.5), rnorm( 3,3.2,0.5)))
fannyx <- fanny(x, 2)
## Note that observations 26:28 are "fuzzy" (closer to # 2):
fannyx
summary(fannyx)
plot(fannyx)

(fan.x.15 <- fanny(x, 2, memb.exp = 1.5)) # 'crispier' for obs. 26:28
(fanny(x, 2, memb.exp = 3))               # more fuzzy in general

data(ruspini)
f4 <- fanny(ruspini, 4)
stopifnot(rle(f4$clustering)$lengths == c(20,23,17,15))
plot(f4, which = 1)
## Plot similar to Figure 6 in Struyf et al (1996)
plot(fanny(ruspini, 5))
```

*Documentation reproduced from package cluster, version 2.0.7-1, License: GPL (>= 2)*