Computes a fuzzy clustering of the data into `k` clusters.

```
fanny(x, k, diss = inherits(x, "dist"), memb.exp = 2,
metric = c("euclidean", "manhattan", "SqEuclidean"),
stand = FALSE, iniMem.p = NULL, cluster.only = FALSE,
keep.diss = !diss && !cluster.only && n < 100,
keep.data = !diss && !cluster.only,
maxit = 500, tol = 1e-15, trace.lev = 0)
```

x

data matrix or data frame, or dissimilarity matrix, depending on the value of the `diss` argument.

In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed.

In case of a dissimilarity matrix, `x` is typically the output of `daisy` or `dist`. A vector of length n*(n-1)/2 is also allowed (where n is the number of observations), and will be interpreted in the same way as the output of the above-mentioned functions. Missing values (NAs) are not allowed.
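The input forms described above can be sketched as follows. This is a minimal illustration, assuming `fanny` is loaded from the `cluster` package; the simulated data and the object names (`f_data`, `f_diss`, `f_vec`) are made up for the example.

```r
library(cluster)  # assumption: fanny() is provided by the 'cluster' package

set.seed(1)
# 10 points around (0, 0) and 10 points around (5, 5)
x <- rbind(matrix(rnorm(20, 0, 0.5), ncol = 2),
           matrix(rnorm(20, 5, 0.5), ncol = 2))

d <- dist(x)                 # a "dist" object, so diss defaults to TRUE

f_data <- fanny(x, 2)        # observations-by-variables input
f_diss <- fanny(d, 2)        # dissimilarity input
# plain numeric vector of length n*(n-1)/2; diss must be set explicitly here
f_vec  <- fanny(as.vector(d), 2, diss = TRUE)

length(as.vector(d)) == nrow(x) * (nrow(x) - 1) / 2
```

With the default `"euclidean"` metric, the three calls are expected to describe the same dissimilarities, so their clusterings should agree.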

k

integer giving the desired number of clusters. It is required that \(0 < k < n/2\) where \(n\) is the number of observations.

diss

logical flag: if TRUE (default for `dist` or `dissimilarity` objects), then `x` is assumed to be a dissimilarity matrix. If FALSE, then `x` is treated as a matrix of observations by variables.

memb.exp

number \(r\) strictly larger than 1 specifying the *membership exponent* used in the fit criterion; see the ‘Details’ below. Default: `2`, which used to be hardwired inside FANNY.

metric

character string specifying the metric to be used for calculating dissimilarities between observations. Options are `"euclidean"` (default), `"manhattan"`, and `"SqEuclidean"`. Euclidean distances are the root sum-of-squares of differences, manhattan distances are the sum of absolute differences, and `"SqEuclidean"` distances, the *squared* euclidean distances, are the sum-of-squares of differences. Using this last option is equivalent (but somewhat slower) to computing so-called “fuzzy C-means”. If `x` is already a dissimilarity matrix, then this argument will be ignored.
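As a small worked illustration of the three definitions, the dissimilarity between two observations can be computed by hand in base R. The vectors `a` and `b` are arbitrary example values, and this is not the package's internal code.

```r
# Two observations with two variables each (arbitrary example values)
a <- c(1, 2)
b <- c(4, 6)
dif <- a - b                      # differences per variable: -3, -4

d_euclidean   <- sqrt(sum(dif^2)) # root sum-of-squares of differences
d_manhattan   <- sum(abs(dif))    # sum of absolute differences
d_sqeuclidean <- sum(dif^2)       # sum-of-squares of differences

c(euclidean = d_euclidean, manhattan = d_manhattan, SqEuclidean = d_sqeuclidean)
```

For these values the results are 5, 7 and 25 respectively.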

stand

logical; if true, the measurements in `x` are standardized before calculating the dissimilarities. Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If `x` is already a dissimilarity matrix, then this argument will be ignored.
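The standardization described for `stand` can be reproduced by hand in base R. The helper name `standardize_mad` is hypothetical and this sketch is not the package's internal code. Note that it uses the *mean* absolute deviation, which differs from base R's `mad()` (a scaled median absolute deviation).

```r
# Hypothetical helper: centre each column on its mean, then divide by the
# column's mean absolute deviation from that mean.
standardize_mad <- function(x) {
  x <- as.matrix(x)
  centred <- sweep(x, 2, colMeans(x, na.rm = TRUE))      # subtract column means
  mean_abs_dev <- colMeans(abs(centred), na.rm = TRUE)   # mean absolute deviation
  sweep(centred, 2, mean_abs_dev, "/")
}

x <- cbind(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 100))
z <- standardize_mad(x)

colMeans(z)       # each column is centred on 0 (up to rounding)
colMeans(abs(z))  # each column has mean absolute deviation 1
```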

iniMem.p

numeric \(n \times k\) matrix or `NULL` (by default); can be used to specify a starting `membership` matrix, i.e., a matrix of non-negative numbers, each row summing to one.
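A valid starting matrix can be built in base R by drawing non-negative numbers and normalizing each row. This is only one possible construction; the sizes `n` and `k` and the name `u0` are chosen for illustration.

```r
set.seed(42)
n <- 20  # number of observations (example value)
k <- 2   # number of clusters (example value)

# Random non-negative entries, then divide each row by its sum
u0 <- matrix(runif(n * k), nrow = n, ncol = k)
u0 <- u0 / rowSums(u0)

dim(u0)              # n rows, k columns
range(rowSums(u0))   # every row sums to one (up to rounding)
```

Such a matrix would then be supplied as `fanny(x, k, iniMem.p = u0)` for data `x` with `n` observations.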

cluster.only

logical; if true, no silhouette information will be computed and returned, see details.

keep.diss, keep.data

logicals indicating if the dissimilarities and/or input data `x` should be kept in the result. Setting these to `FALSE` can give smaller results and hence also save memory allocation *time*.

maxit, tol

maximal number of iterations and default tolerance for convergence (relative convergence of the fit criterion) for the FANNY algorithm. The defaults `maxit = 500` and `tol = 1e-15` used to be hardwired inside the algorithm.

trace.lev

integer specifying a trace level for printing diagnostics during the C-internal algorithm. Default `0` does not print anything; higher values print increasingly more.

An object of class `"fanny"` representing the clustering. See `fanny.object` for details.

In a fuzzy clustering, each observation is “spread out” over the various clusters. Denote by \(u_{iv}\) the membership of observation \(i\) to cluster \(v\).

The memberships are nonnegative, and for a fixed observation \(i\) they sum to 1. The particular method `fanny` stems from chapter 4 of Kaufman and Rousseeuw (1990) (see the references in `daisy`) and has been extended by Martin Maechler to allow user-specified `memb.exp`, `iniMem.p`, `maxit`, `tol`, etc.

Fanny aims to minimize the objective function
$$\sum_{v=1}^k
\frac{\sum_{i=1}^n\sum_{j=1}^n u_{iv}^r u_{jv}^r d(i,j)}{
2 \sum_{j=1}^n u_{jv}^r}$$
where \(n\) is the number of observations, \(k\) is the number of clusters, \(r\) is the membership exponent `memb.exp` and \(d(i,j)\) is the dissimilarity between observations \(i\) and \(j\). Note that \(r \to 1\) gives increasingly crisper clusterings whereas \(r \to \infty\) leads to complete fuzziness. K&R (1990), p. 191 note that values too close to 1 can lead to slow convergence. Further note that even the default, \(r = 2\), can lead to complete fuzziness, i.e., memberships \(u_{iv} \equiv 1/k\). In that case a warning is signalled and the user is advised to choose a smaller `memb.exp` (\(=r\)).
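The objective function above can be evaluated directly for a given membership matrix. The following base R sketch is a plain transcription of the formula, not the package's internal code, and the helper name `fanny_objective` is hypothetical.

```r
# Hypothetical helper: evaluate the objective for memberships u (n x k),
# dissimilarities d (a "dist" object or an n x n matrix) and exponent r.
fanny_objective <- function(u, d, r = 2) {
  d  <- as.matrix(d)
  ur <- u^r
  # Per cluster v: sum_i sum_j u_iv^r u_jv^r d(i,j)
  numerator   <- colSums(ur * (d %*% ur))
  # Per cluster v: 2 * sum_j u_jv^r
  denominator <- 2 * colSums(ur)
  sum(numerator / denominator)
}

# Four points on a line forming two obvious groups
d <- dist(c(0, 1, 10, 11))

u_crisp <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1))  # hard assignment
u_fuzzy <- matrix(1 / 2, nrow = 4, ncol = 2)          # complete fuzziness, u = 1/k

fanny_objective(u_crisp, d)  # 1
fanny_objective(u_fuzzy, d)  # 5.25
```

For this well-separated example with \(r = 2\), the crisp assignment gives a much smaller criterion value than the completely fuzzy one.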

Compared to other fuzzy clustering methods, `fanny` has the following features: (a) it also accepts a dissimilarity matrix; (b) it is more robust to the `spherical cluster` assumption; (c) it provides a novel graphical display, the silhouette plot (see `plot.partition`).

`agnes` for background and references; `fanny.object`, `partition.object`, `plot.partition`, `daisy`, `dist`.

```
library(cluster)  # fanny() and the ruspini data come from the 'cluster' package

## generate 10+15 objects in two clusters, plus 3 objects lying
## between those clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm( 3,3.2,0.5), rnorm( 3,3.2,0.5)))
fannyx <- fanny(x, 2)
## Note that observations 26:28 are "fuzzy" (closer to # 2):
fannyx
summary(fannyx)
plot(fannyx)

(fan.x.15 <- fanny(x, 2, memb.exp = 1.5)) # 'crisper' for obs. 26:28
(fanny(x, 2, memb.exp = 3))               # more fuzzy in general

data(ruspini)
f4 <- fanny(ruspini, 4)
stopifnot(rle(f4$clustering)$lengths == c(20,23,17,15))
plot(f4, which = 1)
## Plot similar to Figure 6 in Struyf et al (1996)
plot(fanny(ruspini, 5))
```