Computes a fuzzy clustering of the data into k clusters.
fanny(x, k, diss = inherits(x, "dist"), memb.exp = 2, metric = c("euclidean", "manhattan", "SqEuclidean"), stand = FALSE, iniMem.p = NULL, cluster.only = FALSE, keep.diss = !diss && !cluster.only && n < 100, keep.data = !diss && !cluster.only, maxit = 500, tol = 1e-15, trace.lev = 0)
x: data matrix or data frame, or dissimilarity matrix, depending on the value of the diss argument.
In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed.
In case of a dissimilarity matrix, x is typically the output of daisy or dist. A vector of length n*(n-1)/2 is also allowed (where n is the number of observations), and will be interpreted in the same way as the output of the above-mentioned functions. Missing values (NAs) are not allowed.
diss: logical flag: if TRUE (default for dist or dissimilarity objects), then x is assumed to be a dissimilarity matrix. If FALSE, then x is treated as a matrix of observations by variables.
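memb.exp: number r strictly larger than 1 specifying the membership exponent used in the fit criterion. Default: 2, which used to be hardwired inside FANNY.
metric: character string specifying the metric to be used for calculating dissimilarities between observations. Options are "euclidean" (default), "manhattan", and "SqEuclidean". Euclidean distances are the root sum-of-squares of differences, manhattan distances are the sum of absolute differences, and "SqEuclidean", the squared Euclidean distances, are the sum-of-squares of differences. Using this last option is equivalent (but somewhat slower) to computing so-called fuzzy C-means. If x is already a dissimilarity matrix, then this argument will be ignored.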
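As a rough illustration of the three metrics described above, the following base-R sketch computes each dissimilarity for two points with `dist()`. Squaring the Euclidean distances to mimic "SqEuclidean" is an assumption based on the description in the text, not a look at the internals of fanny.

```r
# Two observations (rows) measured on two variables (columns).
x <- rbind(c(0, 0), c(3, 4))

d_euclid    <- dist(x, method = "euclidean")  # root sum-of-squares of differences
d_manhattan <- dist(x, method = "manhattan")  # sum of absolute differences
d_sqeuclid  <- d_euclid^2                     # sum-of-squares of differences (assumed)

# Collect the three values side by side for comparison.
metrics <- c(euclidean   = as.numeric(d_euclid),
             manhattan   = as.numeric(d_manhattan),
             SqEuclidean = as.numeric(d_sqeuclid))
metrics
```

Any of these dist objects could in principle be passed as x, in which case the metric argument is ignored.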
stand: logical; if true, the measurements in x are standardized before calculating the dissimilarities. Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If x is already a dissimilarity matrix, then this argument will be ignored.
iniMem.p: numeric n x k matrix or NULL (by default); can be used to specify a starting membership matrix, i.e., a matrix of non-negative numbers, each row summing to one.
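A starting membership matrix of the kind described above can be built in base R along the following lines. This is only a sketch: the sizes n and k and the use of uniform random weights are illustrative assumptions, not a recommended initialization.

```r
set.seed(42)   # reproducible starting point
n <- 28        # number of observations (hypothetical)
k <- 2         # number of clusters (hypothetical)

# Non-negative random weights, one row per observation, one column per cluster.
raw <- matrix(runif(n * k), nrow = n, ncol = k)

# Rescale each row so that its memberships sum to one.
ini <- raw / rowSums(raw)

range(rowSums(ini))   # both ends should equal 1 up to rounding

# With the cluster package attached and a data matrix x of n rows,
# the matrix could then be supplied as:
# fanny(x, k, iniMem.p = ini)
```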
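keep.diss, keep.data: logicals indicating if the dissimilarities and/or input data x should be kept in the result. Setting these to FALSE can give smaller results and hence also save memory allocation time.
maxit, tol: maximal number of iterations and tolerance for convergence of the FANNY algorithm. The defaults maxit = 500 and tol = 1e-15 used to be hardwired inside the algorithm.
trace.lev: integer specifying a trace level for printing diagnostics during the algorithm. The default 0 does not print anything; higher values print increasingly more.
The function returns an object of class "fanny" representing the clustering. See fanny.object for details.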
The memberships are nonnegative, and for a fixed observation i they sum to 1.
The particular method fanny stems from chapter 4 of Kaufman and Rousseeuw (1990) (see the references in daisy) and has been extended by Martin Maechler to allow user specified memb.exp, iniMem.p, maxit, tol, etc.
Fanny aims to minimize the objective function
$$\sum_{v=1}^k \frac{\sum_{i=1}^n\sum_{j=1}^n u_{iv}^r u_{jv}^r d(i,j)}{2 \sum_{j=1}^n u_{jv}^r}$$
where $n$ is the number of observations, $k$ is the number of clusters, $u_{iv}$ is the membership of observation $i$ in cluster $v$, $r$ is the membership exponent memb.exp, and $d(i,j)$ is the dissimilarity between observations $i$ and $j$.
Note that $r \to 1$ gives increasingly crisper clusterings whereas $r \to \infty$ leads to complete fuzziness. K&R (1990), p. 191 note that values too close to 1 can lead to slow convergence. Further note that even the default, $r = 2$, can lead to complete fuzziness, i.e., memberships $u_{iv} = 1/k$ for all $i$ and $v$. In that case a warning is signalled and the user is advised to choose a smaller memb.exp ($= r$).
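To make the objective function concrete, here is a minimal base-R sketch that evaluates it for a given membership matrix. The helper name `fanny_objective` and the toy data are hypothetical and chosen for illustration; this is not the routine used inside fanny.

```r
# Evaluate the objective for a membership matrix u (n x k), dissimilarities d
# (a "dist" object or an n x n matrix) and membership exponent r.
fanny_objective <- function(u, d, r = 2) {
  d   <- as.matrix(d)
  ur  <- u^r
  num <- colSums(ur * (d %*% ur))  # per cluster: sum_i sum_j u_iv^r u_jv^r d(i,j)
  den <- 2 * colSums(ur)           # per cluster: 2 * sum_j u_jv^r
  sum(num / den)
}

# Four points on a line forming two tight pairs.
x <- c(0, 1, 10, 11)
d <- dist(x)

u_crisp <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1))  # each pair in its own cluster
u_fuzzy <- matrix(0.5, nrow = 4, ncol = 2)            # complete fuzziness, u = 1/k

obj_crisp <- fanny_objective(u_crisp, d)  # 1
obj_fuzzy <- fanny_objective(u_fuzzy, d)  # 5.25
```

On this toy data the crisp split into the two pairs gives a much smaller value than the completely fuzzy memberships, which is the direction the minimization favors here.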
Compared to other fuzzy clustering methods, fanny has the following features: (a) it also accepts a dissimilarity matrix; (b) it is more robust to the spherical cluster assumption; (c) it provides a novel graphical display, the silhouette plot (see plot.partition).
See also agnes for background and references; fanny.object, partition.object, plot.partition, daisy, dist.
library(cluster)  # provides fanny() and the ruspini data
## generate 10+15 objects in two clusters, plus 3 objects lying
## between those clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm( 3,3.2,0.5), rnorm( 3,3.2,0.5)))
fannyx <- fanny(x, 2)
## Note that observations 26:28 are "fuzzy" (closer to # 2):
fannyx
summary(fannyx)
plot(fannyx)
(fan.x.15 <- fanny(x, 2, memb.exp = 1.5)) # 'crisper' for obs. 26:28
(fanny(x, 2, memb.exp = 3)) # more fuzzy in general
data(ruspini)
f4 <- fanny(ruspini, 4)
stopifnot(rle(f4$clustering)$lengths == c(20,23,17,15))
plot(f4, which = 1)
## Plot similar to Figure 6 in Struyf et al. (1996)
plot(fanny(ruspini, 5))