Note that while fixmahal has lots of parameters, usually only one (or a few) of them have to be specified, cf. the examples. The philosophy is to allow much flexibility, but to always provide sensible defaults.
fixmahal(dat, n = nrow(as.matrix(dat)), p = ncol(as.matrix(dat)),
method = "fuzzy", cgen = "fixed",
ca = NA, ca2 = NA,
calpha = ifelse(method=="fuzzy",0.95,0.99),
calpha2 = 0.995,
pointit = TRUE, subset = n,
nc1 = 100+20*p,
startn = 18+p, mnc = floor(startn/2),
mer = ifelse(pointit,0.1,0),
distcut = 0.85, maxit = 5*n, iter = n*1e-5,
init.group = list(),
ind.storage = TRUE, countmode = 100,
plot = "none")
## S3 method for class 'mfpc':
summary(object, ...)
## S3 method for class 'summary.mfpc':
print(x, maxnc=30, ...)
## S3 method for class 'mfpc':
plot(x, dat, no, bw=FALSE, main=c("Representative FPC No. ",no),
xlab=NULL, ylab=NULL,
pch=NULL, col=NULL, ...)
## S3 method for class 'mfpc':
fpclusters(object, dat=NA, ca=object$ca, p=object$p, ...)
fpmi(dat, n = nrow(as.matrix(dat)), p = ncol(as.matrix(dat)),
gv, ca, ca2, method = "ml", plot,
maxit = 5*n, iter = n*1e-6)
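dat: the data; something that can be coerced to a numerical matrix (rows are points, columns are variables). fpclusters.mfpc does not need a specification of dat if fixmahal has been run with ind.storage=TRUE.
n: number of cases; by default the number of rows of dat.
p: number of variables; by default the number of columns of dat.
method: string. method="classical" means 0-1 weighting of observations by Mahalanobis distances and use of the classical normal covariance estimator. method="ml" uses the ML covariance estimator (division by n instead of n-1). The default, method="fuzzy", uses membership weights between 0 and 1 (see the description of fuzzy Mahalanobis FPCs below).
cgen: string. "fixed" means that the same tuning constant ca is used for all iterations. "auto" means that ca is generated by cmahal dependently on the size of the current data subset in each iteration.
ca: tuning constant. Points whose squared Mahalanobis distance to the current FPC is smaller than ca are included (for method="fuzzy": get full weight), so ca controls the required cluster separation. By default determined as the calpha-quantile of the chi-squared distribution with p degrees of freedom.
ca2: second tuning constant, only needed for method="fuzzy"; points whose squared Mahalanobis distance is larger than ca2 get zero weight. By default determined as the calpha2-quantile of the chi-squared distribution with p degrees of freedom.
calpha: number between 0 and 1. See ca.
calpha2: number between 0 and 1, larger than calpha. See ca2.
pointit: logical. If TRUE, further fixed point algorithms are started from initial configurations built around single points of the dataset, cf. mahalconf.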
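subset: integer not larger than n. Initial configurations for the fixed point algorithm (cf. mahalconf) are built from a subset of subset points of the dataset. By default, all n points are used.
nc1: tuning constant needed by cmahal to generate ca automatically. Only needed for cgen="auto".
startn: size of the initial configurations. It may be decreased if clusters smaller than the default are of interest.
mnc: minimum size of clusters to be reported; by default floor(startn/2).
mer: groups of FPCs are only reported as stable if the ratio between the number of findings of their members and their number of points (cf. ser below) exceeds mer. This holds under pointit=TRUE.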
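distcut: value between 0 and 1. The Single Linkage groups of FPCs with similarity larger than distcut are computed (see below), and a single representative FPC is selected for each group.
maxit: maximum number of iterations per fixed point algorithm run.
iter: positive number. Convergence tolerance: the fuzzy fixed point algorithm stops when subsequent weight vectors differ by less than iter. Only needed for method="fuzzy".
init.group: list of logical vectors of length n. Every vector indicates a starting configuration for the fixed point algorithm. This can be used for datasets with high dimension, where the vectors of init.group indicate cluster candidates, as in the first example below.
ind.storage: logical. If TRUE, then all indicator vectors of found FPCs are given in the value of fixmahal. This may need lots of memory, but is a bit faster.
countmode: positive integer. Every countmode algorithm runs, fixmahal shows a message.
plot: string. If "start", you get a scatterplot of the first two variables highlighting the initial configuration; "iteration" generates such a plot at each iteration; "both" does both (this may be very time-consuming). The default "none" produces no plots.
object: object of class mfpc, output of fixmahal.
x: object of class mfpc, output of fixmahal (for print.summary.mfpc, an object of class summary.mfpc).
maxnc: maximum number of FPCs to be printed.
no: number of the representative FPC of a stable group to be plotted.
bw: logical. If TRUE, the plot is black and white and the FPC is indicated by a different symbol. Otherwise the FPC is indicated in red.
main: plot title; by default "Representative FPC No. " followed by no.
xlab: label for the x-axis. If NULL, a default text is used.
ylab: label for the y-axis. If NULL, a default text is used.
pch: plotting symbol, see par. If NULL, the default is used.
col: plotting color, see par. If NULL, the default is used.
gv: for fpmi, logical indicator vector (for method="fuzzy", vector of weights between 0 and 1) of length n. Indicates the initial configuration for the fixed point algorithm.
...: additional parameters to be passed to plot (no effects elsewhere).

fixmahal returns an object of class mfpc. This is a list containing the components nc, g, means, covs, nfound, er, tsc, ncoll, skc, grto, imatrix, smatrix, stn, stfound, ser, sfpc, ssig, sto, struc, n, p, method, cgen, ca, ca2, cvec, calpha, pointit, subset, mnc, startn, mer, distcut.
summary.mfpc returns an object of class summary.mfpc. This is a list containing the components means, covs, stn, stfound, sn, ser, tskip, skc, tsc, sim, ca, ca2, calpha, mer, method, cgen, pointit.
fpclusters.mfpc returns a list of indicator vectors for the representative FPCs of stable groups.
fpmi returns a list with the components mg, covg, md, gv, coll, method, ca.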
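g: indicator vectors of the FPCs; FALSE if ind.storage=FALSE.
means: mean vectors of the FPCs. In summary.mfpc, only for representative FPCs of stable groups, sorted according to ser.
covs: covariance matrices of the FPCs. In summary.mfpc, only for representative FPCs of stable groups, sorted according to ser.
er: ratio between the number of findings of each FPC and its number of points. Under pointit=TRUE, this can be taken as a measure of stability of FPCs.
grto: numbers of the FPCs to which the algorithm runs started from init.group led.
imatrix: sizes of the intersections between FPCs. See sseg.
smatrix: similarities between FPCs. See sseg.
stn: number of representative FPCs of stable groups. In summary.mfpc, sorted according to ser.
stfound: numbers of findings of the members of the groups of FPCs. In summary.mfpc, sorted according to ser.
ser: ratio between the number of findings of a group of FPCs and its size. Under pointit=TRUE, this can be taken as a measure of stability of FPCs. In summary.mfpc, sorted from largest to smallest.
ssig: vector of length stn. Numbers of the representative FPCs of the stable groups.
sto: numbers of the groups, ordered according to ser (largest first).
n, p, method, cgen, ca2, calpha, pointit, subset, mnc, startn, mer, distcut: see arguments.
ca: see arguments, if cgen has been "fixed". Otherwise a numerical vector of length nc (the number of FPCs), giving the final values of ca for all FPCs. In fpmi, the tuning constant for the iterated FPC.
cvec: numerical vector of length n for cgen="auto". The values of the tuning constant ca corresponding to the cluster sizes from 1 to n.
sim: sizes of the intersections between the representative FPCs of stable groups. See sseg.
gv: logical (numerical, for method="fuzzy") indicator vector of the iterated FPC.
coll: logical. TRUE means that singular covariance matrices occurred during the iterations.

A (crisp) Mahalanobis FPC is a data subset that reproduces itself under the following operation: compute the mean and covariance matrix estimator of the subset, and then collect all points of the dataset whose squared Mahalanobis distance is smaller than ca.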
Fixed points of this operation can be considered as clusters,
because they contain only
non-outliers (as defined by the above mentioned procedure) and all other
points are outliers w.r.t. the subset.
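To make the crisp fixed point operation concrete, here is a minimal base-R sketch, not the fpc implementation: starting from a logical indicator vector, it repeatedly computes the mean and covariance matrix of the current subset and keeps all points whose squared Mahalanobis distance is smaller than ca, until the subset reproduces itself. The helper name crisp_fpc and its arguments are hypothetical; the ml flag imitates the difference between method="classical" and method="ml" (division by n instead of n-1). Unlike fixmahal, the sketch does not handle singular covariance matrices (fixmahal uses solvecov for that).

```r
# Hypothetical helper (not part of fpc): crisp Mahalanobis fixed point
# iteration started from the logical indicator vector `start`.
crisp_fpc <- function(x, start, ca = qchisq(0.99, df = ncol(x)),
                      ml = FALSE, maxit = 5 * nrow(x)) {
  x <- as.matrix(x)
  g <- start
  for (it in seq_len(maxit)) {
    xs <- x[g, , drop = FALSE]
    m <- colMeans(xs)
    S <- cov(xs)                           # classical estimator, divides by n-1
    if (ml) {
      S <- S * (nrow(xs) - 1) / nrow(xs)   # ML estimator, divides by n
    }
    # squared Mahalanobis distances of all points w.r.t. the current subset;
    # mahalanobis() inverts S with solve(), so S must be nonsingular here
    d2 <- mahalanobis(x, center = m, cov = S)
    g_new <- d2 < ca
    if (all(g_new == g)) {                 # subset reproduces itself: fixed point
      return(list(g = g, mean = m, cov = S, converged = TRUE, iterations = it))
    }
    g <- g_new
  }
  list(g = g, mean = m, cov = S, converged = FALSE, iterations = maxit)
}

# Usage: two well separated Gaussian groups, started from the first one
set.seed(1)
x <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))
res <- crisp_fpc(x, start = rep(c(TRUE, FALSE), each = 50))
sum(res$g[1:50])     # points of the first group kept in the FPC
any(res$g[51:100])   # does the FPC contain points of the second group?
res_ml <- crisp_fpc(x, start = rep(c(TRUE, FALSE), each = 50), ml = TRUE)
```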
The current default is to compute fuzzy Mahalanobis FPCs, where the points in the subset have a membership weight between 0 and 1 and give rise to weighted means and covariance matrices. The new weights are then obtained by computing the weight function wfu of the squared Mahalanobis distances, i.e., full weight for squared distances smaller than ca, zero weight for squared distances larger than ca2, and decreasing weights (a linear function of the squared distances) in between.
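The weight function and the resulting fuzzy iteration can be sketched in base R as follows. This is an illustrative stand-in for wfu and the fuzzy algorithm, not their actual code or signatures: fuzzy_weights and fuzzy_fpc are hypothetical names. The weighted covariance uses stats::cov.wt with method = "ML", and the loop stops when no weight changes by more than a tolerance tol (defaulting to n*1e-5 like iter); both choices are assumptions of this sketch.

```r
# Hypothetical stand-in for the weight function described above (assumes ca < ca2):
# weight 1 below ca, weight 0 above ca2, linear decrease in between.
fuzzy_weights <- function(d2, ca, ca2) {
  w <- (ca2 - d2) / (ca2 - ca)
  pmin(1, pmax(0, w))
}

# Hypothetical fuzzy fixed point iteration started from the weights w0.
fuzzy_fpc <- function(x, w0, ca, ca2, tol = 1e-5 * nrow(x), maxit = 5 * nrow(x)) {
  x <- as.matrix(x)
  w <- w0
  for (it in seq_len(maxit)) {
    # weighted mean and ML-type weighted covariance of the current fuzzy FPC
    cw <- cov.wt(x, wt = w, method = "ML")
    d2 <- mahalanobis(x, center = cw$center, cov = cw$cov)
    w_new <- fuzzy_weights(d2, ca, ca2)
    if (max(abs(w_new - w)) < tol) {       # weights have (numerically) converged
      return(list(w = w, center = cw$center, cov = cw$cov, iterations = it))
    }
    w <- w_new
  }
  list(w = w, center = cw$center, cov = cw$cov, iterations = maxit)
}

fuzzy_weights(c(0, 5, 10, 20), ca = 5, ca2 = 15)   # 1, 1, 0.5, 0

set.seed(2)
x <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))
p <- ncol(x)
fit <- fuzzy_fpc(x, w0 = rep(c(1, 0), each = 50),
                 ca = qchisq(0.95, p), ca2 = qchisq(0.995, p))
sum(fit$w[1:50])     # total membership weight of the first group
max(fit$w[51:100])   # largest weight of a point from the second group
```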
A fixed point algorithm is started from the whole dataset, further algorithms are started from the subsets specified in init.group, and, if pointit=TRUE, from the initial configurations explained under subset and in the function mahalconf.
Usually some of the FPCs are unstable, and more than one FPC may correspond to the same significant pattern in the data. Therefore the number of FPCs is reduced: a similarity matrix is computed between FPCs, where the similarity between two sets is defined as twice the size of their intersection divided by the sum of their sizes. The Single Linkage clusters (groups) of level distcut are computed, i.e., the connectivity components of the graph where edges are drawn between FPCs with similarity larger than distcut. Groups of FPCs whose members are found often enough (cf. parameter mer) are considered stable. A representative FPC is chosen for every Single Linkage cluster of FPCs according to the maximum expectation ratio ser. ser is the ratio between the number of findings of an FPC and its number of points, adjusted suitably if subset<n.
Usually only the representative FPCs of stable groups are of interest.
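The reduction step can be illustrated with a small base-R sketch, assuming the FPCs are given as logical indicator vectors and their numbers of findings are known. fpc_similarity, fpc_groups and pick_representatives are hypothetical names, not fpc functions; the groups are computed as connected components by breadth-first search, and the adjustment of the ratio for subset<n is omitted.

```r
# Hypothetical helper: similarity = 2 * |A intersect B| / (|A| + |B|)
fpc_similarity <- function(a, b) 2 * sum(a & b) / (sum(a) + sum(b))

# Single Linkage groups of level distcut, computed as connected components
# of the graph with edges between FPCs of similarity larger than distcut.
fpc_groups <- function(fpcs, distcut = 0.85) {
  k <- length(fpcs)
  sim <- matrix(0, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) sim[i, j] <- fpc_similarity(fpcs[[i]], fpcs[[j]])
  }
  adj <- sim > distcut
  group <- rep(NA_integer_, k)
  current <- 0L
  for (s in seq_len(k)) {
    if (!is.na(group[s])) next
    current <- current + 1L
    group[s] <- current
    queue <- s
    while (length(queue) > 0) {            # breadth-first search
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] & is.na(group)) # unvisited neighbours of v
      group[nb] <- current
      queue <- c(queue, nb)
    }
  }
  list(similarity = sim, group = group)
}

# Hypothetical helper: representative of each group is the FPC with the
# largest ratio of number of findings to number of points.
pick_representatives <- function(fpcs, nfound, group) {
  ratio <- nfound / vapply(fpcs, sum, numeric(1))
  vapply(split(seq_along(fpcs), group),
         function(idx) idx[which.max(ratio[idx])], integer(1))
}

# Usage with four FPCs on ten points and toy numbers of findings
ind <- function(idx, n = 10) replace(logical(n), idx, TRUE)
fpcs <- list(ind(1:5), ind(1:6), ind(7:10), ind(6:10))
gr <- fpc_groups(fpcs, distcut = 0.85)
gr$group                                                        # 1 1 2 2
reps <- pick_representatives(fpcs, nfound = c(10, 3, 2, 8), gr$group)
reps                                                            # FPCs 1 and 4
```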
Default tuning constants are taken from Hennig (2005). Generally, the default settings are recommended for fixmahal. For large datasets, the use of init.group together with pointit=FALSE is useful. Occasionally, mnc and startn may be chosen smaller than the default if smaller clusters are of interest, but this may lead to too many clusters. Decreasing ca will often lead to too many clusters, even for homogeneous data. Increasing ca will produce only very strongly separated clusters. Both may be of interest occasionally.
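Since the default tuning constants are chi-squared quantiles, they can be computed, and varied, with qchisq from base R. The helper default_tuning below is hypothetical; it only mirrors the documented defaults calpha = 0.95 for method="fuzzy" (0.99 otherwise) and calpha2 = 0.995. Explicit values obtained this way could be passed to fixmahal via its ca and ca2 arguments.

```r
# Hypothetical helper mirroring the documented defaults of fixmahal:
# ca = calpha-quantile and ca2 = calpha2-quantile of chi-squared with p df.
default_tuning <- function(p, method = "fuzzy",
                           calpha = if (method == "fuzzy") 0.95 else 0.99,
                           calpha2 = 0.995) {
  c(ca = qchisq(calpha, df = p), ca2 = qchisq(calpha2, df = p))
}

default_tuning(p = 2)                        # ca about 5.99, ca2 about 10.60
default_tuning(p = 2, method = "ml")["ca"]   # larger ca for the crisp methods
default_tuning(p = 2, calpha = 0.9)["ca"]    # smaller ca, tends to give more clusters
```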
Singular covariance matrices during the iterations are handled by solvecov.
summary.mfpc gives a summary of the representative FPCs of stable groups.
plot.mfpc is a plot method for the representative FPC of stable group number no. It produces a scatterplot in which the points belonging to the FPC are highlighted, and the mean and, for p<3, also the region of the FPC are shown. For p>=3, the optimal separating projection computed by batcoord is shown.
fpclusters.mfpc produces a list of indicator vectors for the representative FPCs of stable groups.
fpmi is called by fixmahal for a single fixed point algorithm and will usually not be executed alone.
Hennig, C. (2005) Fuzzy and Crisp Mahalanobis Fixed Point Clusters, in Baier, D., Decker, R., and Schmidt-Thieme, L. (eds.): Data Analysis and Decision Support. Springer, Heidelberg, 47-56.
Hennig, C. and Christlieb, N. (2002) Validating visual clusters in large datasets: Fixed point clusters of spectral features, Computational Statistics and Data Analysis 40, 723-739.
fixreg for linear regression fixed point clusters.
mahalconf, wfu, cmahal for computation of initial configurations, weights, tuning constants.
sseg for indexing the similarity/intersection vectors computed by fixmahal.
batcoord, cov.rob, solvecov, cov.wml, plotcluster for computation of projections, (inverted) covariance matrices, plotting.
rFace for generation of example data, see below.
set.seed(20000)
face <- rFace(400,dMoNo=2,dNoEy=0, p=3)
# The first example uses grouping information via init.group.
initg <- list()
grface <- as.integer(attr(face,"grouping"))
for (i in 1:5) initg[[i]] <- (grface==i)
ff0 <- fixmahal(face, pointit=FALSE, init.group=initg)
summary(ff0)
cff0 <- fpclusters(ff0)
plot(face, col=1+cff0[[1]])
plot(face, col=1+cff0[[4]]) # Why does this come out as a cluster?
plot(ff0, face, 4) # A bit clearer...
# Without grouping information, examples need more time:
# ff1 <- fixmahal(face)
# summary(ff1)
# cff1 <- fpclusters(ff1)
# plot(face, col=1+cff1[[1]])
# plot(face, col=1+cff1[[6]]) # Why does this come out as a cluster?
# plot(ff1, face, 6) # A bit clearer...
# ff2 <- fixmahal(face,method="ml")
# summary(ff2)
# ff3 <- fixmahal(face,method="ml",calpha=0.95,subset=50)
# summary(ff3)
## ...fast, but lots of clusters. mer=0.3 may be useful here.
# set.seed(3000)
# face2 <- rFace(400,dMoNo=2,dNoEy=0)
# ff5 <- fixmahal(face2)
# summary(ff5)
## misses right eye of face data; with p=6,
## initial configurations are too large for 40 point clusters
# ff6 <- fixmahal(face2, startn=30)
# summary(ff6)
# cff6 <- fpclusters(ff6)
# plot(face2, col=1+cff6[[3]])
# plot(ff6, face2, 3)
# x <- c(1,2,3,6,6,7,8,120)
# ff8 <- fixmahal(x)
# summary(ff8)
# ...dataset a bit too small for the defaults...
# ff9 <- fixmahal(x, mnc=3, startn=3)
# summary(ff9)