The function ms, for a given bandwidth, detects the local modes (`local principal points') and performs the clustering. These functions implement the techniques presented in Einbeck (2011).
meanshift(X, x, h)
ms.rep(X, x, h, plotms=1, thresh= 0.00000001, iter=200)
ms(X, h, subset, thr=0.0001, scaled= TRUE, iter=200, plotms=2,
or.labels=NULL, ...)
ms.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25,
thr=0.0001, scaled=TRUE, cluster=FALSE, plot.type="o",
or.labels=NULL, print=FALSE, ...)
[...] x) falls below thresh, or after iter iterations (whichever event happens first).
If TRUE, distances are always measured to the cluster to which an observation is assigned, rather than to the nearest cluster.
[...] gridsize is large.
ms produces an object of class ms, with components:
[...] scaled=TRUE).
[...] names().
Cheng (1995) showed that, if the mean shift is computed iteratively, the resulting sequence of local means converges to a mode of the estimated density function. Assigning each data point to the mode to which it has converged turns this procedure into a clustering technique.
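The iteration and the mode-based cluster assignment described above can be sketched in base R. This is an illustrative sketch under stated assumptions, not the package's implementation: the names ms_step, ms_iterate and ms_cluster_sketch are hypothetical, a Gaussian kernel is assumed, the stopping rule uses the absolute shift length, and converged points are merged when they lie within an assumed tolerance merge_tol of an existing center.

```r
# One mean shift step: move x to the kernel-weighted mean of the rows of X.
# Assumption: Gaussian kernel; h is a scalar or one bandwidth per column.
ms_step <- function(X, x, h) {
  Z <- (t(X) - x) / h            # p x n matrix of scaled differences
  w <- exp(-0.5 * colSums(Z^2))  # one kernel weight per observation
  colSums(X * w) / sum(w)        # weighted local mean
}

# Repeat the step until the shift is shorter than thresh (absolute length,
# an assumption) or iter steps have been taken, whichever comes first.
ms_iterate <- function(X, x, h, thresh = 1e-8, iter = 200) {
  for (j in seq_len(iter)) {
    x_new <- ms_step(X, x, h)
    shift <- sqrt(sum((x_new - x)^2))
    x <- x_new
    if (shift < thresh) break
  }
  x
}

# Start one iteration from every data point and group points whose end
# points coincide up to merge_tol (hypothetical merging rule).
ms_cluster_sketch <- function(X, h, merge_tol = 1e-3) {
  X <- as.matrix(X)
  modes <- X
  for (i in seq_len(nrow(X))) modes[i, ] <- ms_iterate(X, X[i, ], h)
  centers <- modes[1, , drop = FALSE]
  label <- integer(nrow(X))
  for (i in seq_len(nrow(X))) {
    d <- sqrt(colSums((t(centers) - modes[i, ])^2))
    if (min(d) < merge_tol) {
      label[i] <- which.min(d)
    } else {
      centers <- rbind(centers, modes[i, ])
      label[i] <- nrow(centers)
    }
  }
  list(centers = centers, label = label)
}

# Toy data: two well-separated groups of five points each.
cross <- cbind(c(-0.5, 0, 0.5, 0, 0), c(0, 0, 0, -0.5, 0.5))
X <- rbind(cross, cross + 5)
fit <- ms_cluster_sketch(X, h = 1)
```

With these toy data, every point in a group is shifted to the same local mode, so the labels in fit$label separate the two groups.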
The concepts of coverage and self-coverage, which were originally introduced in the principal curve context, adapt straightforwardly to this setting.
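As a rough illustration, coverage in the clustering setting can be read as the proportion of observations lying within a distance tau of their nearest cluster center. The sketch below assumes exactly that reading; coverage_sketch is a hypothetical helper and not the package's coverage function, and the toy data are made up for the example.

```r
# Hypothetical helper: share of rows of X that lie within distance tau
# of the nearest row of centers (Euclidean distance).
coverage_sketch <- function(X, centers, tau) {
  X <- as.matrix(X)
  centers <- as.matrix(centers)
  # distances of all observations to each center, arranged as n x k
  d <- sapply(seq_len(nrow(centers)), function(k) {
    sqrt(colSums((t(X) - centers[k, ])^2))
  })
  d <- matrix(d, nrow = nrow(X))
  mean(apply(d, 1, min) <= tau)
}

# Toy one-dimensional data with two centers.
X <- cbind(c(0, 1, 2, 10))
centers <- cbind(c(1, 10))
cov_small <- coverage_sketch(X, centers, tau = 0.5)
cov_large <- coverage_sketch(X, centers, tau = 1)
```

Under the same assumption, self-coverage would be obtained by setting tau equal to the bandwidth used for the fit and repeating this over a grid of bandwidths.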
The goodness-of-fit measure Rc can also be applied in this context. For instance, a value of $R_C=0.8$ means that, after the clustering, the mean absolute residual length has been reduced by $80\%$ (compared to the distances to the overall mean).
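Read literally, the description above suggests computing one minus the ratio of the mean distance to the assigned cluster center over the mean distance to the overall mean. The following sketch assumes that formula; rc_sketch is a hypothetical helper, not the package's Rc function, and it measures each residual to the assigned center rather than the nearest one.

```r
# Hypothetical helper: relative reduction in mean residual length achieved
# by the clustering, compared with distances to the overall mean.
rc_sketch <- function(X, centers, label) {
  X <- as.matrix(X)
  centers <- as.matrix(centers)
  # residual lengths to the center each observation is assigned to
  res_fit <- sqrt(rowSums((X - centers[label, , drop = FALSE])^2))
  # residual lengths to the overall mean of the data
  res_mean <- sqrt(colSums((t(X) - colMeans(X))^2))
  1 - mean(res_fit) / mean(res_mean)
}

# Toy data: residuals to the centers all have length 1, while distances
# to the overall mean (5) are 6, 4, 4, 6 with mean 5.
X <- cbind(c(-1, 1, 9, 11))
centers <- cbind(c(0, 10))
rc <- rc_sketch(X, centers, label = c(1, 1, 2, 2))
```

For these toy data the ratio of mean residual lengths is 1/5, so rc equals 0.8, matching the interpretation given in the text.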
Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research 6, 175-192.
Rc, lpc.self.coverage
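library(LPCM)
data(faithful)
# Self-coverage over a grid of bandwidths (need higher gridsizes in practice!)
foo <- ms.self.coverage(faithful, gridsize=10, taumin=0.1, taumax=0.5,
                        plot.type="o")
# Bandwidths selected from the self-coverage curve
h <- select.self.coverage(foo)$select
# Mean shift clustering with the first selected bandwidth
fit <- ms(faithful, h=h[1])
# Coverage of the data by the estimated cluster centers
coverage(fit$data, fit$cluster.center)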