meanshift(X, x, h)
ms.rep(X, x, h, plotms=1, thresh=0.00000001, iter=100)
ms(X, h, subset, thr=0.001, scaled=TRUE, plotms=2, or.labels=NULL, ...)
ms.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25,
    thr=0.001, scaled=TRUE, draw=1/3, cluster=FALSE, plot.type="o",
    or.labels=NULL, print=FALSE, ...)
[...] 1:n. This allows the iterative mean shift procedure to be run only
from a subset of points (if unspecified, 1:n is used here,
i.e. each data point serves as a starting point).
[...] TRUE, distances are always measured to the
cluster to which an observation is assigned, rather than to the
nearest cluster.
[...] gridsize is large.
[...] ms:
[...] scaled=TRUE).
[...] names().
Chen (1995) showed that, if the mean shift is computed iteratively, the resulting sequence of local means converges to a mode of the estimated density function. Assigning each data point to the mode to which it has converged turns this into a clustering technique.
The concepts of coverage and self-coverage, which were originally introduced in the principal curve context, adapt straightforwardly to this setting.
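The iteration described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names local_mean and shift_to_mode are hypothetical, a Gaussian kernel with a single unscaled bandwidth h is assumed, and end points are merged with single-linkage clustering purely for convenience.

```r
# Hypothetical helpers for illustration; these are not the LPCM functions.

# Gaussian-kernel local mean of the rows of X around the point x.
local_mean <- function(X, x, h) {
  d2 <- rowSums(sweep(X, 2, x)^2)   # squared distance of each row to x
  w  <- exp(-0.5 * d2 / h^2)        # kernel weights with bandwidth h
  colSums(X * w) / sum(w)           # weighted average of the data
}

# Iterate the local mean until the shift is shorter than thresh.
shift_to_mode <- function(X, x, h, thresh = 1e-8, iter = 100) {
  for (k in seq_len(iter)) {
    m <- local_mean(X, x, h)
    converged <- sqrt(sum((m - x)^2)) < thresh
    x <- m
    if (converged) break
  }
  x
}

# Simulated data (an assumption for this sketch): two well-separated groups.
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2))

# Start one trajectory from every data point, then merge end points
# that coincide up to a small tolerance.
modes  <- t(apply(X, 1, function(x) shift_to_mode(X, x, h = 1)))
labels <- cutree(hclust(dist(modes), method = "single"), h = 0.5)
table(labels)
```

Each row of modes is the end point of the trajectory started at the corresponding row of X, so labels assigns every observation to the mode it converged to.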
The goodness-of-fit measure Rc can also be applied in this context. For
instance, a value of $R_C=0.8$ means that,
after the clustering, the mean absolute residual length has been
reduced by $80\%$ (compared to the distances to the overall mean).
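The interpretation above can be made concrete with a small calculation. The function residual_reduction below is a hypothetical helper that only mirrors the verbal description (relative reduction of the mean residual length); it is not the package's Rc, whose actual computation may differ.

```r
# Hypothetical helper: relative reduction of the mean residual length
# when each point is compared with its assigned cluster center instead
# of the overall mean. Illustrates the interpretation only.
residual_reduction <- function(X, centers, labels) {
  X <- as.matrix(X)
  res_fit  <- sqrt(rowSums((X - centers[labels, , drop = FALSE])^2))
  res_mean <- sqrt(rowSums(sweep(X, 2, colMeans(X))^2))
  1 - mean(res_fit) / mean(res_mean)
}

# Toy data (an assumption for this sketch): two pairs of points.
X       <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
centers <- rbind(c(0, 1), c(10, 1))
labels  <- c(1, 1, 2, 2)

r <- residual_reduction(X, centers, labels)
r
```

Here every point lies at distance 1 from its center and at distance sqrt(26) from the overall mean, so the reduction is 1 - 1/sqrt(26), roughly 80%.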
Einbeck, J. (2010). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Working paper, Durham University.
Rc, lpc.self.coverage
library(LPCM)  # package providing ms(), ms.self.coverage(), coverage() and Rc()
data(faithful)
foo <- ms.self.coverage(faithful, gridsize=10, taumin=0.1, taumax=0.5,
                        plot.type="o")  # need higher gridsizes in practice!
h <- select.self.coverage(foo)$select
fit <- ms(faithful,h=h[1])
coverage(fit$data, fit$cluster.center)
Rc(fit$data, fit$cluster.center[fit$closest.label,], type="points")