It is generally recommended to standardize X so that each variable has unit variance prior to running the algorithm on the data.
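As a minimal sketch of this standardization step (the data matrix below is illustrative; its layout, with variables as rows and data points as columns, is assumed to match the convention implied by the default bandwidth formula further down):

```r
## Illustrative data: 5 variables (rows) x 100 observations (columns),
## with deliberately different scales across variables.
set.seed(1)
X <- matrix(rnorm(5 * 100, sd = rep(c(1, 2, 5, 10, 20), 100)), nrow = 5)

## Rescale each variable (row) to unit variance.
X_std <- X / apply(X, 1, sd)
apply(X_std, 1, sd)  # all equal to 1
```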
Roughly speaking, larger values of h produce a coarser clustering (i.e. fewer and larger clusters). For sufficiently large values of h, the algorithm produces a unique cluster containing all the data points. Smaller values of h produce a finer clustering (i.e. many small clusters). For sufficiently small values of h, each cluster identified by the algorithm contains exactly one data point.
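A hedged way to see this bandwidth effect in practice is to count the clusters obtained at a few values of h. The clustering function msClustering, its $labels component, and the MeanShift package name are not stated in this section and should be treated as assumptions:

```r
## Hedged sketch: number of clusters as a function of the bandwidth h.
## msClustering() and its $labels component are assumptions, not confirmed
## by this section; X_std is the standardized matrix from the sketch above.
library(MeanShift)

for (h in c(0.05, 0.5, 5)) {
  fit <- msClustering(X_std, h = h)
  cat("h =", h, "->", length(unique(fit$labels)), "clusters\n")
}
```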
If h is not specified in the function call, then h is by default set to the 30th percentile of the empirical distribution of distances between the columns of X, i.e. h=quantile( dist( t( X ) ), 0.3 ).
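This default is straightforward to reproduce directly, reusing the hypothetical standardized matrix from the sketch above:

```r
## Default bandwidth: 30th percentile of the pairwise distances between the
## columns of X (dist() operates on rows, hence the transpose).
h_default <- quantile(dist(t(X_std)), 0.3)
h_default
```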
In their implementation, gaussianKernel and exponentialKernel are rescaled to assign probability of at least 0.99 to the unit interval $[0,1]$. This ensures that all the kernels are roughly on the same scale.
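One way such a rescaling can be derived (the package's exact constants are not stated here, so the following is an illustrative assumption) is to solve for the scale parameter that places probability 0.99 on $[0,1]$, treating each kernel as a distribution on distances $r \geq 0$:

```r
## Illustrative derivation (assumed, not the package's documented constants):
## choose the scale so that the kernel, viewed as a distribution on distances
## r >= 0, puts probability 0.99 on [0, 1].

## Gaussian (half-normal on distances): P(R <= 1) = 2 * pnorm(1 / sigma) - 1.
sigma <- 1 / qnorm((1 + 0.99) / 2)
2 * pnorm(1 / sigma) - 1   # 0.99

## Exponential: P(R <= 1) = 1 - exp(-rate).
rate <- -log(1 - 0.99)
pexp(1, rate = rate)       # 0.99
```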
When using the blurring version of the mean shift algorithm, it is generally recommended to use a compactly supported kernel. In particular, the algorithm is guaranteed to converge in finitely many iterations with the Epanechnikov kernel.
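A hedged sketch of the blurring variant with a compactly supported kernel follows; the function bmsClustering and the kernel name "epanechnikovKernel" are assumptions, not confirmed by this section:

```r
## Hedged sketch: blurring mean shift with the Epanechnikov kernel, which is
## compactly supported, so convergence in finitely many iterations is
## guaranteed. Function and kernel names are assumed.
fit_blur <- bmsClustering(X_std, h = h_default, kernel = "epanechnikovKernel")
```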