It is generally recommended to standardize X so that each variable has unit variance before running the algorithm.
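Since the data points are stored in the columns of X (see the default bandwidth rule below), each variable corresponds to a row of X. A minimal sketch of the standardization, under that assumption:

```r
## Toy data: 2 variables (rows) on very different scales, 100 points (columns).
set.seed( 1 )
X <- matrix( rnorm( 200, sd = c( 1, 10 ) ), nrow = 2 )

## Rescale each variable (row of X) to unit variance.
X <- X / apply( X, 1, sd )
apply( X, 1, sd )  # both standard deviations are now 1
```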
Roughly speaking, larger values of h produce a coarser clustering (i.e. fewer and larger clusters); for sufficiently large values of h, the algorithm produces a single cluster containing all the data points. Conversely, smaller values of h produce a finer clustering (i.e. many small clusters); for sufficiently small values of h, each cluster identified by the algorithm contains exactly one data point.
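This behavior can be seen with a bare-bones version of the (non-blurring) mean shift iteration. The sketch below uses a plain Gaussian weight for simplicity and is only meant to illustrate the role of h; it is not the package's implementation.

```r
## Naive mean shift: repeatedly move each point toward the weighted mean of
## the data, with Gaussian weights of bandwidth h. X holds one point per column.
meanShiftModes <- function( X, h, n.iter = 100 ) {
  modes <- X
  for ( iter in seq_len( n.iter ) ) {
    for ( j in seq_len( ncol( modes ) ) ) {
      w <- exp( -colSums( ( X - modes[ , j ] )^2 ) / ( 2 * h^2 ) )
      modes[ , j ] <- X %*% w / sum( w )
    }
  }
  modes
}

## Two well-separated groups of points on the real line.
set.seed( 1 )
X <- matrix( c( rnorm( 50, mean = 0, sd = 0.1 ),
                rnorm( 50, mean = 5, sd = 0.1 ) ), nrow = 1 )

## Small h recovers the two groups; large h merges everything into one cluster.
length( unique( c( round( meanShiftModes( X, h = 0.5 ), 2 ) ) ) )  # 2 modes
length( unique( c( round( meanShiftModes( X, h = 10 ), 2 ) ) ) )   # 1 mode
```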
If h is not specified in the function call, then h is by default set to the 30th percentile of the empirical distribution of distances between the columns of X, i.e. h=quantile( dist( t( X ) ), 0.3 ).
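The default can be reproduced directly in R; note that dist computes the pairwise Euclidean distances between the rows of its argument, hence the transpose:

```r
## Default bandwidth: 30th percentile of the pairwise distances
## between the data points (the columns of X).
h.default <- quantile( dist( t( X ) ), 0.3 )
```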
In this implementation, gaussianKernel and exponentialKernel are rescaled so that they assign probability of at least 0.99 to the unit interval $[0,1]$. This ensures that all the kernels are roughly on the same scale.
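The rescaling idea can be sketched as follows; the exact constants and parametrizations used by gaussianKernel and exponentialKernel may differ, so this is only an illustration of the 0.99-mass requirement, assuming the kernels are viewed as distributions over nonnegative distances:

```r
## Half-Gaussian: choose the scale so that P( 0 <= X <= 1 ) = 0.99.
sigma <- 1 / qnorm( 0.995 )
2 * pnorm( 1, sd = sigma ) - 1  # 0.99

## Exponential: choose the rate so that P( 0 <= X <= 1 ) = 0.99.
rate <- qexp( 0.99 )            # equals -log( 0.01 )
pexp( 1, rate = rate )          # 0.99
```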
When using the blurring version of the mean shift algorithm, it is generally recommended to use a compactly supported kernel. In particular, the algorithm is guaranteed to converge in finitely many iterations with the Epanechnikov kernel.
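A compactly supported kernel assigns zero weight to points at rescaled distance greater than 1 (i.e. beyond h), which is what makes convergence in finitely many iterations possible. A minimal sketch of the Epanechnikov profile, up to a normalizing constant (which cancels in the mean shift update):

```r
## Epanechnikov profile: positive on [0, 1), exactly zero from 1 onward.
epanechnikov <- function( x ) pmax( 1 - x^2, 0 )
epanechnikov( c( 0, 0.5, 1, 2 ) )  # 1.00 0.75 0.00 0.00
```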