Define the complexity of a finite continuous mixture \(F\) as the smallest integer \(p\) such that its probability density function (pdf) \(f\) can be written as
$$f(x) = w_1 g(x;\theta_1) + \dots + w_p g(x;\theta_p).$$
Further, let \(g, f\) be two probability density functions. The squared Hellinger distance between \(g\) and \(f\) is given by
$$H^2(g,f) = \int \left(\sqrt{g(x)}-\sqrt{f(x)}\right)^2 dx = 2 - 2\int\sqrt{f(x)}\sqrt{g(x)}\, dx,$$
where \(\sqrt{g(x)}\) and \(\sqrt{f(x)}\) denote the square roots of the respective probability density functions at the point \(x\).
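As a small numeric illustration (with two arbitrarily chosen normal densities, not taken from the package), the squared Hellinger distance can be evaluated by numerical integration:

```r
# Squared Hellinger distance H^2(g, f) = 2 - 2 * integral of sqrt(f(x) * g(x)) dx
# for two illustrative normal densities.
f <- function(x) dnorm(x, mean = 0, sd = 1)
g <- function(x) dnorm(x, mean = 1, sd = 1)
H2 <- 2 - 2 * integrate(function(x) sqrt(f(x) * g(x)), -Inf, Inf)$value
H2  # approximately 0.235 for these two densities
```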
To estimate \(p\), hellinger.cont iteratively increases the assumed complexity \(j\) and finds the "best" estimates of the pdfs of mixtures with \(j\) and with \(j+1\) components, ideally by calculating the parameters that minimize the sum of squared Hellinger distances to a kernel density estimate evaluated at each point. Since the computational burden of optimizing over an integral to find the "best" component weights and parameters is immense, the algorithm approximates the objective function by sampling sample.n observations \(Y_i\) from the kernel density estimate and using
$$2 - \frac{2}{\text{sample.n}} \sum_{i=1}^{\text{sample.n}} \sqrt{\frac{f(Y_i)}{g(Y_i)}},$$
instead, with \(f\) being the mixture density and \(g\) being the kernel density estimate.
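The sketch below illustrates this approximation outside of the package (all objects and values are illustrative, not mixComp internals): observations are drawn from a Gaussian kernel density estimate and the approximated squared Hellinger distance to a candidate two-component mixture is computed.

```r
set.seed(1)
x  <- c(rnorm(100, 0), rnorm(100, 4))   # illustrative data
bw <- density(x)$bw                     # bandwidth of the Gaussian KDE

# Draw sample.n observations Y_i from the KDE: resample a data point and add
# Gaussian noise with standard deviation equal to the bandwidth.
sample.n <- 1000
Y <- sample(x, sample.n, replace = TRUE) + rnorm(sample.n, sd = bw)

# KDE g evaluated pointwise, and a candidate 2-component mixture density f.
g <- function(y) sapply(y, function(yi) mean(dnorm(yi, mean = x, sd = bw)))
f <- function(y) 0.5 * dnorm(y, 0, 1) + 0.5 * dnorm(y, 4, 1)

# Approximated squared Hellinger distance between f and g.
H2.approx <- 2 - 2 / sample.n * sum(sqrt(f(Y) / g(Y)))
```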
Once the "best" parameters have been obtained, the difference in squared distances is compared to a predefined threshold. If this difference is smaller than the threshold, the algorithm terminates and the true complexity is estimated as \(j\); otherwise \(j\) is increased by 1 and the procedure is started over. The predefined thresholds are the "AIC" given by
$$\frac{d+1}{n}$$
and the "SBC" given by
$$\frac{(d+1)\log(n)}{2n},$$
\(n\) being the sample size and \(d\) the number of component parameters, i.e. \(\theta \in \mathbb{R}^d\). Note that, if a customized function is to be used, it may only take the arguments j and n, so if the user wants to include the number of component parameters \(d\), it has to be entered explicitly.
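For instance, a user-supplied threshold mimicking the "SBC" could look as follows (a minimal sketch; the value d = 2, e.g. for normal components with a mean and a standard deviation, is an assumption the user has to fill in):

```r
# Custom threshold: may only depend on j and n, so the number of component
# parameters d is entered explicitly rather than passed as an argument.
my.threshold <- function(j, n) {
  d <- 2                        # assumed: normal components with mean and sd
  (d + 1) * log(n) / (2 * n)    # SBC-type penalty
}

my.threshold(j = 2, n = 500)
```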
hellinger.boot.cont works similarly to hellinger.cont, with the exception that the difference in squared distances is not compared to a predefined threshold but to a value generated by a bootstrap procedure. At every iteration, the function sequentially tests \(p = j\) versus \(p = j+1\) for \(j = 1, 2, \dots\), using a parametric bootstrap to generate B samples of size \(n\) from a \(j\)-component mixture given the previously calculated "best" parameter values. For each of the bootstrap samples, the "best" estimates corresponding to pdfs with \(j\) and \(j+1\) components are calculated again, as well as their difference in approximated squared Hellinger distances from the kernel density estimate. The null hypothesis \(H_0: p = j\) is rejected and \(j\) is increased by 1 if the original difference in squared distances lies outside of the interval \([ql, qu]\), specified by the ql and qu empirical quantiles of the bootstrapped differences. Otherwise, \(j\) is returned as the complexity estimate.
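The decision rule at a single iteration can be summarised as in the following sketch (diff.orig and boot.diffs are placeholders for quantities the procedure computes, not objects returned by the package):

```r
# diff.orig:  observed difference in approximated squared Hellinger distances
#             between the best j- and (j+1)-component fits.
# boot.diffs: the B analogous differences computed on the parametric bootstrap
#             samples generated under H0: p = j.
decide <- function(diff.orig, boot.diffs, ql = 0.025, qu = 0.975) {
  q <- quantile(boot.diffs, probs = c(ql, qu))
  if (diff.orig < q[1] || diff.orig > q[2]) {
    "reject H0: increase j and iterate"          # outside [ql, qu] quantiles
  } else {
    "retain H0: return j as complexity estimate"
  }
}
```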
To calculate the minimum of the Hellinger distance (and the corresponding parameter values), the solver solnp is used. The initial values supplied to the solver are calculated as follows: the data is clustered into \(j\) groups by the function clara, and the data corresponding to each group is given to MLE.function (if supplied to the datMix object obj; otherwise numerical optimization is used here as well). The relative sizes of the groups are taken as initial component weights and the MLEs are taken as initial component parameter estimates.
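As an illustration of this initialisation scheme (a sketch under the assumption of normal components; it is not the package's internal code), starting values could be constructed as follows:

```r
library(cluster)

# Cluster the data into j groups with clara and derive starting values: the
# relative group sizes serve as initial weights, per-group estimates of mean
# and standard deviation serve as initial component parameters.
initial.values <- function(x, j) {
  cl <- clara(matrix(x, ncol = 1), k = j)$clustering
  list(
    w    = as.numeric(table(cl)) / length(x),
    mean = as.numeric(tapply(x, cl, mean)),
    sd   = as.numeric(tapply(x, cl, sd))
  )
}

initial.values(c(rnorm(100, 0), rnorm(100, 4)), j = 2)
```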