This function uses a heuristic formula to determine the optimal parameter values gamma and p when a Gaussian kernel is used. The formula has the form \(gamma = K1 * |f|_2^{2/(d+2)} * n^{1/(d+2)}\) and
\(p = ceil(K2 * |f|_2^{2/(d+2)} * n^{2/(d+2)} )\), where \(|f|_2\) is the L2-norm of the density function \(f\) of the non-outliers and \(ceil(x)\) denotes the smallest integer greater than or equal to \(x\).
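As an illustration, the two formulas can be evaluated directly once an estimate of \(|f|_2^{2/(d+2)}\) is available. The sketch below is not the package's code: the helper name heuristic_params, its argument norm_pow and the default values K1 = 1 and K2 = 1 are placeholders chosen for this example, and it assumes that n is the number of observations and d the dimension of the data.

```r
# Hypothetical helper (not part of the package): evaluates the heuristic formulas.
# norm_pow stands for |f|_2^{2/(d+2)}; K1 and K2 are placeholder constants.
heuristic_params <- function(norm_pow, n, d, K1 = 1, K2 = 1) {
  gamma <- K1 * norm_pow * n^(1 / (d + 2))
  p <- ceiling(K2 * norm_pow * n^(2 / (d + 2)))  # ceiling() is R's ceil()
  list(gamma = gamma, p = p)
}

# Example call with made-up inputs.
res <- heuristic_params(norm_pow = 0.5, n = 500, d = 2)
print(res)
```

Note that p grows faster in n than gamma does, since its exponent is 2/(d+2) rather than 1/(d+2).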
Two methods are available to estimate \(|f|_2\); the choice is made through the argument which.estim, which takes the value "Gauss" or "general".
If which.estim="Gauss", the estimation is done as though \(f\) were a Gaussian density, which yields \(|f|_2^{2/(d+2)} = (4*pi)^{-0.5}*exp(0.5*mean(log(1/ev)))\), where \(ev\) are the eigenvalues of the covariance matrix of the non-outlier distribution. Note that eigenvalues smaller than \(ev[1]*RATIO\) (where \(ev[1]\) is the largest eigenvalue) are discarded to avoid numerical issues.
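The "Gauss" computation described above might look roughly as follows. This is a minimal sketch under stated assumptions rather than the package's implementation: the helper name gauss_norm_pow and the default RATIO = 1e-8 are invented for the example, and the eigenvalues are taken from the sample covariance of a matrix X that is assumed to contain only non-outliers.

```r
# Hypothetical helper (not part of the package).
# X: numeric matrix, one observation per row, assumed free of outliers.
# RATIO: relative threshold below which eigenvalues are dropped (assumed default).
gauss_norm_pow <- function(X, RATIO = 1e-8) {
  # eigen() returns the eigenvalues in decreasing order, so ev[1] is the largest.
  ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
  ev <- ev[ev >= ev[1] * RATIO]  # discard eigenvalues smaller than ev[1] * RATIO
  (4 * pi)^(-0.5) * exp(0.5 * mean(log(1 / ev)))
}

# Small deterministic example: the sample covariance is (4/3) times the identity.
X_demo <- cbind(c(1, -1, 1, -1), c(1, 1, -1, -1))
val <- gauss_norm_pow(X_demo)
print(val)
```

Discarding the smallest eigenvalues keeps log(1/ev) from blowing up when the covariance matrix is nearly singular.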
If which.estim="general", \(|f|_2\) is estimated without any assumption on \(f\). However, this method may fail in very high dimensions because of the curse of dimensionality, since it relies on an estimate of the derivative of \(F\) at \(0\), where \(F\) is the cdf of the pairwise distance between two non-outliers. In addition, to shorten the computation time, the optional argument randomize can be set to TRUE, so that only a subset of size sub.n of the data is used to estimate the cdf \(F\).
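The subsampling step can be pictured with the sketch below, which builds the empirical cdf of pairwise Euclidean distances on a random subset of the rows. It is only an illustration: the helper name pairwise_dist_cdf and the default sub.n = 200 are assumptions, and the subsequent step (estimating the derivative of \(F\) at \(0\)) is not shown because its details are not given here.

```r
# Hypothetical helper (not part of the package).
# Returns the empirical cdf of pairwise distances, optionally on a random subset.
pairwise_dist_cdf <- function(X, randomize = FALSE, sub.n = 200) {
  if (randomize && nrow(X) > sub.n) {
    # Keep sub.n rows drawn without replacement to reduce the O(n^2) cost.
    X <- X[sample(nrow(X), sub.n), , drop = FALSE]
  }
  # dist() computes the Euclidean distance for every pair of rows.
  ecdf(as.numeric(dist(X)))
}

# Three points forming a 3-4-5 right triangle: the pairwise distances are 3, 4, 5.
X_tri <- matrix(c(0, 3, 0, 0, 0, 4), ncol = 2)
F_hat <- pairwise_dist_cdf(X_tri)
print(F_hat(c(2.9, 3, 5)))
```

With randomize = TRUE the number of distances drops from n(n-1)/2 to sub.n(sub.n-1)/2, at the price of a noisier estimate of \(F\) near \(0\).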