A continuous elliptical distribution has a density of the form $$f_X(x) = {|\Sigma|}^{-1/2} g\left( (x-\mu)^\top \, \Sigma^{-1} \, (x-\mu) \right), $$ where \(x \in \mathbb{R}^d\), \(\mu \in \mathbb{R}^d\) is the mean, \(\Sigma\) is a \(d \times d\) positive-definite matrix, and \(g: \mathbb{R}_+ \rightarrow \mathbb{R}_+\) is a function called the density generator of \(X\). The goal is to estimate \(g\) at some point \(\xi\) by $$ \widehat{g}_{n,h,a}(\xi) := \dfrac{\xi^{\frac{-d+2}{2}} \psi_a'(\xi)}{n h s_d} \sum_{i=1}^n \left[ K\left( \dfrac{ \psi_a(\xi) - \psi_a(\xi_i) }{h} \right) + K\left( \dfrac{ \psi_a(\xi) + \psi_a(\xi_i) }{h} \right) \right], $$ where \(s_d := \pi^{d/2} / \Gamma(d/2)\), \(\Gamma\) is the Gamma function, \(h\) and \(a\) are tuning parameters (respectively the bandwidth and a parameter controlling the bias at \(\xi = 0\)), \(\psi_a(\xi) := -a + (a^{d/2} + \xi^{d/2})^{2/d}\) for \(\xi \in \mathbb{R}_+\), \(K\) is a kernel function, and \(\xi_i := (X_i - \mu)^\top \, \Sigma^{-1} \, (X_i - \mu)\) for a sample \(X_1, \dots, X_n\). This function computes "asymptotically optimal" values of the bandwidth \(h\) and the tuning parameter \(a\), starting from a first-step bandwidth that the user must provide.
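To make the formula above concrete, here is a minimal base-R sketch of the estimator for fixed \(h\) and \(a\). It is not the package's implementation: the helper names psi_a, kernel_weight, s_d and g_hat_sketch are hypothetical, a Gaussian kernel (dnorm) is assumed, and no adaptive choice of the tuning parameters is attempted. The factor \(\xi^{\frac{-d+2}{2}} \psi_a'(\xi)\) is simplified analytically to \((a^{d/2} + \xi^{d/2})^{2/d - 1}\), which stays finite at \(\xi = 0\).

```r
# Transformation psi_a(xi) = -a + (a^(d/2) + xi^(d/2))^(2/d)
psi_a <- function(xi, a, d) -a + (a^(d / 2) + xi^(d / 2))^(2 / d)

# xi^((2 - d)/2) * psi_a'(xi), simplified by hand to avoid 0 * Inf at xi = 0
kernel_weight <- function(xi, a, d) (a^(d / 2) + xi^(d / 2))^(2 / d - 1)

# Normalising constant s_d = pi^(d/2) / Gamma(d/2)
s_d <- function(d) pi^(d / 2) / gamma(d / 2)

# Sketch of the estimator for fixed h and a (hypothetical helper)
g_hat_sketch <- function(xi, X, mu, Sigma_m1, h, a, K = dnorm) {
  n <- nrow(X)
  d <- ncol(X)
  Xc <- sweep(X, 2, mu)                    # centre each column by mu
  xi_i <- rowSums((Xc %*% Sigma_m1) * Xc)  # (X_i - mu)' Sigma^{-1} (X_i - mu)
  v <- psi_a(xi_i, a, d)
  sapply(xi, function(x) {
    u <- psi_a(x, a, d)
    kernel_weight(x, a, d) / (n * h * s_d(d)) *
      sum(K((u - v) / h) + K((u + v) / h))
  })
}

set.seed(1)
X <- matrix(rnorm(500 * 3), ncol = 3)
est <- g_hat_sketch(c(0.5, 1, 2), X, mu = rep(0, 3), Sigma_m1 = diag(3),
                    h = 0.2, a = 1)
print(est)
```

The values h = 0.2 and a = 1 are arbitrary illustrative choices; selecting them well is precisely the purpose of EllDistrEst.adapt.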
EllDistrEst.adapt(
X,
mu = 0,
Sigma_m1 = diag(NCOL(X)),
grid,
h_firstStep,
grid_a = NULL,
Kernel = "gaussian",
mpfr = FALSE,
precBits = 100,
dopb = TRUE
)
A list with the following elements:
g: a vector of size n1 = length(grid).
Each component of this vector is an estimator of \(g(x[i])\),
where x[i] is the \(i\)-th element of the grid.
best_a: a vector of the same size as grid indicating,
for each value of the grid, the optimal choice of \(a\) found by
the algorithm (which is used to estimate \(g\)).
best_h: a vector of the same size as grid indicating,
for each value of the grid, the optimal choice of \(h\) found by
the algorithm (which is used to estimate \(g\)).
first_step_g: first-step estimator of \(g\), computed using
the tuning parameters best_a and h_firstStep[2].
AMSE_estimated: an estimator of the part of the asymptotic MSE
that only depends on \(a\).
X: a matrix of size \(n \times d\), assumed to be \(n\) i.i.d. observations (rows) of a \(d\)-dimensional elliptical distribution.
mu: mean of X. This can be the true value or an estimate. It must be a vector of dimension \(d\).
Sigma_m1: inverse of the covariance matrix of X. This can be the true value or an estimate. It must be a matrix of dimension \(d \times d\).
grid: vector containing the values at which the generator is to be estimated.
h_firstStep: a vector of size 2 containing the first-step bandwidths
to be used. The first one is used for the estimation of the asymptotic mean-squared
error. The second one is used for the first-step estimation of \(g\).
From these two estimators, a final value of the bandwidth \(h\) is determined,
which is used for the final estimator of \(g\).
If h_firstStep is of length 1, its value is reused for both purposes
(estimation of the AMSE and first-step estimation of \(g\)).
grid_a: the grid of possible values of \(a\) to be used.
If NULL (the default), a default sequence is used.
Kernel: name of the kernel. Possible choices are
"gaussian", "epanechnikov", "triangular".
mpfr: if mpfr = TRUE, multiple-precision floating-point arithmetic is used
via the package Rmpfr.
This allows for higher numerical accuracy, at the expense of computing time.
It is recommended to use this option in higher dimensions.
precBits: number of bits used for floating-point precision
(only used if mpfr = TRUE).
dopb: a Boolean value.
If dopb = TRUE, a progress bar is displayed.
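Since mu and Sigma_m1 may be estimates, the following sketch shows one simple way to obtain them with base R before calling the function. The use of the empirical mean and the inverse empirical covariance matrix is an assumption made for illustration, and the variable names mu_hat, Sigma_m1_hat and xi_hat are hypothetical. The final call is left commented out, and its bandwidth values are arbitrary.

```r
set.seed(42)
n <- 200
d <- 3
X <- matrix(rnorm(n * d), ncol = d)

# Plug-in estimates of the mean and of the inverse covariance matrix
mu_hat <- colMeans(X)
Sigma_m1_hat <- solve(cov(X))

# Estimated quadratic forms (X_i - mu)' Sigma^{-1} (X_i - mu)
Xc <- sweep(X, 2, mu_hat)
xi_hat <- rowSums((Xc %*% Sigma_m1_hat) * Xc)

# Same quantity computed by stats::mahalanobis (squared distances)
print(all.equal(xi_hat, mahalanobis(X, center = mu_hat, cov = cov(X))))

# Illustrative call with two first-step bandwidths (values are arbitrary):
# result <- EllDistrEst.adapt(X = X, mu = mu_hat, Sigma_m1 = Sigma_m1_hat,
#                             grid = seq(0, 5, by = 0.1),
#                             h_firstStep = c(0.05, 0.1))
```

With a two-element h_firstStep, the first value serves the AMSE estimation and the second the first-step estimation of \(g\), as described above.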
Alexis Derumigny, Victor Ryan
Ryan, V., & Derumigny, A. (2024). On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator. arXiv:2408.17087.
EllDistrEst for the nonparametric estimation of the
elliptical distribution density generator,
EllDistrSim for the simulation of elliptical distribution samples.
estim_tilde_AMSE which is used in this function. It estimates
a component of the asymptotic mean-square error (AMSE) of the nonparametric
estimator of the elliptical density generator assuming \(h\) has been optimally
chosen.
n = 500
d = 3
X = matrix(rnorm(n * d), ncol = d)
grid = seq(0, 5, by = 0.1)
result = EllDistrEst.adapt(X = X, grid = grid, h_firstStep = 0.05)
plot(grid, result$g, type = "l")
lines(grid, result$first_step_g, col = "blue")
# True generator of the standard Gaussian distribution in dimension d
g = exp(-grid/2)/(2*pi)^(d/2)
lines(grid, g, type = "l", col = "red")
plot(grid, result$best_a, type = "l", col = "red")
plot(grid, result$best_h, type = "l", col = "red")
# Does the adaptive estimator have a smaller squared error than the first step?
sum((g - result$g)^2, na.rm = TRUE) < sum((g - result$first_step_g)^2, na.rm = TRUE)