EllDistrEst.adapt: Estimation of the generator of the elliptical distribution by kernel smoothing with adaptive choice of the bandwidth

Description

A continuous elliptical distribution has a density of the form $$f_X(x) = {|\Sigma|}^{-1/2} g\left( (x-\mu)^\top \, \Sigma^{-1} \, (x-\mu) \right), $$ where $x \in \mathbb{R}^d$, $\mu \in \mathbb{R}^d$ is the mean, $\Sigma$ is a $d \times d$ positive-definite matrix and $g: \mathbb{R}_+ \rightarrow \mathbb{R}_+$ is a function called the density generator of $X$. The goal is to estimate $g$ at some point $\xi$ by $$ \widehat{g}_{n,h,a}(\xi) := \dfrac{\xi^{\frac{-d+2}{2}} \psi_a'(\xi)}{n h s_d} \sum_{i=1}^n \left[ K\left( \dfrac{ \psi_a(\xi) - \psi_a(\xi_i) }{h} \right) + K\left( \dfrac{ \psi_a(\xi) + \psi_a(\xi_i) }{h} \right) \right], $$ where $s_d := \pi^{d/2} / \Gamma(d/2)$, $\Gamma$ is the Gamma function, $h$ and $a$ are tuning parameters (respectively the bandwidth and a parameter controlling the bias at $\xi = 0$), $\psi_a(\xi) := -a + (a^{d/2} + \xi^{d/2})^{2/d}$ for $\xi \in \mathbb{R}_+$, $K$ is a kernel function and $\xi_i := (X_i - \mu)^\top \, \Sigma^{-1} \, (X_i - \mu)$ for a sample $X_1, \dots, X_n$.

This function computes asymptotically optimal values of the bandwidth $h$ and the tuning parameter $a$, starting from a first-step bandwidth that the user must provide.
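As a rough illustration of the estimator defined above, the following base-R sketch evaluates $\widehat{g}_{n,h,a}(\xi)$ for fixed, hand-picked values of $h$ and $a$ with a Gaussian kernel, summing both kernel terms over $i$. The helper names psi_a and g_hat_fixed are hypothetical and are not part of the package, and the sketch does not perform the adaptive selection of $h$ and $a$ that EllDistrEst.adapt carries out.

```r
# Hypothetical helpers for illustration only; not the package implementation.
psi_a <- function(xi, a, d) {
  -a + (a^(d / 2) + xi^(d / 2))^(2 / d)
}

g_hat_fixed <- function(xi, xi_obs, h, a, d) {
  s_d <- pi^(d / 2) / gamma(d / 2)
  # xi^((-d+2)/2) * psi_a'(xi) simplifies to (a^(d/2) + xi^(d/2))^(2/d - 1),
  # which avoids the indeterminate form 0^(negative) * 0 at xi = 0.
  prefactor <- (a^(d / 2) + xi^(d / 2))^(2 / d - 1)
  psi_obs <- psi_a(xi_obs, a, d)
  vapply(seq_along(xi), function(j) {
    p <- psi_a(xi[j], a, d)
    # Gaussian kernel applied to the direct and the reflected term
    k_sum <- sum(dnorm((p - psi_obs) / h) + dnorm((p + psi_obs) / h))
    prefactor[j] * k_sum / (length(xi_obs) * h * s_d)
  }, numeric(1))
}

set.seed(1)
d <- 3
X <- matrix(rnorm(5000 * d), ncol = d)
xi_obs <- rowSums(X^2)   # xi_i with mu = 0 and Sigma = identity
xi_eval <- c(0.5, 1, 2)
est <- g_hat_fixed(xi = xi_eval, xi_obs = xi_obs, h = 0.2, a = 1, d = d)
truth <- exp(-xi_eval / 2) / (2 * pi)^(d / 2)   # standard Gaussian generator
print(cbind(xi = xi_eval, est = est, truth = truth))
```

The values h = 0.2 and a = 1 are arbitrary choices for this sketch; picking them well at each grid point is the problem that the adaptive procedure addresses.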

Usage

EllDistrEst.adapt(
  X,
  mu = 0,
  Sigma_m1 = diag(NCOL(X)),
  grid,
  h_firstStep,
  grid_a = NULL,
  Kernel = "gaussian",
  mpfr = FALSE,
  precBits = 100,
  dopb = TRUE
)

Value

A list with the following elements:

g: a vector of size n1 = length(grid). Each component of this vector is an estimator of $g(x[i])$ where x[i] is the $i$-th element of the grid.
best_a: a vector of the same size as grid giving, for each value of the grid, the optimal choice of $a$ found by the algorithm (which is used to estimate $g$).
best_h: a vector of the same size as grid giving, for each value of the grid, the optimal choice of $h$ found by the algorithm (which is used to estimate $g$).
first_step_g: first-step estimator of $g$, computed using the tuning parameters best_a and h_firstStep[2].
AMSE_estimated: an estimator of the part of the asymptotic MSE that only depends on $a$.

Arguments

X

a matrix of size $n \times d$, assumed to be $n$ i.i.d. observations (rows) of a $d$-dimensional elliptical distribution.

mu

mean of X. This can be the true value or an estimate. It must be a vector of dimension $d$.

Sigma_m1

inverse of the covariance matrix of X. This can be the true value or an estimate. It must be a matrix of dimension $d \times d$.
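When the true location and scale are unknown, mu and Sigma_m1 can be filled with plug-in estimates. The sketch below uses only base R and assumes that the empirical mean and the inverse of the empirical covariance matrix are acceptable estimates for the data at hand; other estimators may be preferable, for example with heavy tails. It also computes the squared distances $\xi_i$ from the Description.

```r
set.seed(42)
n <- 200
d <- 3
X <- matrix(rnorm(n * d), ncol = d)

mu_hat <- colMeans(X)           # candidate value for `mu` (vector of length d)
Sigma_m1_hat <- solve(cov(X))   # candidate value for `Sigma_m1` (d x d matrix)

# xi_i = (X_i - mu)' Sigma^{-1} (X_i - mu); `inverted = TRUE` tells
# mahalanobis() that the third argument is already an inverse matrix.
xi <- mahalanobis(X, center = mu_hat, cov = Sigma_m1_hat, inverted = TRUE)

# Same quantity computed by hand, as a consistency check.
Xc <- sweep(X, 2, mu_hat)
xi_manual <- rowSums((Xc %*% Sigma_m1_hat) * Xc)
print(all.equal(as.numeric(xi), as.numeric(xi_manual)))
```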

grid

vector containing the values at which we want the generator to be estimated.

h_firstStep

a vector of size 2 containing first-step bandwidths to be used. The first one is used for the estimation of the asymptotic mean-squared error. The second one is used for the first step estimation of $g$. From these two estimators, a final value of the bandwidth $h$ is determined, which is used for the final estimator of $g$.

If h_firstStep is of length 1, its value is reused for both purposes (estimation of the AMSE and first-step estimation of $g$).

grid_a

the grid of candidate values of $a$. If NULL (the default), a default sequence is used.

Kernel

name of the kernel. Possible choices are "gaussian", "epanechnikov", "triangular".
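For reference, the three kernel names usually denote the shapes sketched below. These are the standard textbook definitions, stated here as an assumption; the package may scale or implement its kernels differently.

```r
# Textbook kernel definitions (assumed; not taken from the package source).
kernels <- list(
  gaussian     = function(u) dnorm(u),
  epanechnikov = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0),
  triangular   = function(u) ifelse(abs(u) <= 1, 1 - abs(u), 0)
)

# Each kernel is a probability density, so it should integrate to 1.
masses <- c(
  gaussian     = integrate(kernels$gaussian, -Inf, Inf)$value,
  epanechnikov = integrate(kernels$epanechnikov, -1, 1)$value,
  triangular   = integrate(kernels$triangular, -1, 1)$value
)
print(round(masses, 3))
```

The Gaussian kernel has unbounded support, whereas the other two vanish outside $[-1, 1]$, so with those only observations whose transformed values lie within distance $h$ contribute to the sum.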

mpfr

if mpfr = TRUE, multiple precision floating point is used via the package Rmpfr. This allows for a higher (numerical) accuracy, at the expense of computing time. It is recommended to use this option for higher dimensions.

precBits

number of bits of precision used for floating-point computations (only used if mpfr = TRUE).

dopb

a Boolean value. If dopb = TRUE, a progress bar is displayed.

Author

Alexis Derumigny, Victor Ryan

References

Ryan, V., & Derumigny, A. (2024). On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator. arXiv:2408.17087.

Examples

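# Simulate n observations from a standard Gaussian distribution in dimension d
n = 500
d = 3
X = matrix(rnorm(n * d), ncol = d)
grid = seq(0, 5, by = 0.1)

# Adaptive estimation; a single first-step bandwidth is reused for both purposes
result = EllDistrEst.adapt(X = X, grid = grid, h_firstStep = 0.05)
plot(grid, result$g, type = "l")
lines(grid, result$first_step_g, col = "blue")

# Computation of true values (Gaussian generator in dimension d)
g = exp(-grid/2) / (2*pi)^(d/2)
lines(grid, g, type = "l", col = "red")

# Tuning parameters selected at each grid point
plot(grid, result$best_a, type = "l", col = "red")
plot(grid, result$best_h, type = "l", col = "red")

# Is the squared error of the adaptive estimator smaller than that of the first step?
sum((g - result$g)^2, na.rm = TRUE) < sum((g - result$first_step_g)^2, na.rm = TRUE)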
