estim_tilde_AMSE: Estimate the part of the AMSE of the elliptical density generator that only depends on the parameter "a" assuming $h$ has been optimally chosen

Description

A continuous elliptical distribution has a density of the form $$f_X(x) = {|\Sigma|}^{-1/2} g\left( (x-\mu)^\top \, \Sigma^{-1} \, (x-\mu) \right), $$ where $x \in \mathbb{R}^d$, $\mu \in \mathbb{R}^d$ is the mean, $\Sigma$ is a $d \times d$ positive-definite matrix and a function $g: \mathbb{R}_+ \rightarrow \mathbb{R}_+$, called the density generator of $X$. The goal is to estimate $g$ at some point $\xi$, by $$ \widehat{g}_{n,h,a}(\xi) := \dfrac{\xi^{\frac{-d+2}{2}} \psi_a'(\xi)}{n h s_d} \sum_{i=1}^n K\left( \dfrac{ \psi_a(\xi) - \psi_a(\xi_i) }{h} \right) + K\left( \dfrac{ \psi_a(\xi) + \psi_a(\xi_i) }{h} \right), $$ where $s_d := \pi^{d/2} / \Gamma(d/2)$, $\Gamma$ is the Gamma function, $h$ and $a$ are tuning parameters (respectively the bandwidth and a parameter controlling the bias at $\xi = 0$), $\psi_a(\xi) := -a + (a^{d/2} + \xi^{d/2})^{2/d},$ $\xi \in \mathbb{R}$, $K$ is a kernel function and $\xi_i := (X_i - \mu)^\top \, \Sigma^{-1} \, (X_i - \mu), $ for a sample $X_1, \dots, X_n$. Thanks to Proposition 2.2 in (Ryan and Derumigny, 2024), the asymptotic mean square error of $\widehat{g}_{n,h,a}(\xi)$ can be decomposed into a product of a constant (that depends on the true $g$) and a term that depends on $g$ and $a$. This function computes this term. It can be useful to find out the best value of the parameter $a$ to be used.

Usage

estim_tilde_AMSE(
  X,
  mu = 0,
  Sigma_m1 = diag(NCOL(X)),
  grid,
  h,
  Kernel = "gaussian",
  a = 1,
  mpfr = FALSE,
  precBits = 100,
  dopb = TRUE
)

Value

a vector of the same size as the grid, with the corresponding value for the $\widetilde{AMSE}$.

Arguments

X: a matrix of size $n \times d$, assumed to be $n$ i.i.d. observations (rows) of a $d$-dimensional elliptical distribution.
mu: mean of X. This can be the true value or an estimate. It must be a vector of dimension $d$.
Sigma_m1: inverse of the covariance matrix of X. This can be the true value or an estimate. It must be a matrix of dimension $d \times d$.
grid: grid of values of $\xi$ at which we want to estimate the density generator.
h: bandwidth of the kernel. Can be either a number or a vector of the size length(grid).
Kernel: name of the kernel. Possible choices are "gaussian", "epanechnikov", "triangular".
a: tuning parameter to improve the performance at 0. Can be either a number or a vector of the size length(grid). If this is a vector, the code will need to allocate a matrix of size nrow(X) * length(grid) which can be prohibitive in some cases.
mpfr: if mpfr = TRUE, multiple precision floating point is used via the package Rmpfr. This allows for a higher (numerical) accuracy, at the expense of computing time. It is recommended to use this option for higher dimensions.
precBits: number of precBits used for floating point precision (only used if mpfr = TRUE).
dopb: a Boolean value. If dopb = TRUE, a progress bar is displayed.

Author

Alexis Derumigny, Victor Ryan

References

Ryan, V., & Derumigny, A. (2024). On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator arxiv:2408.17087.

Examples

Run this code

# Comparison between the estimated and true generator of the Gaussian distribution
n = 50000
d = 3
X = matrix(rnorm(n * d), ncol = d)
grid = seq(0, 5, by = 0.1)
a = 1.5

AMSE_est = estim_tilde_AMSE(X = X, grid = grid, a = a, h = 0.09)
plot(grid, abs(AMSE_est), type = "l")

# Computation of true values
g = exp(-grid/2)/(2*pi)^{3/2}
gprime = (-1/2) *exp(-grid/2)/(2*pi)^{3/2}
A = a^(d/2)
psia = -a + (A + grid^(d/2))^(2/d)
psiaprime = grid^(d/2 - 1) * (A + grid^(d/2))^(2/d - 1)
psiasecond = psiaprime * ( (d-2)/2 ) * grid^{-1} * A *
  ( grid^(d/2) + A )^(-1)

rhoprimexi = ((d-2) * grid^((d-4)/2) * psiaprime
  - 2 * grid^((d-2)/2) * psiasecond) / (2 * psiaprime^3) * g +
  grid^((d-2)/2) / (psiaprime^2) * gprime

AMSE = rhoprimexi / psiaprime

lines(grid, abs(AMSE), col = "red")


# Comparison as a function of $a$
n = 50000
d = 3
X = matrix(rnorm(n * d), ncol = d)
grid = 0.1
vec_a = c(0.001, 0.002, 0.005,
0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1, 1.5, 2)

AMSE_est = rep(NA, length = length(vec_a))
for (i in 1:length(vec_a)){
  AMSE_est[i] = estim_tilde_AMSE(X = X, grid = grid, a = vec_a[i], h = 0.09,
                          dopb = FALSE)
}

plot(vec_a, abs(AMSE_est), type = "l", log = "x")

# Computation of true values
a = vec_a

g = exp(-grid/2)/(2*pi)^{3/2}
gprime = (-1/2) *exp(-grid/2)/(2*pi)^{3/2}
A = a^(d/2)
psia = -a + (A + grid^(d/2))^(2/d)
psiaprime = grid^(d/2 - 1) * (A + grid^(d/2))^(2/d - 1)
psiasecond = psiaprime * ( (d-2)/2 ) * grid^{-1} * A *
  ( grid^(d/2) + A )^(-1)

rhoprimexi = ((d-2) * grid^((d-4)/2) * psiaprime
  - 2 * grid^((d-2)/2) * psiasecond) / (2 * psiaprime^3) * g +
  grid^((d-2)/2) / (psiaprime^2) * gprime

AMSE = rhoprimexi / psiaprime

yliminf = min(c(abs(AMSE_est), abs(AMSE)))
ylimsup = max(c(abs(AMSE_est), abs(AMSE)))

plot(vec_a, abs(AMSE_est), type = "l", log = "xy",
     ylim = c(yliminf, ylimsup))
lines(vec_a, abs(AMSE), col = "red")

Run the code above in your browser using DataLab