Learn R Programming

ElliptCopulas (version 0.1.4.1)

estim_tilde_AMSE: Estimate the part of the AMSE of the elliptical density generator that only depends on the parameter "a" assuming \(h\) has been optimally chosen

Description

A continuous elliptical distribution has a density of the form $$f_X(x) = {|\Sigma|}^{-1/2} g\left( (x-\mu)^\top \, \Sigma^{-1} \, (x-\mu) \right), $$ where \(x \in \mathbb{R}^d\), \(\mu \in \mathbb{R}^d\) is the mean, \(\Sigma\) is a \(d \times d\) positive-definite matrix and a function \(g: \mathbb{R}_+ \rightarrow \mathbb{R}_+\), called the density generator of \(X\). The goal is to estimate \(g\) at some point \(\xi\), by $$ \widehat{g}_{n,h,a}(\xi) := \dfrac{\xi^{\frac{-d+2}{2}} \psi_a'(\xi)}{n h s_d} \sum_{i=1}^n K\left( \dfrac{ \psi_a(\xi) - \psi_a(\xi_i) }{h} \right) + K\left( \dfrac{ \psi_a(\xi) + \psi_a(\xi_i) }{h} \right), $$ where \(s_d := \pi^{d/2} / \Gamma(d/2)\), \(\Gamma\) is the Gamma function, \(h\) and \(a\) are tuning parameters (respectively the bandwidth and a parameter controlling the bias at \(\xi = 0\)), \(\psi_a(\xi) := -a + (a^{d/2} + \xi^{d/2})^{2/d},\) \(\xi \in \mathbb{R}\), \(K\) is a kernel function and \(\xi_i := (X_i - \mu)^\top \, \Sigma^{-1} \, (X_i - \mu), \) for a sample \(X_1, \dots, X_n\). Thanks to Proposition 2.2 in (Ryan and Derumigny, 2024), the asymptotic mean square error of \(\widehat{g}_{n,h,a}(\xi)\) can be decomposed into a product of a constant (that depends on the true \(g\)) and a term that depends on \(g\) and \(a\). This function computes this term. It can be useful to find out the best value of the parameter \(a\) to be used.

Usage

estim_tilde_AMSE(
  X,
  mu = 0,
  Sigma_m1 = diag(NCOL(X)),
  grid,
  h,
  Kernel = "gaussian",
  a = 1,
  mpfr = FALSE,
  precBits = 100,
  dopb = TRUE
)

Value

a vector of the same size as the grid, with the corresponding value for the \(\widetilde{AMSE}\).

Arguments

X

a matrix of size \(n \times d\), assumed to be \(n\) i.i.d. observations (rows) of a \(d\)-dimensional elliptical distribution.

mu

mean of X. This can be the true value or an estimate. It must be a vector of dimension \(d\).

Sigma_m1

inverse of the covariance matrix of X. This can be the true value or an estimate. It must be a matrix of dimension \(d \times d\).

grid

grid of values of \(\xi\) at which we want to estimate the density generator.

h

bandwidth of the kernel. Can be either a number or a vector of the size length(grid).

Kernel

name of the kernel. Possible choices are "gaussian", "epanechnikov", "triangular".

a

tuning parameter to improve the performance at 0. Can be either a number or a vector of the size length(grid). If this is a vector, the code will need to allocate a matrix of size nrow(X) * length(grid) which can be prohibitive in some cases.

mpfr

if mpfr = TRUE, multiple precision floating point is used via the package Rmpfr. This allows for a higher (numerical) accuracy, at the expense of computing time. It is recommended to use this option for higher dimensions.

precBits

number of precBits used for floating point precision (only used if mpfr = TRUE).

dopb

a Boolean value. If dopb = TRUE, a progress bar is displayed.

Author

Alexis Derumigny, Victor Ryan

References

Ryan, V., & Derumigny, A. (2024). On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator arxiv:2408.17087.

Examples

Run this code
# Comparison between the estimated and true generator of the Gaussian distribution
n = 50000
d = 3
X = matrix(rnorm(n * d), ncol = d)
grid = seq(0, 5, by = 0.1)
a = 1.5

AMSE_est = estim_tilde_AMSE(X = X, grid = grid, a = a, h = 0.09)
plot(grid, abs(AMSE_est), type = "l")

# Computation of true values
g = exp(-grid/2)/(2*pi)^{3/2}
gprime = (-1/2) *exp(-grid/2)/(2*pi)^{3/2}
A = a^(d/2)
psia = -a + (A + grid^(d/2))^(2/d)
psiaprime = grid^(d/2 - 1) * (A + grid^(d/2))^(2/d - 1)
psiasecond = psiaprime * ( (d-2)/2 ) * grid^{-1} * A *
  ( grid^(d/2) + A )^(-1)

rhoprimexi = ((d-2) * grid^((d-4)/2) * psiaprime
  - 2 * grid^((d-2)/2) * psiasecond) / (2 * psiaprime^3) * g +
  grid^((d-2)/2) / (psiaprime^2) * gprime

AMSE = rhoprimexi / psiaprime

lines(grid, abs(AMSE), col = "red")


# Comparison as a function of $a$
n = 50000
d = 3
X = matrix(rnorm(n * d), ncol = d)
grid = 0.1
vec_a = c(0.001, 0.002, 0.005,
0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1, 1.5, 2)

AMSE_est = rep(NA, length = length(vec_a))
for (i in 1:length(vec_a)){
  AMSE_est[i] = estim_tilde_AMSE(X = X, grid = grid, a = vec_a[i], h = 0.09,
                          dopb = FALSE)
}

plot(vec_a, abs(AMSE_est), type = "l", log = "x")

# Computation of true values
a = vec_a

g = exp(-grid/2)/(2*pi)^{3/2}
gprime = (-1/2) *exp(-grid/2)/(2*pi)^{3/2}
A = a^(d/2)
psia = -a + (A + grid^(d/2))^(2/d)
psiaprime = grid^(d/2 - 1) * (A + grid^(d/2))^(2/d - 1)
psiasecond = psiaprime * ( (d-2)/2 ) * grid^{-1} * A *
  ( grid^(d/2) + A )^(-1)

rhoprimexi = ((d-2) * grid^((d-4)/2) * psiaprime
  - 2 * grid^((d-2)/2) * psiasecond) / (2 * psiaprime^3) * g +
  grid^((d-2)/2) / (psiaprime^2) * gprime

AMSE = rhoprimexi / psiaprime

yliminf = min(c(abs(AMSE_est), abs(AMSE)))
ylimsup = max(c(abs(AMSE_est), abs(AMSE)))

plot(vec_a, abs(AMSE_est), type = "l", log = "xy",
     ylim = c(yliminf, ylimsup))
lines(vec_a, abs(AMSE), col = "red")

Run the code above in your browser using DataLab