Navae_ci_mean: Compute NAVAE CI for the expectation based on empirical mean estimator and Berry-Esseen (BE) or Edgeworth Expansions (EE) bounds

Description

Computes the NAVAE confidence interval (CI) for the expectation, based on the empirical mean estimator and on Berry-Esseen (BE) or Edgeworth expansion (EE) bounds.

Usage

Navae_ci_mean(
  data,
  alpha = 0.05,
  a = "best",
  bound_K = NULL,
  known_variance = NULL,
  param_BE_EE = list(choice = "best", setup = list(continuity = FALSE, iid = TRUE,
    no_skewness = FALSE), regularity = list(C0 = 1, p = 2), eps = 0.1),
  na.rm = FALSE
)

Value

Navae_ci_mean returns an object of class NAVAE_CI_Mean, containing:

ci_navae: the NAVAE confidence interval
ci_asymp: the classical "asymptotic" CI based on the CLT (for comparison).
indicator_R_regime: 1 if we are in the \(\mathbb{R}\) regime and 0 otherwise.
delta_n, delta_n_from: respectively, the numerical value of the bound \(\delta_n\) used, and a character string "BE" or "EE" indicating which type of inequality was used.
minimal_alpha_to_exit_R_regime: the minimal value of alpha needed to exit the \(\mathbb{R}\) regime.
bound_K_value, bound_K_method: the value of K used and the method used to compute it.
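For comparison purposes, ci_asymp is described as the classical CLT-based interval. A minimal sketch of such an interval in base R, assuming the standard form \(\bar{X}_n \pm q_{1-\alpha/2}\,\widehat{\sigma}_n/\sqrt{n}\) with the usual sample standard deviation (denominator n - 1); the helper name asymp_ci_mean is hypothetical, and the package's exact variance estimator may differ.

```r
# Hypothetical helper (not part of the package): classical CLT-based CI
# for the mean, of the form mean(x) +/- z_{1 - alpha/2} * sd(x) / sqrt(n).
asymp_ci_mean <- function(data, alpha = 0.05) {
  n <- length(data)
  z <- qnorm(1 - alpha / 2)             # standard normal quantile
  half_width <- z * sd(data) / sqrt(n)  # sd() uses the n - 1 denominator
  c(lower = mean(data) - half_width, upper = mean(data) + half_width)
}

set.seed(1)
x <- rexp(10000, 1)
asymp_ci_mean(x, alpha = 0.2)
```

Unlike the NAVAE interval, this interval is only asymptotically valid, so its coverage for a fixed n is not guaranteed to be at least 1 - alpha.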

Arguments

data

vector of univariate observations.

alpha

this is 1 minus the confidence level of the CI; in other words, the nominal level is 1 - alpha. By default, alpha is set to 0.05, yielding a \(95\%\) CI.

a

the free parameter \(a\) (or \(a_n\)) of the interval. It must be one of:

a numeric value larger than 1, used directly as the value of \(a\);
the character value "best" (the default), which selects the value of \(a\) that yields the shortest confidence interval;
a list such as list(power_of_n_for_b = -2/5), in which case \(a\) is computed as a = 1 + n^power_of_n_for_b. Note that -2/5 is the optimal (theoretical) rate.
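The list form of a only encodes the arithmetic \(a = 1 + n^{p}\), with \(p\) given by power_of_n_for_b. A minimal sketch of that computation, using a hypothetical helper name; since \(n^{p} > 0\), the resulting value is always larger than 1, as required.

```r
# Hypothetical helper (not part of the package): value of `a` implied by
# a = list(power_of_n_for_b = p), i.e. a = 1 + n^p.
a_from_power <- function(n, power_of_n_for_b = -2/5) {
  1 + n^power_of_n_for_b
}

a_from_power(10000)  # 1 + 10000^(-2/5) = 1 + 10^(-1.6), about 1.0251
```

This is why a = 1 + n^(-2/5) and a = list(power_of_n_for_b = -2/5) give the same interval in the examples.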

bound_K

bound on the kurtosis \(K_4(\theta)\) of the distribution of the observations, which are assumed to be i.i.d. The choice 9 covers most "usual" distributions. If the argument is not provided (default NULL), the plug-in counterpart \(\widehat{K}\), that is, the empirical kurtosis of the observations, is used instead.
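A minimal sketch of a plug-in \(\widehat{K}\), assuming it is the standard (non-excess) empirical kurtosis \(\frac{1}{n}\sum_i (X_i-\bar{X}_n)^4 \big/ \big(\frac{1}{n}\sum_i (X_i-\bar{X}_n)^2\big)^2\); the helper name is hypothetical and the package's exact normalization may differ. Under this convention the Gaussian distribution has kurtosis 3 and the exponential distribution has kurtosis 9, which is consistent with 9 being a bound that covers most usual distributions.

```r
# Hypothetical helper (not part of the package): empirical (non-excess)
# kurtosis, i.e. the 4th central moment divided by the squared 2nd central
# moment, both computed with denominator n.
empirical_kurtosis <- function(data) {
  centered <- data - mean(data)
  mean(centered^4) / mean(centered^2)^2
}

set.seed(1)
empirical_kurtosis(rexp(1e5, 1))  # should be roughly 9, the kurtosis of Exp(1)
```

The estimator is invariant to location and scale changes of the data, as the kurtosis itself is.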

known_variance

by default NULL, in which case the function computes the CI in the general case where the variance is unknown (and estimated). Otherwise, a numeric scalar equal to the (assumed) known variance. (NB: if this option is used, one must provide the variance, not the standard deviation.)
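Because known_variance expects \(\sigma^2\), a known standard deviation \(\sigma\) must be squared before being passed. The sketch below illustrates this convention with a hypothetical helper computing the standard error of the empirical mean, either from a known variance or from the sample; it is not the package's internal code.

```r
# Hypothetical helper (not part of the package): standard error of the
# empirical mean, from a known variance if supplied, else from sd(data).
se_mean <- function(data, known_variance = NULL) {
  n <- length(data)
  if (is.null(known_variance)) {
    sd(data) / sqrt(n)                # variance estimated from the data
  } else {
    sqrt(known_variance) / sqrt(n)    # known_variance is sigma^2, not sigma
  }
}

sigma <- 2
x <- rnorm(100, mean = 0, sd = sigma)
se_mean(x, known_variance = sigma^2)  # sigma / sqrt(n) = 2 / 10 = 0.2
# Analogously: Navae_ci_mean(x, known_variance = sigma^2)
```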

param_BE_EE

parameters to compute the BE or EE bound \(\delta_n\) used to construct the confidence interval. If param_BE_EE is exactly equal to "BE", the bound used is the best currently available BE bound, from Shevtsova (2013), combined with a convexity inequality. Otherwise, param_BE_EE is a list of four objects:

choice: if equal to "EE", the bound used is the one from Derumigny et al. (2023), computed with the parameters given by the rest of param_BE_EE, namely
setup: itself a logical vector of size 3,
regularity: itself a list of length up to 3,
eps: a value between 0 and 1/3,

as described in the arguments of the function BoundEdgeworth::Bound_EE1. Together, they specify the assumptions and settings used to compute the bound \(\delta_n\) from Derumigny et al. (2023). Finally, if choice is equal to "best", the bound used is the smaller of the previous one (with choice = "EE") and the bound "BE".

By default, following Remark 3.3 of Derumigny et al. (2025), choice = "best" is used, and the EE bound of Derumigny et al. (2023) is computed assuming i.i.d. data and no further regularity (neither a continuous nor an unskewed distribution); the bound on the kurtosis is the one specified in the argument bound_K.
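The expected shape of param_BE_EE can be summarized as a structural check. The sketch below is written only from the description above; the helper name and the exact checks (in particular treating eps as strictly between 0 and 1/3) are assumptions, and the package's own validation, as well as the admissible entries of regularity documented in BoundEdgeworth::Bound_EE1, may differ.

```r
# Hypothetical helper (not part of the package): check that param_BE_EE
# has the shape described in this help page.
check_param_BE_EE <- function(param_BE_EE) {
  if (identical(param_BE_EE, "BE")) return(TRUE)  # plain Berry-Esseen bound
  stopifnot(
    is.list(param_BE_EE),
    isTRUE(param_BE_EE$choice %in% c("EE", "best")),
    length(param_BE_EE$setup) == 3,
    all(vapply(param_BE_EE$setup, is.logical, logical(1))),
    is.list(param_BE_EE$regularity),
    length(param_BE_EE$regularity) <= 3,
    is.numeric(param_BE_EE$eps),
    length(param_BE_EE$eps) == 1,
    isTRUE(param_BE_EE$eps > 0 && param_BE_EE$eps < 1/3)
  )
  TRUE
}

# The default value of param_BE_EE, copied from the Usage section
default_params <- list(
  choice = "best",
  setup = list(continuity = FALSE, iid = TRUE, no_skewness = FALSE),
  regularity = list(C0 = 1, p = 2),
  eps = 0.1)
check_param_BE_EE(default_params)  # TRUE
```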

na.rm

logical, should missing values in data be removed?

References

For the confidence interval:

Derumigny, A., Girard, L., & Guyonvarch, Y. (2025). Can we have it all? Non-asymptotically valid and asymptotically exact confidence intervals for expectations and linear regressions. ArXiv preprint, doi:10.48550/arXiv.2507.16776.

For the underlying Edgeworth expansion bounds:

Derumigny, A., Girard, L., & Guyonvarch, Y. (2023). Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion. Sankhya A, doi:10.1007/s13171-023-00320-y. ArXiv preprint: doi:10.48550/arXiv.2101.05780.

Examples


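set.seed(1)  # for reproducibility of the simulated data
n = 10000
x = rexp(n, 1)
# Kurtosis bound K = 9 (the kurtosis of Exp(1)), nominal level 80%
Navae_ci_mean(x, bound_K = 9, alpha = 0.2)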

Navae_ci_mean(x, bound_K = 9, alpha = 0.2, a = 1 + n^(-2/5))
# Same as:
Navae_ci_mean(x, bound_K = 9, alpha = 0.2, a = list(power_of_n_for_b = -2/5))

# plug-in for K ( = data-driven choice of K)
Navae_ci_mean(x, alpha = 0.2)

# EE bound assuming i.i.d. data only (same as the default param_BE_EE)
listParams1 = list(
  choice = "best",
  setup = list(continuity = FALSE, iid = TRUE, no_skewness = FALSE),
  regularity = list(C0 = 1, p = 2),
  eps = 0.1)

# EE bound additionally assuming a continuous distribution (regularity kappa = 0.99)
listParams2 = list(
  choice = "best",
  setup = list(continuity = TRUE, iid = TRUE, no_skewness = FALSE),
  regularity = list(kappa = 0.99), eps = 0.1)

Navae_ci_mean(x, alpha = 0.1, param_BE_EE = listParams1)
Navae_ci_mean(x, alpha = 0.1, param_BE_EE = listParams2)
Navae_ci_mean(x, alpha = 0.05, param_BE_EE = listParams1)
Navae_ci_mean(x, alpha = 0.05, param_BE_EE = listParams2)