Navae_ci_ols: Compute NAVAE CI for coefficients of a linear regression based on the OLS estimator and Berry-Esseen (BE) or Edgeworth expansion (EE) bounds

Description

Compute the NAVAE confidence interval (CI) for the coefficients of a linear regression, based on the OLS estimator and on Berry-Esseen (BE) or Edgeworth expansion (EE) bounds.

Usage

Navae_ci_ols(
  Y,
  X,
  alpha = 0.05,
  a = NULL,
  omega = NULL,
  bounds = list(lambda_reg = NULL, K_reg = NULL, K_eps = NULL, K_xi = NULL,
    C = NULL, B = NULL),
  K_xi = NULL,
  param_BE_EE = list(choice = "best", setup = list(continuity = FALSE, iid = TRUE,
    no_skewness = FALSE), regularity = list(C0 = 1, p = 2), eps = 0.1),
  intercept = TRUE,
  options = list(center = FALSE, bounded_case = FALSE, with_Exp_regime = FALSE),
  matrix_u = NULL,
  verbose = 0
)

Value

Navae_ci_ols returns an object of class NAVAE_CI_OLS, containing:

ci_navae: the NAVAE confidence interval
ci_asymp: the classical "asymptotic" CI based on CLT (as a comparison)
allTuningParameters, allBounds: information concerning the tuning parameters and the bounds used (numerical value and origin)
about_delta_n, delta_n_from: respectively the numerical value of the bound \(\delta_n\) used, and a character string, "BE" or "EE", indicating which type of inequality was used.
minimal_alpha_to_exit_R_regime: the minimal alpha to exit the \(\mathbb{R}\) regime.
bound_K_value, bound_K_method: the value K used and the method to compute it.

Arguments

Y

vector of observations of the explained variable

X, intercept

X is the matrix of explanatory variables. If intercept = TRUE, a constant column of ones (the intercept) is added to it. The number of rows of X must equal the length of Y.
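
To make the role of intercept concrete, here is a minimal base-R sketch that builds such a design matrix by hand and computes the OLS estimate. It only illustrates the idea: the internal steps of Navae_ci_ols, and in particular the position of the added column, are assumptions, and the names X_design and beta_hat are illustrative.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(1)
n <- 200
X <- matrix(rnorm(n), ncol = 1)        # one explanatory variable
Y <- 2 + 8 * X[, 1] + rnorm(n)

# The requirement stated above: as many rows in X as observations in Y
stopifnot(nrow(X) == length(Y))

# Assumption: the constant column of ones is placed first
X_design <- cbind(1, X)

# OLS estimate, solving the normal equations (X'X) beta = X'Y
beta_hat <- solve(crossprod(X_design), crossprod(X_design, Y))
```

With intercept = FALSE, the analogous computation would use X alone.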

alpha

1 minus the confidence level of the CI; in other words, the nominal level is 1 - alpha. By default, alpha is set to 0.05, yielding a 95% CI.
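
As a rough illustration of how alpha maps to a confidence level, the snippet below computes the two-sided standard normal quantile that a CLT-based interval typically relies on. That ci_asymp is built exactly this way is an assumption; the variable names are illustrative.

```r
alpha <- 0.05
conf_level <- 1 - alpha          # nominal level of the CI, here 0.95

# Two-sided standard normal quantile used by a classical CLT-based interval
# (assumption: symmetric two-sided interval)
z <- qnorm(1 - alpha / 2)        # about 1.96
```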

a

the free parameter \(a\) (or \(a_n\)) of the interval. It must be either

a numeric value larger than 1, taken as the value of \(a\),
the character value "best", which selects the a such that the confidence interval has the smallest length.
a list such as list(power_of_n_for_b = -2/5) giving a way to compute a as a = 1 + n^power_of_n_for_b. Note that -2/5 is the optimal (theoretical) rate.
NULL, interpreted as the default value a = 1 + 100 * n^(-2/5).
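
The two formulas quoted above can be evaluated directly in base R. This sketch only reproduces the arithmetic stated in the text for the sample size of the example on this page; it does not call the package, and the variable names are illustrative.

```r
n <- 4000

# Default described above for a = NULL: a = 1 + 100 * n^(-2/5)
a_default <- 1 + 100 * n^(-2/5)

# Value described above for a = list(power_of_n_for_b = -2/5): a = 1 + n^power_of_n_for_b
power_of_n_for_b <- -2/5
a_from_rate <- 1 + n^power_of_n_for_b

c(a_default = a_default, a_from_rate = a_from_rate)
```

Both values are larger than 1 and approach 1 as n grows.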

omega

the free parameter \(\omega\) (or \(\omega_n\)) of the interval. It must be either

a numeric value larger than 1, taken as the value of \(\omega\),
the character value "best", which selects the omega such that the confidence interval has the smallest length.
a list such as list(power_of_n_for_omega = -1/5) giving a way to compute omega as omega = n^power_of_n_for_omega. Note that -1/5 is the optimal (theoretical) rate.
NULL, interpreted as the default value omega = n^(-1/5).
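
To give a sense of scale, the default rule omega = n^(-1/5) stated above can be tabulated for a few sample sizes. This is plain arithmetic in base R, not a call to the package; the sample sizes are arbitrary illustrative choices.

```r
# A few arbitrary sample sizes, for illustration
n_values <- c(100, 1000, 4000, 1e5)

# Default described above for omega = NULL: omega = n^(-1/5)
omega_default <- n_values^(-1/5)

data.frame(n = n_values, omega = omega_default)
```

The resulting values decrease slowly as n increases.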

bounds, K_xi

list of bounds for the data-generating process (DGP). For convenience, K_xi can also be provided as a separate argument. The list can contain the following items:

lambda_reg
K_eps
K_xi
K3_xi
lambda3_xi
K3tilde_xi
B, C: bounds for the concentration of || Xi tilde
K_reg: bound on \( E[ || vec( \widetilde{X}\widetilde{X}'- \mathbb{I}_p ) ||^2 ] \), defined in Assumption 3.2 (ii).

The bounds that are not given are replaced by plug-ins. For K3_xi, lambda3_xi and K3tilde_xi, the bounds are obtained from K_xi (= K4_xi).

param_BE_EE

parameters to compute the BE or EE bound \(\delta_n\) used to construct the confidence interval. param_BE_EE is a list of four objects:

choice:
- If equal to "EE", the bound used is that of Derumigny et al. (2023), computed using the parameters specified by the rest of param_BE_EE, as described in the arguments of the function BoundEdgeworth::Bound_EE1. Together, the last three items of the list (setup, regularity and eps) specify the bounds and assumptions used to compute the bound \(\delta_n\) from Derumigny et al. (2023).
- If equal to "BE", the bound used is the best up-to-date BE bound from Shevtsova (2013), combined with a convexity inequality.
- If equal to "best", both bounds are computed and the smaller of the two is used.

  By default, following Remark 3.3 of the article, "best" is used and the bound of Derumigny et al. (2023) is computed assuming i.i.d. data and no other regularity assumptions (continuous or unskewed distribution). The bound on kurtosis that is used is the one specified in the previous argument K_xi.
setup: itself a list of three logical values (continuity, iid, no_skewness),
regularity: itself a list of length up to 3,
eps: value between 0 and 1/3.

options

a list of other options (experimental).

matrix_u

each row of this matrix is understood as a new vector u for which a confidence interval should be computed. By default, matrix_u is the identity matrix, corresponding to the canonical basis of \(\mathbb{R}^p\).
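
The following base-R sketch shows what such matrices can look like for p = 2 (an intercept and one regressor). It assumes that each row u targets the linear combination u'beta of the coefficients, which is a natural reading of the description above but is not stated explicitly; matrix_u_custom and beta are hypothetical illustrations.

```r
p <- 2                                  # intercept + one regressor

# Default: identity matrix, i.e. one vector u per coefficient
matrix_u_default <- diag(p)

# Hypothetical custom choice: the sum and the difference of the two coefficients
matrix_u_custom <- rbind(c(1, 1),
                         c(1, -1))

# Assumption: row u targets u' beta; with beta = (2, 8) the targets are 10 and -6
beta <- c(2, 8)
targets <- as.numeric(matrix_u_custom %*% beta)
```

Such a matrix would then be passed through the matrix_u argument of Navae_ci_ols.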

verbose

If verbose = 0, this function is silent and does not print anything. Increasing values of verbose print more details about the progress of the computations and, in particular, the different terms that are computed.

References

For the confidence interval:

Derumigny, A., Girard, L., & Guyonvarch, Y. (2025). Can we have it all? Non-asymptotically valid and asymptotically exact confidence intervals for expectations and linear regressions. ArXiv preprint, doi:10.48550/arXiv.2507.16776.

For the underlying Edgeworth expansion bounds:

Derumigny, A., Girard, L., & Guyonvarch, Y. (2023). Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion. Sankhya A, doi:10.1007/s13171-023-00320-y. ArXiv preprint, doi:10.48550/arxiv.2101.05780.

Examples

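set.seed(1)  # for reproducibility

# Simulate a linear model Y = 2 + 8 * X1 + eps with standard Gaussian noise
n <- 4000
X1 <- rnorm(n, sd = 1)
true_eps <- rnorm(n)
Y <- 2 + 8 * X1 + true_eps

# NAVAE CI with the kurtosis bound K_xi = 3 and the free parameter a = 1.1
myCI <- Navae_ci_ols(Y, X1, K_xi = 3, a = 1.1)

print(myCI)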