Compute the NAVAE confidence interval (CI) for the coefficients of a linear regression, based on the OLS estimator and on Berry-Esseen (BE) or Edgeworth Expansion (EE) bounds
Navae_ci_ols(
Y,
X,
alpha = 0.05,
a = NULL,
omega = NULL,
bounds = list(lambda_reg = NULL, K_reg = NULL, K_eps = NULL, K_xi = NULL, C = NULL, B =
NULL),
K_xi = NULL,
param_BE_EE = list(choice = "best", setup = list(continuity = FALSE, iid = TRUE,
no_skewness = FALSE), regularity = list(C0 = 1, p = 2), eps = 0.1),
intercept = TRUE,
options = list(center = FALSE, bounded_case = FALSE, with_Exp_regime = FALSE),
matrix_u = NULL,
verbose = 0
)
Navae_ci_ols returns an object of class NAVAE_CI_OLS, containing:
ci_navae: the NAVAE confidence interval
ci_asymp: the classical "asymptotic" CI based on CLT
(as a comparison)
allTuningParameters, allBounds: information concerning
the tuning parameters and the bounds used (numerical value and origin)
about_delta_n, delta_n_from: respectively the numerical value
of the bound \(\delta_n\) used, and a character string ("BE" or "EE")
indicating which type of inequality was used.
minimal_alpha_to_exit_R_regime: the minimal alpha to exit the
\(\mathbb{R}\) regime.
bound_K_value, bound_K_method: the value K used and the
method to compute it.
Y: vector of observations of the explained variable.
X: the matrix of explanatory variables. If
intercept = TRUE, a constant column of 1s (the intercept) is also
added. The number of rows of X must equal the length
of Y.
alpha: 1 minus the confidence level of the CI; in other words,
the nominal level is 1 - alpha.
By default, alpha is set to 0.05, yielding a 95% CI.
a: the free parameter \(a\) (or \(a_n\)) of the interval. It must be one of the following:
a numeric value larger than 1, taken as the value of \(a\);
the character value "best", which is the default. It selects the value of a for which the confidence interval has the smallest length;
a list such as list(power_of_n_for_b = -2/5), giving a way to compute a as a = 1 + n^power_of_n_for_b. Note that -2/5 is the optimal (theoretical) rate;
NULL, interpreted as the default value a = 1 + 100 * n^(-2/5).
omega: the free parameter \(\omega\) (or \(\omega_n\)) of the interval. It must be one of the following:
a numeric value larger than 1, taken as the value of \(\omega\);
the character value "best", which is the default. It selects the value of omega for which the confidence interval has the smallest length;
a list such as list(power_of_n_for_omega = -1/5), giving a way to compute omega as omega = n^power_of_n_for_omega. Note that -1/5 is the optimal (theoretical) rate;
NULL, interpreted as the default value omega = n^(-1/5).
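To get a feel for the magnitudes involved, the default formulas quoted above can be evaluated directly in base R. This is only a sketch of the documented formulas for a sample size chosen for illustration; it does not call the package and makes no claim about how the package evaluates them internally.

```r
# Evaluate the documented default formulas for an illustrative sample size.
n <- 4000

# Documented default when a = NULL: a = 1 + 100 * n^(-2/5)
a_default <- 1 + 100 * n^(-2/5)

# Documented default when omega = NULL: omega = n^(-1/5)
omega_default <- n^(-1/5)

# Documented rule for list(power_of_n_for_b = -2/5): a = 1 + n^power_of_n_for_b
power_of_n_for_b <- -2/5
a_from_rate <- 1 + n^power_of_n_for_b

print(c(a_default = a_default,
        omega_default = omega_default,
        a_from_rate = a_from_rate))
```

Both formulas for a tend to 1 as n grows, and the documented default for omega tends to 0.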
bounds: list of bounds for the data-generating process (DGP). Note that K_xi can also
be provided as a separate argument, for convenience.
The list can contain the following items:
lambda_reg
K_eps
K_xi
K3_xi
lambda3_xi
K3tilde_xi
B, C Bounds for the concentration of || Xi tilde
K_reg: bound on
\( E[ \| vec( \widetilde{X}\widetilde{X}' - \mathbb{I}_p ) \|^2 ] \),
defined in Assumption 3.2 (ii).
The bounds that are not given are replaced by plug-ins. For K3_xi, lambda3_xi and K3tilde_xi, the bounds are obtained from K_xi (= K4_xi).
param_BE_EE: parameters used to compute the BE or EE bound \(\delta_n\)
from which the confidence interval is constructed.
param_BE_EE is a list of four objects:
choice:
If equal to "EE", the bound used is Derumigny et al. (2023)'s
bound computed using the parameters specified by the rest of param_BE_EE,
as described in the arguments of the function
BoundEdgeworth::Bound_EE1.
Together, the last three items of the list (setup, regularity and eps) specify the bounds and
assumptions used to compute the bound \(\delta_n\) from Derumigny et al. (2023).
If equal to "BE", then the bound used is the best up-to-date
BE bound from Shevtsova (2013) combined with a convexity inequality.
If equal to "best", both bounds are computed
and the smaller of the two is used.
By default, following Remark 3.3 of the article, "best" is used
and Derumigny et al. (2023)'s bound is computed assuming i.i.d. data and
no other regularity assumptions (continuous or unskewed distribution).
The bound on kurtosis that is used is the one specified in the previous
argument K_xi.
setup: itself a logical vector of size 3,
regularity: itself a list of length up to 3,
eps: value between 0 and 1/3,
options: a list of other options (experimental).
matrix_u: each row of this matrix is understood as a new vector u
for which a confidence interval should be computed.
By default, matrix_u is the identity matrix, corresponding
to the canonical basis of \(\mathbb{R}^p\).
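As an illustration of the row convention, the sketch below builds an identity matrix and then appends one extra direction, using base R only. The dimension p = 3, the vector u_diff and the coefficient vector beta_example are hypothetical. Reading each row u as targeting the linear combination u'beta is an assumption made here for illustration, and whether p counts the intercept column should be checked against the package.

```r
# Hypothetical setting: p = 3 coefficients.
p <- 3

# Identity matrix: each row is a canonical basis vector (one CI per coefficient).
matrix_u_default <- diag(p)

# Extra direction (hypothetical): difference between the 2nd and 3rd coefficients.
u_diff <- c(0, 1, -1)
matrix_u_custom <- rbind(matrix_u_default, u_diff)

# Under the assumption that each row u targets u' beta, show the targets
# for a made-up coefficient vector.
beta_example <- c(2, 8, 5)
targets <- as.vector(matrix_u_custom %*% beta_example)
print(targets)
```

A matrix built this way has one row per requested interval and p columns.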
verbose: if verbose = 0, this function is silent and does not
print anything. Increasing values of verbose print more details about
the progress of the computations and, in particular, about the different terms that
are computed.
For the confidence interval:
Derumigny, A., Girard, L., & Guyonvarch, Y. (2025). Can we have it all? Non-asymptotically valid and asymptotically exact confidence intervals for expectations and linear regressions. ArXiv preprint, doi:10.48550/arXiv.2507.16776.
For the underlying Edgeworth expansion bounds:
Derumigny, A., Girard, L., & Guyonvarch, Y. (2023). Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion. Sankhya A, doi:10.1007/s13171-023-00320-y. ArXiv preprint: doi:10.48550/arxiv.2101.05780.
The methods to display and process the output of this function:
print.NAVAE_CI_OLS and
as.data.frame.NAVAE_CI_OLS.
Navae_ci_mean, the corresponding function for the
estimation of the mean.
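set.seed(1)  # for reproducibility
n = 4000
X1 = rnorm(n, sd = 1)
true_eps = rnorm(n)
# Linear model with intercept 2 and slope 8
Y = 2 + 8 * X1 + true_eps
# NAVAE CI with the kurtosis bound K_xi = 3 and a fixed a = 1.1
myCI <- Navae_ci_ols(Y, X1, K_xi = 3, a = 1.1)
print(myCI)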
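For comparison, a classical CLT-based interval of the kind reported in ci_asymp can be sketched in base R on the same simulated design. This is a textbook construction under stated assumptions, not the package's implementation: it uses the homoskedastic standard errors returned by lm, which may differ from the variance estimator used for ci_asymp. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 4000
X1 <- rnorm(n, sd = 1)
Y <- 2 + 8 * X1 + rnorm(n)
alpha <- 0.05

fit <- lm(Y ~ X1)              # OLS fit with an intercept
est <- coef(fit)               # point estimates
se <- sqrt(diag(vcov(fit)))    # homoskedastic standard errors (assumption)
z <- qnorm(1 - alpha / 2)      # standard normal quantile from the CLT

# Textbook asymptotic CI: estimate +/- z * standard error
ci_clt <- cbind(lower = est - z * se, estimate = est, upper = est + z * se)
print(ci_clt)
```

Unlike the NAVAE interval, this construction is only justified asymptotically and does not use the bounds or the tuning parameters a and omega.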