This function implements the Double Bootstrap algorithm described in Chapter 9 of Nair et al. (2022). It applies bootstrapping to two samples of different sizes to choose the value of \(k\) that minimizes the mean square error.
doublebootstrap(
data,
n1 = -1,
n2 = -1,
r = 50,
k_max_prop = 0.5,
kvalues = 20,
na.rm = FALSE
)
A named list containing the final results of the Double Bootstrap algorithm:
k: The optimal number of top-order statistics \(\hat{k}\) selected by minimizing the MSE.
alpha: The estimated tail index \(\hat{\alpha}\) (Hill estimator) corresponding to \(\hat{k}\).
data: A numeric vector of i.i.d. observations.
n1: A numeric scalar specifying the first bootstrap sample size. Nair et al. describe this as \(n_1 = O(n^{1-\epsilon})\) for \(\epsilon \in (0, 1/2)\); the default (n1 = -1) sets the exponent to 0.9, i.e. \(n_1 = n^{0.9}\).
n2: A numeric scalar specifying the second bootstrap sample size.
r: A numeric scalar specifying the number of bootstrap replicates.
k_max_prop: A numeric scalar giving the maximum k as a proportion of the sample size. Considering all possible k values can be computationally expensive; moreover, low k values can be noisy, while high values can be biased. Hence, k is limited to a proportion (by default 50%) of the data.
kvalues: An integer specifying the length of the sequence of candidate k values.
na.rm: Logical. If TRUE, missing values (NA) are removed before analysis. Defaults to FALSE.
Chapter 9 of Nair et al. specifically describes the Double Bootstrap algorithm for the Hill estimator.
The Hill Double Bootstrap method uses the Hill estimator as the first estimator:
$$\hat{\xi}_{n,k}^{(1)} := \frac{1}{k}\sum_{i=1}^{k}\log\left(\frac{X_{(i)}}{X_{(k+1)}}\right)$$
and a second, moment-based estimator:
$$\hat{\xi}_{n,k}^{(2)} := \frac{M_{n,k}}{2\hat{\xi}_{n,k}^{(1)}}$$
where
$$M_{n,k} := \frac{1}{k}\sum_{i=1}^{k}\left(\log\left(\frac{X_{(i)}}{X_{(k+1)}}\right)\right)^2$$
The absolute difference between these two estimators is:
$$|\hat{\xi}_{n,k}^{(1)} - \hat{\xi}_{n,k}^{(2)}| = \frac{|M_{n,k}-2(\hat{\xi}_{n,k}^{(1)})^{2}|}{2|\hat{\xi}_{n,k}^{(1)}|}$$
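The two statistics above can be sketched directly in R. This is an illustrative sketch, not the package's implementation; the helper names `hill_xi` and `second_moment` are hypothetical.

```r
# Hill estimator \hat{\xi}^{(1)}_{n,k}: mean of log-ratios of the k largest
# order statistics to the (k+1)-th largest.
hill_xi <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)      # X_(1) >= X_(2) >= ...
  mean(log(xs[1:k] / xs[k + 1]))
}

# Second-moment statistic M_{n,k}: mean of the squared log-ratios.
second_moment <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  mean(log(xs[1:k] / xs[k + 1])^2)
}
```

For example, with `x = c(1, 2, 4, 8, 16)` and `k = 2`, the log-ratios are \(\log(16/4)\) and \(\log(8/4)\), so `hill_xi` returns \(1.5\log 2\).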
The Hill Double Bootstrap method selects \(\hat \kappa\) by minimizing the mean square error of this numerator across \(r\) bootstrap samples, drawn at two sizes \(n_1\) and \(n_2\):
$$\hat{\kappa}_{1}^{*} := \text{arg min}_{k} \frac{1}{r} \sum_{j=1}^{r} (M_{n_1,k}(j) - 2(\hat{\xi}_{n_1,k}^{(1)}(j))^2)^2$$
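The bootstrap criterion for a single candidate \(k\) can be sketched as follows. This is a self-contained illustration under assumed helper names, not the package's code:

```r
# Average, over r bootstrap resamples of size n1, the squared numerator
# (M_{n1,k} - 2 * xi1^2)^2; \hat{\kappa}_1^* is the k minimizing this.
boot_mse <- function(x, n1, k, r) {
  crits <- replicate(r, {
    xs <- sort(sample(x, n1, replace = TRUE), decreasing = TRUE)
    logs <- log(xs[1:k] / xs[k + 1])        # log-ratios for top k
    (mean(logs^2) - 2 * mean(logs)^2)^2     # squared numerator term
  })
  mean(crits)
}
```

In the algorithm, this quantity would be evaluated over the grid of candidate k values and minimized.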
This process is repeated to determine \(\hat{\kappa}_{2}^{*}\) with the bootstrap sample of size \(n_{2}\). The final \(\hat \kappa\) is given by:
$$\hat{\kappa}^{*} = \frac{(\hat{\kappa}_{1}^{*})^2}{\hat{\kappa}_{2}^{*}} (\frac{\log \hat{\kappa}_{1}^{*}}{2\log n_1 - \log \hat{\kappa}_{1}^{*}})^{\frac{2(\log n_1 - \log \hat{\kappa}_{1}^{*})}{\log n_1}}$$
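The final combination step translates directly into R. A minimal sketch, assuming \(\hat{\kappa}_1^*\), \(\hat{\kappa}_2^*\), and \(n_1\) were obtained from the two bootstrap rounds (the function name is illustrative):

```r
# Combine the two bootstrap minimizers into the final \hat{\kappa}^*.
combine_kappa <- function(kappa1, kappa2, n1) {
  exponent <- 2 * (log(n1) - log(kappa1)) / log(n1)
  (kappa1^2 / kappa2) *
    (log(kappa1) / (2 * log(n1) - log(kappa1)))^exponent
}
```

Since the bracketed ratio is below 1 and the exponent is positive whenever \(\hat{\kappa}_1^* < n_1\), the result shrinks \((\hat{\kappa}_1^*)^2 / \hat{\kappa}_2^*\) toward smaller values.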
Danielsson, J., de Haan, L., Peng, L., & de Vries, C. G. (2001). Using a bootstrap method to choose the sample fraction in tail index estimation. Journal of Multivariate Analysis, 76(2), 226–248. doi:10.1006/jmva.2000.1903
Nair, J., Wierman, A., & Zwart, B. (2022). The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation. Cambridge University Press, pp. 229–233. doi:10.1017/9781009053730
# Simulate Pareto(alpha = 2) data via inverse transform sampling
xmin <- 1
alpha <- 2
r <- runif(800, 0, 1)
x <- xmin * r^(-1 / alpha)

# Select the optimal k and the corresponding Hill estimate of alpha
db_kalpha <- doublebootstrap(data = x, n1 = -1, n2 = -1, r = 5, k_max_prop = 0.5, kvalues = 20)