This function implements the Double Bootstrap algorithm described in Chapter 9 of Nair et al. (2022). It applies bootstrapping to two samples of different sizes to choose the value of \(k\) that minimizes the mean square error.
doublebootstrap(
data,
n1 = -1,
n2 = -1,
r = 50,
k_max_prop = 0.5,
kvalues = 20,
na.rm = FALSE
)
A named list containing the final results of the Double Bootstrap algorithm:
k: The optimal number of top-order statistics \(\hat{k}\) selected by minimizing the MSE.
alpha: The estimated tail index \(\hat{\alpha}\) (Hill estimator) corresponding to \(\hat{k}\).
data: A numeric vector of i.i.d. observations.
n1: A numeric scalar specifying the first bootstrap sample size. Nair et al. describe this as \(n_1 = O(n^{1-\epsilon})\) for \(\epsilon \in (0, 1/2)\); the default (n1 = -1) sets the exponent to 0.9, i.e. \(n_1 = n^{0.9}\).
n2: A numeric scalar specifying the second bootstrap sample size.
r: A numeric scalar specifying the number of bootstrap replicates.
k_max_prop: A numeric scalar giving the maximum k as a proportion of the sample size. Considering all possible k values can be computationally expensive; moreover, low k values can be noisy, while high values can be biased. Hence, k is limited to a proportion (by default 50%) of the data.
kvalues: An integer specifying the length of the sequence of candidate k values.
na.rm: Logical. If TRUE, missing values (NA) are removed before analysis. Defaults to FALSE.
Chapter 9 of Nair et al. specifically describes the Double Bootstrap algorithm for the Hill estimator.
The Hill Double Bootstrap method uses the Hill estimator as the first estimator:
$$\hat{\xi}_{n,k}^{(1)} := \frac{1}{k}\sum_{i=1}^{k}\log\left(\frac{X_{(i)}}{X_{(k+1)}}\right)$$
and a second, moment-based estimator:
$$\hat{\xi}_{n,k}^{(2)} := \frac{M_{n,k}}{2\hat{\xi}_{n,k}^{(1)}}$$
where
$$M_{n,k} := \frac{1}{k}\sum_{i=1}^{k}\left(\log\left(\frac{X_{(i)}}{X_{(k+1)}}\right)\right)^2$$
The absolute difference between these two estimators is:
$$|\hat{\xi}_{n,k}^{(1)} - \hat{\xi}_{n,k}^{(2)}| = \frac{|M_{n,k}-2(\hat{\xi}_{n,k}^{(1)})^{2}|}{2|\hat{\xi}_{n,k}^{(1)}|}$$
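The two statistics above can be sketched directly in R. This is an illustrative sketch, not the package's implementation; the helper names `hill_xi` and `second_moment` are hypothetical.

```r
# Hill estimator \hat{\xi}^{(1)}_{n,k}: mean of log-ratios of the k largest
# order statistics to the (k+1)-th largest.
hill_xi <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)      # X_(1) >= X_(2) >= ...
  mean(log(xs[1:k] / xs[k + 1]))
}

# Second-moment statistic M_{n,k}: mean of the squared log-ratios.
second_moment <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  mean(log(xs[1:k] / xs[k + 1])^2)
}
```

For example, with `x = c(1, 2, 4, 8, 16)` and `k = 2`, the log-ratios are \(\log(16/4)\) and \(\log(8/4)\), so `hill_xi` returns \(1.5\log 2\).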
The Hill Double Bootstrap method selects \(\hat \kappa\) by minimizing the mean square error of this numerator across \(r\) bootstrap samples, drawn at two sizes \(n_1\) and \(n_2\):
$$\hat{\kappa}_{1}^{*} := \text{arg min}_{k} \frac{1}{r} \sum_{j=1}^{r} (M_{n_1,k}(j) - 2(\hat{\xi}_{n_1,k}^{(1)}(j))^2)^2$$
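The bootstrap criterion for a single candidate \(k\) can be sketched as follows. This is a self-contained illustration under assumed helper names, not the package's code:

```r
# Average, over r bootstrap resamples of size n1, the squared numerator
# (M_{n1,k} - 2 * xi1^2)^2; \hat{\kappa}_1^* is the k minimizing this.
boot_mse <- function(x, n1, k, r) {
  crits <- replicate(r, {
    xs <- sort(sample(x, n1, replace = TRUE), decreasing = TRUE)
    logs <- log(xs[1:k] / xs[k + 1])        # log-ratios for top k
    (mean(logs^2) - 2 * mean(logs)^2)^2     # squared numerator term
  })
  mean(crits)
}
```

In the algorithm, this quantity would be evaluated over the grid of candidate k values and minimized.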
This process is repeated to determine \(\hat{\kappa}_{2}^{*}\) with the bootstrap sample of size \(n_{2}\). The final \(\hat \kappa\) is given by:
$$\hat{\kappa}^{*} = \frac{(\hat{\kappa}_{1}^{*})^2}{\hat{\kappa}_{2}^{*}} (\frac{\log \hat{\kappa}_{1}^{*}}{2\log n_1 - \log \hat{\kappa}_{1}^{*}})^{\frac{2(\log n_1 - \log \hat{\kappa}_{1}^{*})}{\log n_1}}$$
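The final combination step translates directly into R. A minimal sketch, assuming \(\hat{\kappa}_1^*\), \(\hat{\kappa}_2^*\), and \(n_1\) were obtained from the two bootstrap rounds (the function name is illustrative):

```r
# Combine the two bootstrap minimizers into the final \hat{\kappa}^*.
combine_kappa <- function(kappa1, kappa2, n1) {
  exponent <- 2 * (log(n1) - log(kappa1)) / log(n1)
  (kappa1^2 / kappa2) *
    (log(kappa1) / (2 * log(n1) - log(kappa1)))^exponent
}
```

Since the bracketed ratio is below 1 and the exponent is positive whenever \(\hat{\kappa}_1^* < n_1\), the result shrinks \((\hat{\kappa}_1^*)^2 / \hat{\kappa}_2^*\) toward smaller values.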
Danielsson, J., de Haan, L., Peng, L., & de Vries, C. G. (2001). Using a bootstrap method to choose the sample fraction in tail index estimation. Journal of Multivariate Analysis, 76(2), 226–248. doi:10.1006/jmva.2000.1903
Nair, J., Wierman, A., & Zwart, B. (2022). The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation. Cambridge University Press, pp. 229–233. doi:10.1017/9781009053730
# Simulate Pareto(alpha = 2) data via inverse transform sampling
xmin <- 1
alpha <- 2
r <- runif(800, 0, 1)
x <- xmin * r^(-1 / alpha)

# Select the optimal k and the corresponding Hill estimate of alpha
db_kalpha <- doublebootstrap(data = x, n1 = -1, n2 = -1, r = 5, k_max_prop = 0.5, kvalues = 20)