This function implements the PLFIT algorithm described by Clauset et al. (2009) to determine \(\hat k\), the number of top-order statistics used to fit the tail. It selects the \(k\) that minimizes the Kolmogorov-Smirnov (KS) distance between the empirical cumulative distribution function and the fitted power law.
plfit(data, kmax = -1, kmin = 2, na.rm = FALSE)
A named list containing the results of the PLFIT algorithm:
k_hat: The optimal number of top-order statistics \(\hat{k}\).
alpha_hat: The estimated power-law exponent \(\hat{\alpha}\) corresponding to \(\hat{k}\).
xmin_hat: The minimum value \(x_{\min} = X_{(\hat{k})}\) above which the power law is fitted.
ks_distance: The minimum Kolmogorov-Smirnov distance \(D_{n,k}\) found.
data: A numeric vector of i.i.d. observations.
kmax: Maximum number of top-order statistics. If kmax = -1, kmax is set to n - 1, where n is the length of data.
kmin: Minimum number of top-order statistics to start with. Defaults to 2.
na.rm: Logical. If TRUE, missing values (NA) are removed before analysis. Defaults to FALSE.
$$D_{n,k} := \sup_{y \ge 1} |\frac{1}{k-1} \sum_{i=1}^{k-1} I (\frac{X_{(i)}}{X_{(k)}} > y) - y^{-\hat{\alpha}_{n,k}^H}|$$
The equation above, as given by Nair et al. (2022), is implemented in this function with the empirical CDF instead of the empirical survival function. The two forms are mathematically equivalent because each function is the complement of the other.
$$D_{n,k} := \sup_{y \ge 1} \left| \underbrace{ \frac{1}{k-1} \sum_{i=1}^{k-1} I\left(\frac{X_{(i)}}{X_{(k)}} \le y\right) }_{\text{Empirical CDF}} - \underbrace{ \left(1 - y^{-\hat{\alpha}_{n,k}^H}\right) }_{\text{Theoretical CDF}} \right|$$
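The equivalence of the survival-function and CDF forms can be checked in one step. Writing \(S_{n,k}(y)\) for the empirical survival function of the ratios (a shorthand introduced here only for illustration) and using \(I(a \le y) = 1 - I(a > y)\), the constant terms cancel inside the absolute value:

$$S_{n,k}(y) := \frac{1}{k-1} \sum_{i=1}^{k-1} I\left(\frac{X_{(i)}}{X_{(k)}} > y\right), \qquad \left| \bigl(1 - S_{n,k}(y)\bigr) - \bigl(1 - y^{-\hat{\alpha}_{n,k}^H}\bigr) \right| = \left| S_{n,k}(y) - y^{-\hat{\alpha}_{n,k}^H} \right|$$

Because the two absolute differences agree for every \(y \ge 1\), their suprema agree as well.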
$$\hat k = \underset{k}{\operatorname{arg\,min}} \; D_{n,k}$$
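The minimization above can be sketched in base R. This is an illustrative sketch, not the package's implementation: the helper names `ks_distance_k` and `plfit_sketch` are hypothetical, and the code assumes that \(\hat{\alpha}_{n,k}\) is a Hill-type estimate computed from the \(k-1\) ratios \(X_{(i)}/X_{(k)}\), with order statistics sorted in decreasing order and no ties in the data.

```r
# Hypothetical helper: KS distance D_{n,k} for a single k.
# x_desc must be sorted in decreasing order, so x_desc[k] is X_(k).
ks_distance_k <- function(x_desc, k) {
  m <- k - 1
  ratios <- x_desc[seq_len(m)] / x_desc[k]  # X_(i) / X_(k), each >= 1
  alpha <- 1 / mean(log(ratios))            # Hill-type estimate (assumed form)
  y <- sort(ratios)                         # ascending, for the empirical CDF
  cdf <- 1 - y^(-alpha)                     # theoretical CDF at each ratio
  # The supremum is attained at a jump of the empirical CDF, so compare the
  # theoretical CDF with the step value just after and just before each jump.
  d <- max(seq_len(m) / m - cdf, cdf - (seq_len(m) - 1) / m)
  list(alpha = alpha, d = d)
}

# Hypothetical wrapper: scan k = kmin, ..., kmax and keep the minimizer.
plfit_sketch <- function(x, kmin = 2, kmax = length(x) - 1) {
  x_desc <- sort(x, decreasing = TRUE)
  ks <- kmin:kmax
  fits <- lapply(ks, function(k) ks_distance_k(x_desc, k))
  d <- vapply(fits, function(f) f$d, numeric(1))
  best <- which.min(d)
  list(
    k_hat = ks[best],
    alpha_hat = fits[[best]]$alpha,
    xmin_hat = x_desc[ks[best]],
    ks_distance = d[best]
  )
}

# Usage on a simulated Pareto sample with true exponent 2.
set.seed(1)
x <- runif(800)^(-1 / 2)
res <- plfit_sketch(x)
```

The sketch recomputes each fit from scratch for readability, so its cost grows quadratically with the number of candidate values of \(k\); a production implementation would more likely update the log-ratio sums incrementally.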
Clauset, A., Shalizi, C. R., & Newman, M. E. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661-703. doi:10.1137/070710111
Nair, J., Wierman, A., & Zwart, B. (2022). The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation. Cambridge University Press. (pp. 227-229) doi:10.1017/9781009053730
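set.seed(1)                  # make the simulated sample reproducible
xmin <- 1                    # lower bound of the simulated Pareto sample
alpha <- 2                   # true tail exponent
r <- runif(800, 0, 1)        # uniform draws for inverse-transform sampling
x <- xmin * r^(-1 / alpha)   # Pareto(xmin, alpha) observations
# kmax = -1 lets plfit() consider up to n - 1 top-order statistics
plfit_values <- plfit(data = x, kmax = -1, kmin = 2)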