evmix (version 1.0)

gkg: Kernel Density Estimation for Bulk and GPD for Both Upper and Lower Tails in Extreme Value Mixture Model

Description

Density, cumulative distribution function, quantile function and random number generation for the extreme value mixture model with kernel density estimation, using a normal kernel, for the bulk distribution between the lower and upper thresholds and conditional GPDs for the two tails. The parameters are the kernel bandwidth lambda, the lower tail (threshold ul, GPD scale sigmaul, shape xil and tail fraction phiul) and the upper tail (threshold ur, GPD scale sigmaur, shape xir and tail fraction phiur).

Usage

dgkg(x, kerncentres, lambda = NULL,
    ul = as.vector(quantile(kerncentres, 0.1)),
    sigmaul = sqrt(6 * var(kerncentres))/pi, xil = 0,
    phiul = TRUE,
    ur = as.vector(quantile(kerncentres, 0.9)),
    sigmaur = sqrt(6 * var(kerncentres))/pi, xir = 0,
    phiur = TRUE, log = FALSE)

pgkg(q, kerncentres, lambda = NULL,
    ul = as.vector(quantile(kerncentres, 0.1)),
    sigmaul = sqrt(6 * var(kerncentres))/pi, xil = 0,
    phiul = TRUE,
    ur = as.vector(quantile(kerncentres, 0.9)),
    sigmaur = sqrt(6 * var(kerncentres))/pi, xir = 0,
    phiur = TRUE, lower.tail = TRUE)

qgkg(p, kerncentres, lambda = NULL,
    ul = as.vector(quantile(kerncentres, 0.1)),
    sigmaul = sqrt(6 * var(kerncentres))/pi, xil = 0,
    phiul = TRUE,
    ur = as.vector(quantile(kerncentres, 0.9)),
    sigmaur = sqrt(6 * var(kerncentres))/pi, xir = 0,
    phiur = TRUE, lower.tail = TRUE)

rgkg(n = 1, kerncentres, lambda = NULL,
    ul = as.vector(quantile(kerncentres, 0.1)),
    sigmaul = sqrt(6 * var(kerncentres))/pi, xil = 0,
    phiul = TRUE,
    ur = as.vector(quantile(kerncentres, 0.9)),
    sigmaur = sqrt(6 * var(kerncentres))/pi, xir = 0,
    phiur = TRUE)
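
The default argument values in the signatures above can be evaluated on their own in base R, which may help when checking what a call with no tail arguments will use. The following is a minimal sketch on simulated data; the variable names `sigma_default`, `ul` and `ur` as standalone objects and the simulated `kerncentres` are illustrative assumptions, not part of the package. The scale expression coincides with the method-of-moments scale of a Gumbel distribution, though the source does not state the motivation for it.

```r
# Illustrative sample standing in for the kernel centres (an assumption)
set.seed(1)
kerncentres <- rnorm(1000)

# Default thresholds: the 10% and 90% sample quantiles of the kernel centres.
# as.vector() drops the "10%" / "90%" names that quantile() attaches.
ul <- as.vector(quantile(kerncentres, 0.1))
ur <- as.vector(quantile(kerncentres, 0.9))

# Default GPD scale, shared by both tails: sqrt(6 * var) / pi
sigma_default <- sqrt(6 * var(kerncentres)) / pi

# Default shapes are xil = xir = 0 (exponential-type tails)
print(c(ul = ul, ur = ur, sigma = sigma_default))
```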

Arguments

x            quantile
kerncentres  kernel centres (typically sample data)
lambda       bandwidth for normal kernel (standard deviation of normal)
log          logical, if TRUE then log density
q            quantile
lower.tail   logical, if FALSE then upper tail probabilities
p            cumulative probability
n            sample size (non-negative integer)
ul           lower tail threshold
sigmaul      lower tail GPD scale parameter (non-negative)
xil          lower tail GPD shape parameter
phiul        probability of being below lower threshold (0,1)
ur           upper tail threshold
sigmaur      upper tail GPD scale parameter (non-negative)
xir          upper tail GPD shape parameter
phiur        probability of being above upper threshold (0,1)

Value

  • dgkg gives the density, pgkg gives the cumulative distribution function, qgkg gives the quantile function and rgkg gives a random sample.

Details

Extreme value mixture model combining a kernel density estimator (KDE) with normal kernels to represent the bulk between the lower and upper thresholds, and a GPD for each of the lower and upper tails. The user can pre-specify phiul and phiur, permitting a parameterised value for the lower and upper tail fraction respectively. Alternatively, when phiul=TRUE or phiur=TRUE the corresponding tail fraction is estimated from the KDE bulk model.

Notice that a tail fraction cannot be 0 or 1, and the lower and upper tail fractions must satisfy phiul + phiur < 1, so the lower threshold must be less than the upper, ul < ur.

The cumulative distribution function has three components. The lower tail, with tail fraction $\phi_{ul}$ defined by the KDE bulk model (phiul=TRUE), applies up to the lower threshold $x < u_l$:

$$F(x) = H(u_l) [1 - G_l(x)]$$

where $H(x)$ is the kernel density estimator cumulative distribution function, i.e. mean(pnorm(x, kerncentres, lambda)), and $G_l(x)$ is the conditional GPD cumulative distribution function with negated $x$ value and threshold, i.e. pgpd(-x, -ul, sigmaul, xil, phiul). The KDE bulk model between the thresholds $u_l \le x \le u_r$ is given by:

$$F(x) = H(x).$$

Above the threshold $x > u_r$ the usual conditional GPD applies:

$$F(x) = H(u_r) + [1 - H(u_r)] G_r(x)$$

where $G_r(x)$ is the GPD cumulative distribution function, i.e. pgpd(x, ur, sigmaur, xir, phiur).

The cumulative distribution function for the pre-specified tail fractions $\phi_{ul}$ and $\phi_{ur}$ is more complicated. The unconditional GPD is used for the lower tail $x < u_l$:

$$F(x) = \phi_{ul} [1 - G_l(x)].$$

The KDE bulk model between the thresholds $u_l \le x \le u_r$ is given by:

$$F(x) = \phi_{ul} + (1 - \phi_{ul} - \phi_{ur}) (H(x) - H(u_l)) / (H(u_r) - H(u_l)).$$

Above the threshold $x > u_r$ the usual conditional GPD applies:

$$F(x) = (1 - \phi_{ur}) + \phi_{ur} G_r(x).$$

Notice that these definitions are equivalent when $\phi_{ul} = H(u_l)$ and $\phi_{ur} = 1 - H(u_r)$.

See gpd for details of the GPD upper tail component, dkden for details of the KDE bulk component and dkdengpd for the KDE with a single upper tail GPD extreme value mixture model.
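
The piecewise cumulative distribution function described in these Details can be sketched in base R without the package, which makes the three components and the role of the tail fractions easier to follow. This is a hand-written illustration under stated assumptions, not the evmix implementation: the helper names `gpd_cdf`, `kde_cdf` and `gkg_cdf` are hypothetical, the GPD is coded directly rather than through pgpd, and the bandwidth is taken from `bw.nrd0` purely for the example. Setting a tail fraction to TRUE substitutes $H(u_l)$ or $1 - H(u_r)$, so the same three formulas cover both parameterisations.

```r
# Conditional GPD CDF for values above threshold u (0 at or below u).
# Hypothetical stand-in for illustration, not evmix::pgpd.
gpd_cdf <- function(x, u, sigma, xi) {
  z <- pmax(x - u, 0) / sigma
  if (abs(xi) < 1e-8) {
    1 - exp(-z)                          # xi = 0: exponential tail
  } else {
    1 - pmax(1 + xi * z, 0)^(-1 / xi)    # xi < 0: reaches 1 at the upper end point
  }
}

# KDE CDF H(x): average of normal CDFs centred at the kernel centres
kde_cdf <- function(x, kerncentres, lambda) {
  vapply(x, function(x0) mean(pnorm(x0, mean = kerncentres, sd = lambda)),
         numeric(1))
}

# Sketch of the mixture CDF; phiul / phiur = TRUE use the KDE bulk model
gkg_cdf <- function(x, kerncentres, lambda, ul, sigmaul, xil,
                    ur, sigmaur, xir, phiul = TRUE, phiur = TRUE) {
  Hul <- kde_cdf(ul, kerncentres, lambda)
  Hur <- kde_cdf(ur, kerncentres, lambda)
  if (isTRUE(phiul)) phiul <- Hul
  if (isTRUE(phiur)) phiur <- 1 - Hur
  stopifnot(ul < ur, phiul > 0, phiur > 0, phiul + phiur < 1)

  lower <- x < ul
  upper <- x > ur
  bulk  <- !lower & !upper
  Fx <- numeric(length(x))

  # Lower tail: GPD on the negated value and threshold
  Fx[lower] <- phiul * (1 - gpd_cdf(-x[lower], -ul, sigmaul, xil))
  # Bulk: KDE CDF rescaled to carry probability 1 - phiul - phiur
  Fx[bulk] <- phiul + (1 - phiul - phiur) *
    (kde_cdf(x[bulk], kerncentres, lambda) - Hul) / (Hur - Hul)
  # Upper tail
  Fx[upper] <- (1 - phiur) + phiur * gpd_cdf(x[upper], ur, sigmaur, xir)
  Fx
}

# Usage on simulated data (all values below are illustrative assumptions)
set.seed(1)
kc     <- rnorm(500)
lambda <- bw.nrd0(kc)
ul     <- as.vector(quantile(kc, 0.1))
ur     <- as.vector(quantile(kc, 0.9))
sig    <- sqrt(6 * var(kc)) / pi
xx     <- seq(-6, 6, 0.05)

F_bulk  <- gkg_cdf(xx, kc, lambda, ul, sig, 0, ur, sig, 0)
F_fixed <- gkg_cdf(xx, kc, lambda, ul, sig, 0, ur, sig, 0,
                   phiul = 0.2, phiur = 0.2)
in_bulk <- xx >= ul & xx <= ur
```

With bulk-based tail fractions the rescaling in the middle component cancels, so `F_bulk[in_bulk]` agrees with `kde_cdf(xx[in_bulk], kc, lambda)` up to rounding, which is the equivalence noted at the end of the Details. With pre-specified fractions the CDF equals phiul at the lower threshold and 1 - phiur at the upper threshold.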

References

http://en.wikipedia.org/wiki/Normal_distribution

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

See Also

Other gkg: fgkg, lgkg, nlgkg

Examples

library(evmix)

# 2 x 2 grid of plots
par(mfrow = c(2, 2))
kerncentres = rnorm(1000, 0, 1)

# simulate with pre-specified tail fractions and overlay the fitted density
x = rgkg(1000, kerncentres, phiul = 0.15, phiur = 0.15)
xx = seq(-6, 6, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-6, 6))
lines(xx, dgkg(xx, kerncentres, phiul = 0.15, phiur = 0.15))

# three tail behaviours
plot(xx, pgkg(xx, kerncentres), type = "l")
lines(xx, pgkg(xx, kerncentres, xil = 0.3, xir = 0.3), col = "red")
lines(xx, pgkg(xx, kerncentres, xil = -0.3, xir = -0.3), col = "blue")
legend("topleft", paste("Symmetric xil=xir=", c(0, 0.3, -0.3)),
  col = c("black", "red", "blue"), lty = 1)

# asymmetric tails: negative shape in the lower tail, positive in the upper
x = rgkg(1000, kerncentres, xil = -0.3, phiul = 0.2, xir = 0.3, phiur = 0.2)
xx = seq(-6, 6, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-6, 6))
lines(xx, dgkg(xx, kerncentres, xil = -0.3, phiul = 0.2, xir = 0.3, phiur = 0.2))

# effect of the tail fractions: pre-specified versus estimated from the bulk model
plot(xx, dgkg(xx, kerncentres, xil = -0.3, phiul = 0.2, xir = 0.3, phiur = 0.2),
  type = "l", ylim = c(0, 0.4))
lines(xx, dgkg(xx, kerncentres, xil = -0.3, phiul = 0.3, xir = 0.3, phiur = 0.3),
  col = "red")
lines(xx, dgkg(xx, kerncentres, xil = -0.3, phiul = TRUE, xir = 0.3, phiur = TRUE),
  col = "blue")
legend("topleft", c("phiul = phiur = 0.2", "phiul = phiur = 0.3", "Bulk Tail Fraction"),
  col = c("black", "red", "blue"), lty = 1)
