kdengpdcon: Kernel Density Estimation Using Normal Kernel and GPD Tail Extreme Value Mixture Model with Single Continuity Constraint

Description

Density, cumulative distribution function, quantile function and random number generation for the extreme value mixture model combining kernel density estimation (KDE) using a normal kernel for the bulk distribution up to the threshold and a conditional GPD above the threshold, with the density constrained to be continuous at the threshold. The parameters are the bandwidth lambda, threshold u, GPD shape xi and tail fraction phiu.

Usage

dkdengpdcon(x, kerncentres, lambda = NULL,
    u = as.vector(quantile(kerncentres, 0.9)), xi = 0,
    phiu = TRUE, log = FALSE)

pkdengpdcon(q, kerncentres, lambda = NULL,
    u = as.vector(quantile(kerncentres, 0.9)), xi = 0,
    phiu = TRUE, lower.tail = TRUE)

qkdengpdcon(p, kerncentres, lambda = NULL,
    u = as.vector(quantile(kerncentres, 0.9)), xi = 0,
    phiu = TRUE, lower.tail = TRUE)

rkdengpdcon(n = 1, kerncentres, lambda = NULL,
    u = as.vector(quantile(kerncentres, 0.9)), xi = 0,
    phiu = TRUE)

Arguments

x

quantiles

kerncentres

kernel centres (typically sample data)

lambda

bandwidth for normal kernel (standard deviation of normal)

u

threshold

xi

shape parameter

phiu

probability of being above threshold [0, 1] or TRUE

log

logical, if TRUE then log density

q

quantiles

lower.tail

logical, if FALSE then upper tail probabilities

p

cumulative probabilities

n

sample size (non-negative integer)

Value

dkdengpdcon gives the density, pkdengpdcon gives the cumulative distribution function, qkdengpdcon gives the quantile function and rkdengpdcon gives a random sample.
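The quantile function and random generation can both be understood as inverting the cumulative distribution function. The following base-R sketch is a hedged illustration of that idea, not the evmix implementation: the names sketch_qkdengpdcon and sketch_rkdengpdcon are hypothetical, it assumes a scalar shape xi, a known bandwidth lambda and probabilities in (0, 1), and it inverts $F(x) = (1 - \phi_u) H(x)/H(u)$ for $x \le u$ and $F(x) = (1 - \phi_u) + \phi_u G(x)$ for $x > u$, with $\sigma_u = \phi_u H(u) / [(1 - \phi_u) h(u)]$ from the continuity constraint. The bulk is inverted numerically with uniroot; the GPD tail has a closed-form inverse.

```r
# Hypothetical sketch (not the evmix implementation): quantile function and
# random generation for the KDE bulk + conditional GPD tail mixture with a
# continuity constraint, obtained by inverting the CDF.
sketch_qkdengpdcon <- function(p, kerncentres, lambda, u, xi, phiu = TRUE) {
  H <- function(x) mean(pnorm(x, kerncentres, lambda))  # KDE cdf H(x)
  Hu <- H(u)
  hu <- mean(dnorm(u, kerncentres, lambda))             # KDE density h(u)
  if (isTRUE(phiu)) phiu <- 1 - Hu                      # tail fraction from the bulk
  sigmau <- phiu * Hu / ((1 - phiu) * hu)               # scale from continuity
  lo <- min(kerncentres) - 40 * lambda                  # H(lo) is numerically 0

  sapply(p, function(pp) {
    if (pp <= 1 - phiu) {
      # Bulk: solve (1 - phiu) * H(x) / H(u) = pp for x in (lo, u];
      # the target is capped at H(u) to guard against rounding
      target <- min(pp * Hu / (1 - phiu), Hu)
      uniroot(function(x) H(x) - target, c(lo, u), tol = 1e-10)$root
    } else {
      # Tail: solve (1 - phiu) + phiu * G(x) = pp in closed form
      pt <- (pp - (1 - phiu)) / phiu                    # conditional probability G(x)
      if (abs(xi) < 1e-6) {
        u - sigmau * log(1 - pt)                        # exponential limit, xi = 0
      } else {
        u + sigmau * ((1 - pt)^(-xi) - 1) / xi
      }
    }
  })
}

# Random generation by inverse transform sampling of uniform variates
sketch_rkdengpdcon <- function(n, kerncentres, lambda, u, xi, phiu = TRUE) {
  sketch_qkdengpdcon(runif(n), kerncentres, lambda, u, xi, phiu)
}

# Usage with illustrative values
set.seed(42)
kc <- rnorm(300)
u <- as.vector(quantile(kc, 0.9))
q_bulk <- sketch_qkdengpdcon(0.5, kc, lambda = 0.3, u = u, xi = 0.1, phiu = 0.2)
q_tail <- sketch_qkdengpdcon(0.95, kc, lambda = 0.3, u = u, xi = 0.1, phiu = 0.2)
x <- sketch_rkdengpdcon(1000, kc, lambda = 0.3, u = u, xi = 0.1, phiu = 0.2)
```

With phiu = 0.2, probabilities up to 0.8 map into the KDE bulk and larger probabilities map above u, so roughly a fifth of the simulated sample should exceed the threshold.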

Details

Extreme value mixture model combining kernel density estimation using a normal kernel for the bulk below the threshold and a GPD for the upper tail, with a constraint that the density is continuous at the threshold. The user can pre-specify phiu, permitting a parameterised value for the tail fraction $\phi_u$. Alternatively, when phiu=TRUE the tail fraction is estimated as the upper tail fraction of the KDE bulk model.

When the tail fraction $\phi_u$ is defined by the upper tail fraction of the KDE bulk model (phiu=TRUE), the cumulative distribution function up to the threshold, $x \le u$, is given by:
$$F(x) = H(x)$$
and above the threshold, $x > u$, by:
$$F(x) = H(u) + [1 - H(u)] G(x)$$
where $H(x)$ and $G(x)$ are the KDE and conditional GPD cumulative distribution functions (i.e. mean(pnorm(x, kerncentres, lambda)) and pgpd(x, u, sigmau, xi)).

For a pre-specified $\phi_u$, the cumulative distribution function up to the threshold, $x \le u$, is given by:
$$F(x) = (1 - \phi_u) H(x)/H(u)$$
and above the threshold, $x > u$, by:
$$F(x) = (1 - \phi_u) + \phi_u G(x)$$
Notice that these definitions are equivalent when $\phi_u = 1 - H(u)$.

The continuity constraint means that $(1 - \phi_u) h(u)/H(u) = \phi_u g(u)$, where $h(x)$ and $g(x)$ are the KDE and conditional GPD density functions. Since the conditional GPD density at the threshold is $g(u) = 1/\sigma_u$, the resulting GPD scale parameter is:
$$\sigma_u = \frac{\phi_u H(u)}{(1 - \phi_u) h(u)}$$
In the special case where the tail fraction is defined by the bulk model, this reduces to:
$$\sigma_u = \frac{1 - H(u)}{h(u)}$$

See gpd for details of the GPD upper tail component and dkden for details of the KDE bulk component.
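To make the construction concrete, the following base-R sketch evaluates the cumulative distribution function and density directly from $H$, $h$, $G$ and $g$, deriving $\sigma_u$ from the continuity constraint. It is a hedged illustration under stated assumptions, not the evmix implementation: all helper names (kde_cdf, kde_pdf, gpd_cdf, gpd_pdf, con_params, sketch_pkdengpdcon, sketch_dkdengpdcon) are hypothetical, the shape xi is assumed scalar, and the bandwidth lambda is assumed known. Above the threshold it uses $F(x) = (1 - \phi_u) + \phi_u G(x)$, which agrees with the bulk value $F(u) = 1 - \phi_u$.

```r
# Hypothetical helper names for illustration only; this is not the evmix code.

# KDE cdf H(x) = mean(pnorm(x, kerncentres, lambda)) and KDE density h(x)
kde_cdf <- function(x, kerncentres, lambda) {
  sapply(x, function(x0) mean(pnorm(x0, kerncentres, lambda)))
}
kde_pdf <- function(x, kerncentres, lambda) {
  sapply(x, function(x0) mean(dnorm(x0, kerncentres, lambda)))
}

# Conditional GPD cdf G(x) and density g(x) above u (scalar shape xi;
# xi close to 0 uses the exponential limit)
gpd_cdf <- function(x, u, sigmau, xi) {
  z <- pmax(x - u, 0) / sigmau
  if (abs(xi) < 1e-6) return(1 - exp(-z))
  1 - pmax(1 + xi * z, 0)^(-1 / xi)
}
gpd_pdf <- function(x, u, sigmau, xi) {
  z <- (x - u) / sigmau
  if (abs(xi) < 1e-6) return(ifelse(z >= 0, exp(-z) / sigmau, 0))
  tt <- 1 + xi * z
  ifelse(z >= 0 & tt > 0, pmax(tt, 0)^(-1 / xi - 1) / sigmau, 0)
}

# Tail fraction and the GPD scale implied by the continuity constraint
con_params <- function(kerncentres, lambda, u, phiu = TRUE) {
  Hu <- kde_cdf(u, kerncentres, lambda)
  hu <- kde_pdf(u, kerncentres, lambda)
  if (isTRUE(phiu)) phiu <- 1 - Hu          # tail fraction from the KDE bulk
  sigmau <- phiu * Hu / ((1 - phiu) * hu)   # from (1 - phiu) h(u) / H(u) = phiu / sigmau
  list(Hu = Hu, phiu = phiu, sigmau = sigmau)
}

sketch_pkdengpdcon <- function(q, kerncentres, lambda, u, xi, phiu = TRUE) {
  cp <- con_params(kerncentres, lambda, u, phiu)
  ifelse(q <= u,
         (1 - cp$phiu) * kde_cdf(q, kerncentres, lambda) / cp$Hu,
         (1 - cp$phiu) + cp$phiu * gpd_cdf(q, u, cp$sigmau, xi))
}

sketch_dkdengpdcon <- function(x, kerncentres, lambda, u, xi, phiu = TRUE) {
  cp <- con_params(kerncentres, lambda, u, phiu)
  ifelse(x <= u,
         (1 - cp$phiu) * kde_pdf(x, kerncentres, lambda) / cp$Hu,
         cp$phiu * gpd_pdf(x, u, cp$sigmau, xi))
}

# Usage with illustrative values
set.seed(1)
kc <- rnorm(500)
lam <- bw.nrd0(kc)                # bandwidth chosen for illustration only
u <- as.vector(quantile(kc, 0.9))
d_below <- sketch_dkdengpdcon(u - 1e-6, kc, lam, u, xi = 0.1, phiu = 0.2)
d_above <- sketch_dkdengpdcon(u + 1e-6, kc, lam, u, xi = 0.1, phiu = 0.2)
```

The two density evaluations either side of u should agree closely, which is exactly what fixing $\sigma_u$ through the continuity constraint is meant to achieve; the CDF at u equals $1 - \phi_u$ for a pre-specified tail fraction and $H(u)$ when phiu=TRUE.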

Examples

library(evmix)

par(mfrow = c(2, 2))

# Density with a user-chosen threshold and shape, over a histogram of the kernel centres
kerncentres = rnorm(500, 0, 1)
xx = seq(-4, 4, 0.01)
hist(kerncentres, breaks = 100, freq = FALSE)
lines(xx, dkdengpdcon(xx, kerncentres, u = 1.2, xi = 0.1))

# Effect of the GPD shape on the cumulative distribution function
plot(xx, pkdengpdcon(xx, kerncentres), type = "l")
lines(xx, pkdengpdcon(xx, kerncentres, xi = 0.3), col = "red")
lines(xx, pkdengpdcon(xx, kerncentres, xi = -0.3), col = "blue")
legend("topleft", paste("xi =", c(0, 0.3, -0.3)),
      col = c("black", "red", "blue"), lty = 1, cex = 0.5)

# Random sample compared with the density using the same u, xi and phiu
kerncentres = rnorm(1000, 0, 1)
x = rkdengpdcon(1000, kerncentres, phiu = 0.1, u = 1.2, xi = 0.1)
xx = seq(-4, 6, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 6))
lines(xx, dkdengpdcon(xx, kerncentres, phiu = 0.1, u = 1.2, xi = 0.1))

# Effect of the GPD shape on the density with a pre-specified tail fraction
plot(xx, dkdengpdcon(xx, kerncentres, xi = 0, phiu = 0.2), type = "l")
lines(xx, dkdengpdcon(xx, kerncentres, xi = -0.2, phiu = 0.2), col = "red")
lines(xx, dkdengpdcon(xx, kerncentres, xi = 0.2, phiu = 0.2), col = "blue")
legend("topleft", c("xi = 0", "xi = -0.2", "xi = 0.2"),
      col = c("black", "red", "blue"), lty = 1)

Run the code above in your browser using DataLab