
evmix (version 1.0)

lgkg: Cross-validation Log-likelihood of Kernel Density Estimation for Bulk and GPD for Both Upper and Lower Tails in Extreme Value Mixture Model

Description

Cross-validation log-likelihood and negative log-likelihood for the extreme value mixture model with kernel density estimation (normal kernel) for the bulk and GPDs for both the upper and lower tails.

Usage

lgkg(x, lambda = NULL, ul = as.vector(quantile(x, 0.1)),
    sigmaul = 1, xil = 0, phiul = TRUE,
    ur = as.vector(quantile(x, 0.9)), sigmaur = 1, xir = 0,
    phiur = TRUE, log = TRUE)

nlgkg(pvector, x, phiul = TRUE, phiur = TRUE,
    finitelik = FALSE)

Arguments

x
vector of sample data
lambda
bandwidth for normal kernel (standard deviation of normal)
ul
lower tail threshold
sigmaul
lower tail GPD scale parameter (non-negative)
xil
lower tail GPD shape parameter
phiul
probability of being below the lower threshold (0, 1), or TRUE for the lower tail fraction to be estimated from the KDE bulk model
ur
upper tail threshold
sigmaur
upper tail GPD scale parameter (non-negative)
xir
upper tail GPD shape parameter
phiur
probability of being above the upper threshold (0, 1), or TRUE for the upper tail fraction to be estimated from the KDE bulk model
log
logical, if TRUE then the log-likelihood is returned rather than the likelihood
pvector
vector of initial values of the mixture model parameters (lambda, ul, sigmaul, xil, ur, sigmaur, xir) or NULL
finitelik
logical, should the log-likelihood return a finite value for invalid parameters
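
As noted under log, the likelihood itself can be obtained by exponentiating the default log-likelihood output. A quick hedged check on simulated data (illustrative bandwidth, not from this page):

## hedged sketch: log = FALSE should return exp of the log-likelihood
library(evmix)
set.seed(1)
x <- rnorm(200)
ll <- lgkg(x, lambda = 0.3)
lik <- lgkg(x, lambda = 0.3, log = FALSE)
all.equal(lik, exp(ll))  # expected TRUE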

Value

  • lgkg gives cross-validation (log-)likelihood and nlgkg gives the negative cross-validation log-likelihood.
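
As a hedged illustration on simulated data (parameter values are illustrative, and the sketch assumes pvector follows the order of the lgkg arguments), nlgkg returns the negated lgkg value:

## hedged sketch: nlgkg as the negated cross-validation log-likelihood
library(evmix)
set.seed(1)
x <- rnorm(500)
ul <- as.vector(quantile(x, 0.1))
ur <- as.vector(quantile(x, 0.9))
ll <- lgkg(x, lambda = 0.3, ul = ul, sigmaul = 1, xil = 0,
    ur = ur, sigmaur = 1, xir = 0)
## pvector assumed to be (lambda, ul, sigmaul, xil, ur, sigmaur, xir)
nll <- nlgkg(c(0.3, ul, 1, 0, ur, 1, 0), x)
all.equal(nll, -ll)  # expected TRUE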

Warning

See warning in fkden

Details

The cross-validation likelihood functions are for the extreme value mixture model with kernel density estimation (normal kernel) for the bulk distribution between the lower and upper thresholds, and conditional GPDs for the two tails. They are designed to be used for maximum likelihood estimation in the fitting function fgkg, but are available for wider usage, e.g. constructing your own extreme value mixture models. See fkdengpd, fkden and fgpd for full details.

Cross-validation likelihood is used for the kernel density estimation component, but standard likelihood is used for the GPD components. The cross-validation likelihood for the KDE is obtained by leaving each point out in turn and evaluating the KDE at the point left out:

$$L(\lambda) \propto \prod_{i=1}^{n_b} \hat{f}_{-i}(x_i)$$

where

$$\hat{f}_{-i}(x_i) = \frac{1}{(n-1)\lambda} \sum_{j=1, j \ne i}^{n} K\left(\frac{x_i - x_j}{\lambda}\right)$$

is the KDE obtained when the $i$th datapoint is dropped out, evaluated at that dropped datapoint $x_i$.

Notice that the KDE sum is indexed over all datapoints ($j = 1, \ldots, n$, except datapoint $i$), whether they are between the thresholds or in the tails, but the likelihood product is evaluated only for the data between the thresholds ($i = 1, \ldots, n_b$). So the datapoints $j = n_b + 1, \ldots, n$ in the tails provide extra kernel centres for the KDE, but the likelihood is not evaluated at them.

The log-likelihood calculations are carried out in lgkg, which takes the bandwidth in the same form as the distribution functions. The negative log-likelihood nlgkg is a wrapper for lgkg designed for optimisation, e.g. the parameters are given as a vector in the first input. lgkg calculates the log-likelihood directly; it can be exponentiated to give the actual likelihood by setting log = FALSE.
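
To make the leave-one-out construction concrete, the following base-R sketch computes the cross-validation log-likelihood for the KDE component alone; the function and variable names (cvloglik_kde, bulk) are illustrative and not part of the evmix API:

## hedged sketch of the leave-one-out KDE cross-validation log-likelihood;
## `bulk` marks the datapoints lying between the two thresholds
cvloglik_kde <- function(lambda, x, bulk) {
  n <- length(x)
  sum(sapply(which(bulk), function(i) {
    ## KDE with the i-th point dropped, evaluated at x[i]: the kernel
    ## sum runs over all n - 1 remaining points (bulk and tail data)
    log(sum(dnorm((x[i] - x[-i]) / lambda)) / ((n - 1) * lambda))
  }))
}
set.seed(1)
x <- rnorm(100)
bulk <- x > quantile(x, 0.1) & x < quantile(x, 0.9)
cvloglik_kde(0.3, x, bulk)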

References

http://en.wikipedia.org/wiki/Kernel_density_estimation

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.

Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.

MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.

See Also

gkg, kdengpd, kden, gpd and density.

Other gkg: dgkg, fgkg, gkg, pgkg, qgkg, rgkg
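
Although fgkg automates the fitting, a hedged sketch of direct optimisation with nlgkg (simulated data and illustrative starting values) might look like:

## hedged sketch: minimise the negative cross-validation log-likelihood
## directly, roughly what fgkg automates; starting values are illustrative
library(evmix)
set.seed(2)
x <- rnorm(1000)
init <- c(0.2, as.vector(quantile(x, 0.1)), 1, 0.1,
    as.vector(quantile(x, 0.9)), 1, 0.1)
fit <- optim(par = init, fn = nlgkg, x = x, finitelik = TRUE)
fit$par  # fitted (lambda, ul, sigmaul, xil, ur, sigmaur, xir)

Setting finitelik = TRUE keeps the objective finite when the optimiser wanders into invalid parameter regions, which is exactly what that argument is for.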