evmix (version 1.0)

lkdengpd: Cross-validation Log-likelihood of Kernel Density Estimator Using Normal Kernel and GPD Tail Extreme Value Mixture Model

Description

Cross-validation log-likelihood and negative log-likelihood for the kernel density estimator using a normal kernel and GPD tail extreme value mixture model.

Usage

lkdengpd(x, lambda = NULL, u = 0, sigmau = 1, xi = 0,
    phiu = TRUE, log = TRUE)

nlkdengpd(pvector, x, phiu = TRUE, finitelik = FALSE)
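
A minimal sketch of how nlkdengpd can be passed to optim for maximum likelihood estimation, much as fkdengpd does internally (the simulated data and initial values here are purely illustrative):

    library(evmix)

    set.seed(1)
    x <- rnorm(1000)

    # illustrative initial values for (lambda, u, sigmau, xi);
    # the 90% quantile is an arbitrary starting threshold
    pinit <- c(0.1, quantile(x, 0.9), 1, 0.1)

    # finitelik = TRUE keeps the objective finite if the optimiser
    # steps into an invalid parameter region
    fit <- optim(pinit, nlkdengpd, x = x, finitelik = TRUE, method = "BFGS")
    fit$par  # fitted (lambda, u, sigmau, xi)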

Arguments

x
vector of sample data
phiu
probability of being above threshold [0, 1] or logical TRUE for the bulk model based tail fraction
pvector
vector of initial values of mixture model parameters (lambda, u, sigmau, xi) or NULL
finitelik
logical, should log-likelihood return finite value for invalid parameters
lambda
bandwidth for normal kernel (standard deviation of normal)
u
threshold
sigmau
scale parameter (non-negative)
xi
shape parameter
log
logical, if TRUE then log density

Value

  • lkdengpd gives cross-validation (log-)likelihood and nlkdengpd gives the negative cross-validation log-likelihood.
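
For valid parameters the two are simply negatives of each other, e.g. a quick check with evmix loaded (the data and parameter values are illustrative):

    x <- rnorm(100)               # illustrative sample
    pars <- c(0.5, 1.5, 1, 0.1)   # (lambda, u, sigmau, xi)
    all.equal(nlkdengpd(pars, x),
              -lkdengpd(x, pars[1], pars[2], pars[3], pars[4]))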

Warning

See warning in fkden

Details

The cross-validation likelihood functions for the kernel density estimator, using a normal kernel for the bulk below the threshold and GPD for the upper tail, as used in the maximum likelihood fitting function fkdengpd. They are designed for MLE in fkdengpd but are available for wider usage, e.g. constructing your own extreme value mixture models. See fkden and fgpd for full details.

Cross-validation likelihood is used for the kernel density component, but standard likelihood is used for the GPD component. The cross-validation likelihood for the KDE is obtained by leaving each point out in turn and evaluating the KDE at the point left out: $$L(\lambda) = \prod_{i=1}^{n_b} \hat{f}_{-i}(x_i)$$ where $$\hat{f}_{-i}(x_i) = \frac{1}{(n-1)\lambda} \sum_{j=1, j \ne i}^{n} K\!\left(\frac{x_i - x_j}{\lambda}\right)$$ is the KDE obtained when the $i$th datapoint is dropped out, evaluated at that dropped datapoint $x_i$.

Notice that the KDE sum is indexed over all datapoints ($j = 1, \ldots, n$, except datapoint $i$), whether they are below the threshold or in the upper tail, but the likelihood product is evaluated only for those data below the threshold ($i = 1, \ldots, n_b$). So the datapoints $j = n_b+1, \ldots, n$ in the upper tail provide extra kernel centres for the KDE, but the likelihood is not evaluated there.

Log-likelihood calculations are carried out in lkdengpd, which takes the bandwidth in the same form as the distribution functions. The negative log-likelihood nlkdengpd is a wrapper for lkdengpd designed to make it usable for optimisation (e.g. the parameters are given as a vector in the first input). lkdengpd calculates the log-likelihood directly; exponentiating gives the actual likelihood, which can also be obtained by setting log = FALSE.
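
As a concrete illustration of the formula above, a short sketch computing just the leave-one-out cross-validation log-likelihood of the KDE component by hand (this omits the GPD and tail fraction terms that lkdengpd also includes, so it is not the full log-likelihood):

    # leave-one-out CV log-likelihood for the normal-kernel KDE component:
    # all n points act as kernel centres, but the product (sum of logs)
    # is taken only over the bulk data below the threshold u
    cvloglik <- function(x, lambda, u) {
      n <- length(x)
      ib <- which(x <= u)  # indices of the bulk data x_1, ..., x_nb
      sum(sapply(ib, function(i) {
        # f_{-i}(x_i): KDE with the i-th point dropped, evaluated at x[i]
        log(sum(dnorm((x[i] - x[-i]) / lambda)) / ((n - 1) * lambda))
      }))
    }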

References

http://en.wikipedia.org/wiki/Kernel_density_estimation

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.

Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.

MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.

See Also

kdengpd, kden, gpd and density