
evmix (version 1.0)

lkdengpdcon: Cross-validation Log-likelihood of Kernel Density Estimator Using Normal Kernel and GPD Tail Extreme Value Mixture Model with Single Continuity Constraint

Description

Cross-validation log-likelihood and negative log-likelihood for the kernel density estimator, using a normal kernel for the bulk distribution up to the threshold and a conditional GPD above the threshold, constrained to be continuous at the threshold.

Usage

lkdengpdcon(x, lambda = NULL, u = 0, xi = 0, phiu = TRUE, log = TRUE)

nlkdengpdcon(pvector, x, phiu = TRUE, finitelik = FALSE)

Arguments

x
vector of sample data
lambda
bandwidth for the normal kernel (standard deviation of the normal)
u
threshold
xi
shape parameter
phiu
logical; if TRUE the tail fraction is given by the bulk model (the KDE probability of being above the threshold)
log
logical; if TRUE the log-likelihood is returned
pvector
vector of initial values of the mixture model parameters (lambda, u, xi) or NULL
finitelik
logical; should the log-likelihood return a finite value for invalid parameters

Value

  • lkdengpdcon gives the cross-validation (log-)likelihood and nlkdengpdcon gives the negative cross-validation log-likelihood.
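
A minimal usage sketch (the data and parameter values are illustrative only, and the parameter vector is assumed to be c(lambda, u, xi) as described in the Arguments above):

library(evmix)

set.seed(1)
x <- rnorm(1000)
u <- quantile(x, 0.9)

ll <- lkdengpdcon(x, lambda = 0.3, u = u, xi = 0.1)  # CV log-likelihood
nll <- nlkdengpdcon(c(0.3, u, 0.1), x)               # equals -ll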

Warning

See the warning in fkden.

Details

The cross-validation likelihood functions for the kernel density estimator use a normal kernel for the bulk below the threshold and a GPD for the upper tail, with a constraint of continuity at the threshold. They are designed for maximum likelihood estimation in the fitting function fkdengpdcon, but are available for wider usage, e.g. constructing your own extreme value mixture models. See fkdengpd, fkden and fgpd for full details.

The cross-validation likelihood is used for the kernel density component, whereas the standard likelihood is used for the GPD component. The cross-validation likelihood for the KDE is obtained by leaving each point out in turn and evaluating the KDE at the point left out: $$L(\lambda) = \prod_{i=1}^{n_b} \hat{f}_{-i}(x_i)$$ where $$\hat{f}_{-i}(x_i) = \frac{1}{(n-1)\lambda} \sum_{j=1, j \ne i}^{n} K\left(\frac{x_i - x_j}{\lambda}\right)$$ is the KDE obtained when the $i$th datapoint is dropped out, evaluated at that dropped datapoint $x_i$.

Notice that the KDE sum is indexed over all datapoints ($j = 1, \ldots, n$, except datapoint $i$), whether they are below the threshold or in the upper tail, but the likelihood product is evaluated only for the data below the threshold ($i = 1, \ldots, n_b$). So the datapoints $j = n_b + 1, \ldots, n$ are extra kernel centres from the data in the upper tail, which are used in the KDE but at which the likelihood is not evaluated.

The log-likelihood calculations are carried out in lkdengpdcon, which takes the bandwidth in the same form as the distribution functions. The negative log-likelihood nlkdengpdcon is a wrapper for lkdengpdcon, designed to make it usable for optimisation (e.g. the parameters are given as a vector in the first input). lkdengpdcon calculates the log-likelihood directly, which can be exponentiated to give the actual likelihood if required (log = FALSE).
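
To make the leave-one-out construction above concrete, here is an illustrative sketch of the KDE cross-validation log-likelihood on its own; this is not the evmix internals, and the function name cv_loglik_kde and its arguments are hypothetical. All n datapoints act as kernel centres, but the sum of log terms runs only over the points at or below the threshold:

cv_loglik_kde <- function(x, lambda, u) {
  n <- length(x)
  ib <- which(x <= u)  # bulk datapoints i = 1, ..., nb at or below the threshold
  sum(vapply(ib, function(i) {
    # leave-one-out KDE at the dropped point x[i]: the other n - 1 points
    # (bulk and tail) are kernel centres, K is the standard normal density
    log(sum(dnorm((x[i] - x[-i]) / lambda)) / ((n - 1) * lambda))
  }, numeric(1)))
}

Summing the log terms, rather than taking the product in the formula above, avoids numerical underflow for large samples.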

References

http://en.wikipedia.org/wiki/Kernel_density_estimation

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.

Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.

MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.

See Also

kdengpd, kden, gpd and density.

Other kdengpdcon: dkdengpdcon, fkdengpdcon, kdengpdcon, pkdengpdcon, qkdengpdcon, rkdengpdcon
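
Examples

Since nlkdengpdcon takes the parameters as a single vector in its first input, it can be passed straight to a numerical optimiser. A sketch under that assumption (fkdengpdcon is the proper fitting routine; the simulated data and initial values here are illustrative only):

library(evmix)

set.seed(2)
x <- c(rnorm(900), abs(rnorm(100)) + 1.5)  # bulk plus a heavier upper tail
init <- c(0.2, quantile(x, 0.9), 0.1)      # initial c(lambda, u, xi)

fit <- optim(init, nlkdengpdcon, x = x, finitelik = TRUE)
fit$par  # fitted bandwidth, threshold and shape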