evmix (version 1.0)

fbckdengpd: Cross-validation MLE Fitting of Boundary Corrected Kernel Density Estimation and GPD Tail Extreme Value Mixture Model

Description

Maximum likelihood estimation for fitting boundary corrected kernel density estimators for the bulk and GPD tail extreme value mixture model

Usage

fbckdengpd(x, phiu = TRUE, pvector = NULL,
    add.jitter = FALSE, factor = 0.1, amount = NULL,
    bcmethod = "simple", proper = TRUE, nn = "jf96",
    offset = 0, xmax = Inf, std.err = TRUE,
    method = "BFGS", control = list(maxit = 10000),
    finitelik = TRUE, ...)

Arguments

x
vector of sample data
bcmethod
boundary correction approach
proper
logical, should density be renormalised to integrate to unity, simple boundary correction only
nn
non-negativity correction method, simple boundary correction only
offset
offset added to kernel centres, for logtrans
xmax
upper bound on support, for copula and beta kernels only
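phiu
logical, should the tail fraction be specified by the bulk model (TRUE) or estimated as a separate parameter (FALSE), see Details
pvector
vector of initial values of mixture model parameters (lambda, u, sigmau, xi) or NULL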
add.jitter
logical, whether jitter is needed for rounded data
factor
see jitter
amount
see jitter
std.err
logical, should standard errors be calculated
method
optimisation method (see optim)
control
optimisation control list (see optim)
finitelik
logical, should log-likelihood return finite value for invalid parameters
...
optional inputs passed to optim

Value

Returns a simple list with the following elements:

call: optim call
x: (jittered) data vector x
kerncentres: actual kernel centres used x
init: pvector
optim: complete optim output
mle: vector of MLE of parameters
cov: variance of MLE parameters
se: standard error of MLE parameters
nllh: minimum negative cross-validation log-likelihood
allparams: vector of MLE of model parameters, including phiu
allse: vector of standard error of all parameters, including phiu
n: total sample size
lambda: MLE of bandwidth
u: threshold
sigmau: MLE of GPD scale
xi: MLE of GPD shape
phiu: MLE of tail fraction
bcmethod: boundary correction method
proper: logical, whether renormalisation is requested
nn: non-negative correction method
offset: offset for log transformation method
xmax: maximum value of scale beta or copula

The output list has some duplicate entries and repeats some of the inputs, both to provide similar items to those from fpot and to make it as useable as possible.

Warning

Two important practical issues arise with MLE for the kernel bandwidth: 1) Cross-validation likelihood is needed for the KDE bandwidth parameter as the usual likelihood degenerates, so that the MLE $\hat{\lambda} \rightarrow 0$ as $n \rightarrow \infty$, thus giving a negative bias towards a small bandwidth. Leave-one-out cross-validation essentially ensures that some smoothing between the kernel centres is required (i.e. a non-zero bandwidth), as otherwise the resultant density estimates would always be zero if the bandwidth was zero.

This problem occasionally arises for data which have been heavily rounded, as even when using cross-validation the density can be non-zero even if the bandwidth is zero. To overcome this issue, an option to add a small jitter to the data (x only), using the jitter function to remove the ties, has been included in the fitting inputs. The default jitter options are specified above, but the user can override these. Notice the default scaling factor=0.1, which is a tenth of the default value in the jitter function itself.

A warning message is given if the data appear to be rounded (i.e. more than 5 estimated bandwidth is too small, then data rounding is the likely culprit. Only use the jittering when the MLE of the bandwidth is far too small.
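The effect of ties on the cross-validation likelihood can be illustrated with a minimal base R sketch. It assumes a normal kernel with no boundary correction, and cvloglik is a hypothetical helper written for this illustration, not an evmix function; the tied sample xt is artificial.

```r
# Leave-one-out cross-validation log-likelihood for a normal-kernel KDE.
# cvloglik() is a hypothetical helper for illustration, not part of evmix.
cvloglik <- function(lambda, x) {
  n <- length(x)
  # n x n matrix of kernel contributions from centre x_j evaluated at x_i
  k <- dnorm(outer(x, x, "-"), sd = lambda)
  diag(k) <- 0                      # leave each point out of its own estimate
  sum(log(rowSums(k) / (n - 1)))
}

set.seed(1)
x <- rgamma(200, 2, 1)              # continuous data, no ties
xt <- rep(c(1, 2, 3), each = 5)     # artificial sample where every value is tied

# Without ties a tiny bandwidth is heavily penalised ...
cv_untied <- sapply(c(0.3, 0.001), cvloglik, x = x)
# ... but with ties the criterion keeps increasing as the bandwidth shrinks
cv_tied <- sapply(c(0.1, 0.001), cvloglik, x = xt)

# Jitter with the reduced scaling factor to break the ties
xj <- jitter(xt, factor = 0.1)
c(ties_before = anyDuplicated(xt) > 0, ties_after = anyDuplicated(xj) > 0)
```

Comparing cv_untied and cv_tied shows why jittering is only worth considering when the fitted bandwidth is implausibly small.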

Details

An extreme value mixture model combining boundary corrected kernel density estimators for the bulk below the threshold and a GPD for the upper tail is fitted to the entire dataset using maximum cross-validation likelihood estimation. The estimated parameters, their variance and standard error are automatically output. Cross-validation likelihood is used for the boundary corrected kernel density component, but standard likelihood is used for the GPD component.

The default is phiu=TRUE, so that the tail fraction is specified by the cumulative distribution function of the boundary corrected kernel density estimator, $\phi_u = 1 - H(u)$. When phiu=FALSE the tail fraction is treated as an extra parameter estimated using the MLE, which is the sample proportion above the threshold. In this case the standard error for phiu is estimated and output as sephiu.

Missing values (NA and NaN) are assumed to be invalid data so are ignored, which is inconsistent with the evd library which assumes the missing values are below the threshold.

The default optimisation algorithm is "BFGS", which requires finite negative log-likelihood function evaluations (finitelik=TRUE). For invalid parameters, a zero likelihood is replaced with exp(-1e6). The "BFGS" optimisation algorithms require finite values for likelihood, so any user input for finitelik will be overridden and set to finitelik=TRUE if either of these optimisation methods is chosen.

A warning is displayed if the optim function call returns a non-zero convergence result. If the hessian is of reduced rank then the variance (from the inverse hessian) and standard error of the bandwidth parameter cannot be calculated, so with the default std.err=TRUE the function will stop. If you want the bandwidth estimate even if the hessian is of reduced rank (e.g. in a simulation study) then set std.err=FALSE.
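The optimisation details can be sketched in base R for the GPD component alone. This is an illustration under stated assumptions, not the evmix implementation: gpd_nll is a hypothetical helper, the threshold is held fixed at the sample 90% quantile, and the binomial form of the standard error for the tail fraction is assumed rather than taken from the package.

```r
# Negative log-likelihood for GPD excesses above a fixed threshold.
# gpd_nll() is a hypothetical helper for illustration, not part of evmix.
gpd_nll <- function(par, excess, finitelik = TRUE) {
  sigmau <- par[1]
  xi <- par[2]
  # Invalid if the scale is non-positive or an excess lies outside the support
  invalid <- sigmau <= 0 || any(1 + xi * excess / sigmau <= 0)
  if (invalid) {
    return(if (finitelik) 1e6 else Inf)   # -log(exp(-1e6)) = 1e6
  }
  n <- length(excess)
  if (abs(xi) < 1e-6) {
    n * log(sigmau) + sum(excess) / sigmau              # exponential limit
  } else {
    n * log(sigmau) + (1 / xi + 1) * sum(log1p(xi * excess / sigmau))
  }
}

set.seed(1)
x <- rgamma(500, 2, 1)
u <- quantile(x, 0.9, names = FALSE)
excess <- x[x > u] - u

init <- c(1, 0.1)                          # initial (sigmau, xi)
fit <- optim(init, gpd_nll, excess = excess, method = "BFGS",
  control = list(maxit = 10000), hessian = TRUE)
if (fit$convergence != 0) warning("optim returned a non-zero convergence code")

# Standard errors are only available when the hessian is of full rank
full_rank <- qr(fit$hessian)$rank == length(fit$par)
se <- if (full_rank) sqrt(diag(solve(fit$hessian))) else rep(NA_real_, length(fit$par))

# Tail fraction as the sample proportion above the threshold (phiu = FALSE case)
phiu <- mean(x > u)
se_phiu <- sqrt(phiu * (1 - phiu) / length(x))   # assumed binomial standard error
```

Returning a large finite value instead of Inf keeps the numerical gradient used by "BFGS" defined when a trial step lands on invalid parameters.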

References

http://en.wikipedia.org/wiki/Kernel_density_estimation

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.

Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.

MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.

MacDonald, A., Scarrott, C.J. and Lee, D.S. (2011). Boundary correction, consistency and robustness of kernel densities using extreme value theory. Submitted. Available from: http://www.math.canterbury.ac.nz/~c.scarrott.

See Also

fkden, jitter, density and bw.nrd0

Examples

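library(evmix)
# Simulated sample and a grid on which to plot the fitted density
x = rgamma(500, 2, 1)
xx = seq(0.1, 10, 0.01)
# Initial values: bandwidth, threshold, GPD scale, GPD shape
pinit = c(0.1, quantile(x, 0.9), 1, 0.1)
fit = fbckdengpd(x, phiu = FALSE, pvector = pinit, std.err = FALSE, bcmethod = "reflect")
# Overlay the fitted density on the histogram and mark the threshold
hist(x, 100, freq = FALSE, ylim = c(0, 0.6))
lines(xx, dbckdengpd(xx, x, fit$lambda, fit$u, fit$sigmau, fit$xi,
  fit$phiu, bcmethod = "reflect"), col = "blue")
abline(v = fit$u)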