flognormgpdcon: MLE Fitting of Log-Normal Bulk and GPD Tail Extreme Value Mixture Model with Continuity Constraint

Description

Maximum likelihood estimation for fitting the extreme value mixture model with a log-normal bulk distribution up to the threshold and a conditional GPD above the threshold, with a continuity constraint at the threshold.

Usage

flognormgpdcon(x, phiu = TRUE, pvector = NULL,
    std.err = TRUE, method = "BFGS",
    control = list(maxit = 10000), finitelik = TRUE, ...)

Arguments

x

vector of sample data

pvector

vector of initial values of mixture model parameters (lnmean, lnsd, u, xi) or NULL

phiu

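logical; if TRUE the tail fraction is specified by the bulk model, if FALSE it is estimated as a separate parameter (see Details)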

std.err

logical, should standard errors be calculated

method

optimisation method (see optim)

control

optimisation control list (see optim)

finitelik

logical, should log-likelihood return finite value for invalid parameters

...

optional inputs passed to optim
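
How method, control, finitelik and std.err interact can be sketched in base R. This is a minimal illustration for a plain log-normal likelihood, not the package's internal code. The helper name nll_lnorm is hypothetical, and the cap of 1e6 is an assumption derived from replacing a zero likelihood with exp(-1e6).

```r
# Hypothetical helper (not part of evmix): negative log-likelihood of a
# log-normal sample, optionally capped at a large finite value.
nll_lnorm <- function(par, x, finitelik = TRUE) {
  lnmean <- par[1]
  lnsd <- par[2]
  if (lnsd <= 0) {
    # invalid parameters: zero likelihood, so the negative log-likelihood is Inf
    return(if (finitelik) 1e6 else Inf)
  }
  nllh <- -sum(dlnorm(x, meanlog = lnmean, sdlog = lnsd, log = TRUE))
  if (finitelik && !is.finite(nllh)) nllh <- 1e6
  nllh
}

set.seed(1)
x <- rlnorm(1000)

# BFGS needs finite function values, hence finitelik = TRUE
fit <- optim(c(0.5, 2), nll_lnorm, x = x, finitelik = TRUE,
             method = "BFGS", control = list(maxit = 10000), hessian = TRUE)

# mimic the warning issued for a non-zero convergence code
if (fit$convergence != 0) warning("optim did not converge")

# standard errors from the inverse Hessian, as requested by std.err = TRUE;
# solve() fails if the Hessian is singular (reduced rank)
cov_mle <- solve(fit$hessian)
se_mle <- sqrt(diag(cov_mle))
```

If the Hessian were of reduced rank, solve() would raise an error, which is the situation std.err = FALSE is meant to avoid.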

Value

Returns a simple list with the following elements:

call: optim call
x: data vector x
init: pvector
optim: complete optim output
mle: vector of MLE of model parameters
cov: variance-covariance matrix of MLE of model parameters
se: vector of standard errors of MLE of model parameters
rate: phiu to be consistent with evd
nllh: minimum negative log-likelihood
allparams: vector of MLE of model parameters, including sigmau and phiu
allse: vector of standard errors of all parameters, including sigmau and phiu
n: total sample size
lnmean: MLE of log-normal mean
lnsd: MLE of log-normal standard deviation
u: threshold
sigmau: MLE of GPD scale
xi: MLE of GPD shape
phiu: MLE of tail fraction

The output list has some duplicate entries and repeats some of the inputs, both to provide similar items to those from fpot and to make it as usable as possible.

Details

The extreme value mixture model with log-normal bulk and GPD tail with continuity constraint is fitted to the entire dataset using maximum likelihood estimation. The estimated parameters, their variance-covariance matrix and their standard errors are automatically output. Negative data are ignored.

The default is phiu=TRUE, so that the tail fraction is specified by the log-normal distribution, $\phi_u = 1 - H(u)$. When phiu=FALSE the tail fraction is treated as an extra parameter, estimated by its MLE, which is the sample proportion above the threshold. In this case the standard error for phiu is estimated and output as sephiu.

Missing values (NA and NaN) are assumed to be invalid data and so are ignored. This is inconsistent with the evd library, which assumes that missing values are below the threshold.

The default optimisation algorithm is "BFGS", which requires a finite negative log-likelihood function evaluation (finitelik=TRUE). For invalid parameters, a zero likelihood is replaced with exp(-1e6). The "BFGS" and "L-BFGS-B" optimisation algorithms require finite values for the likelihood, so any user input for finitelik is overridden and set to finitelik=TRUE if either of these optimisation methods is chosen.

A warning is displayed if the optim function call returns a non-zero convergence result.

If the hessian is of reduced rank then the variance-covariance matrix (from the inverse hessian) and the standard errors of the parameters cannot be calculated. Under the default std.err=TRUE the function will then stop. If you want the parameter estimates even when the hessian is of reduced rank (e.g. in a simulation study), set std.err=FALSE.
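
The two tail fraction options and the continuity constraint can be illustrated in base R. The sketch below assumes the usual formulation of this kind of model, in which the GPD scale sigmau is not a free parameter but is fixed by requiring the density to be continuous at u. The helper name sigmau_con is hypothetical, and the formulae are written from that assumption rather than taken from the package source.

```r
# Hypothetical helper (not part of evmix): GPD scale implied by continuity
# of the density at the threshold u. phiu is either TRUE (bulk model based
# tail fraction) or a number in (0, 1) (parameterised tail fraction).
sigmau_con <- function(lnmean, lnsd, u, phiu = TRUE) {
  Hu <- plnorm(u, meanlog = lnmean, sdlog = lnsd)  # bulk CDF at threshold
  hu <- dlnorm(u, meanlog = lnmean, sdlog = lnsd)  # bulk density at threshold
  if (isTRUE(phiu)) {
    # bulk model based tail fraction: phiu = 1 - H(u)
    phiu <- 1 - Hu
    bulk_at_u <- hu
  } else {
    # parameterised tail fraction: bulk rescaled to carry mass 1 - phiu
    bulk_at_u <- (1 - phiu) * hu / Hu
  }
  # the GPD density at u is phiu / sigmau; equate it with the bulk density
  list(phiu = phiu, sigmau = phiu / bulk_at_u)
}

set.seed(1)
x <- rlnorm(1000)
u <- 2

# MLE of the tail fraction when phiu = FALSE: sample proportion above u
phiu_hat <- mean(x > u)

bulk <- sigmau_con(0, 1, u, phiu = TRUE)
para <- sigmau_con(0, 1, u, phiu = phiu_hat)
```

In both cases phiu / sigmau equals the bulk density just below u, which is what the continuity constraint requires.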

References

http://en.wikipedia.org/wiki/Log-normal_distribution

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Solari, S. and Losada, M.A. (2012). A unified statistical model for hydrological variables including the selection of threshold for the peak over threshold method. Water Resources Research 48, W10541.

Examples

library(evmix)  # provides flognormgpdcon and dlognormgpdcon
par(mfrow=c(2,1))
x = rlnorm(1000)
xx = seq(-1, 6, 0.01)
y = dlnorm(xx)

# Bulk model based tail fraction
fit = flognormgpdcon(x, phiu = TRUE, std.err = FALSE)
hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 6))
lines(xx, y)
lines(xx, dlognormgpdcon(xx, lnmean = fit$lnmean, lnsd = fit$lnsd, u = fit$u,
  xi = fit$xi, phiu = TRUE), col="red")
abline(v = fit$u)

# Parameterised tail fraction
fit2 = flognormgpdcon(x, phiu = FALSE, std.err = FALSE)
plot(xx, y, type = "l")
lines(xx, dlognormgpdcon(xx, lnmean = fit$lnmean, lnsd = fit$lnsd, u = fit$u,
  xi = fit$xi, phiu = TRUE), col="red")
lines(xx, dlognormgpdcon(xx, lnmean = fit2$lnmean, lnsd = fit2$lnsd, u = fit2$u,
  xi = fit2$xi, phiu = fit2$phiu), col="blue")
abline(v = fit$u, col = "red")
abline(v = fit2$u, col = "blue")
legend("topright", c("True Density","Bulk Tail Fraction","Parameterised Tail Fraction"),
  col=c("black", "red", "blue"), lty = 1)
