fpsdengpd: MLE Fitting of P-splines Density Estimate for Bulk and GPD Tail Extreme Value Mixture Model

Description

Maximum likelihood estimation for fitting the extreme value mixture model with P-splines density estimate for bulk distribution upto the threshold and conditional GPD above threshold. With options for profile likelihood estimation for threshold and fixed threshold approach.

Usage

fpsdengpd(x, phiu = TRUE, useq = NULL, fixedu = FALSE,
  pvector = NULL, lambdaseq = NULL, breaks = NULL, xrange = NULL,
  nseg = 10, degree = 3, design.knots = NULL, ord = 2,
  std.err = TRUE, method = "BFGS", control = list(maxit = 10000),
  finitelik = TRUE, ...)
lpsdengpd(x, psdenx, u = NULL, sigmau = NULL, xi = 0, phiu = TRUE,
  bsplinefit = NULL, phib = NULL, log = TRUE)
nlpsdengpd(pvector, x, psdenx, phiu = TRUE, bsplinefit, phib = NULL,
  finitelik = FALSE)
proflupsdengpd(u, pvector, x, psdenx, phiu = TRUE, bsplinefit,
  method = "BFGS", control = list(maxit = 10000), finitelik = TRUE,
  ...)
nlupsdengpd(pvector, u, x, psdenx, phiu = TRUE,
  bsplinefit = bsplinefit, phib = NULL, finitelik = FALSE)

Arguments

vector of sample data

phiu

probability of being above threshold \((0, 1)\) or logical, see Details in help for fnormgpd

useq

vector of thresholds (or scalar) to be considered in profile likelihood or NULL for no profile likelihood

fixedu

logical, should threshold be fixed (at either scalar value in useq, or estimated from maximum of profile likelihood evaluated at sequence of thresholds in useq)

pvector

vector of initial values of parameters or NULL for default values, see below

lambdaseq

vector of \(\lambda\)'s (or scalar) to be considered in profile likelihood. Required.

breaks

histogram breaks (as in hist function)

xrange

vector of minimum and maximum of B-spline (support of density)

nseg

number of segments between knots

degree

degree of B-splines (0 is constant, 1 is linear, etc.)

design.knots

spline knots for splineDesign function

ord

order of difference used in the penalty term

std.err

logical, should standard errors be calculated

method

optimisation method (see optim)

control

optimisation control list (see optim)

finitelik

logical, should log-likelihood return finite value for invalid parameters

...

optional inputs passed to optim

psdenx

P-splines based density estimate for each datapoint in x

scalar threshold value

sigmau

scalar scale parameter (positive)

scalar shape parameter

bsplinefit

list output from P-splines density fitting fpsden function

phib

renormalisation constant for bulk model density \((1-\phi_u)/H(u)\), to make it integrate to 1-phiu

log

logical, if TRUE then log-likelihood rather than likelihood is output

Value

Log-likelihood is given by lpsdengpd and it's wrappers for negative log-likelihood from nlpsdengpd and nlupsdengpd. Profile likelihood for single threshold given by proflupsdengpd. Fitting function fpsdengpd returns a simple list with the following elements

`call`:	`optim` call
`x`:	data vector `x`
`init`:	`pvector`
`fixedu`:	fixed threshold, logical
`useq`:	threshold vector for profile likelihood or scalar for fixed threshold
`nllhuseq`:	profile negative log-likelihood at each threshold in useq
`bsplinefit`:	complete `fpsden` output
`psdenx`:	P-splines based density estimate for each datapoint in `x`
`xrange`:	range of support of B-splines
`degree`:	degree of B-splines
`nseg`:	number of internal segments
`design.knots`:	knots used in `splineDesign`
`nbinwidth`:	scaling factor to convert counts to density
`optim`:	complete `optim` output
`conv`:	indicator for "possible" convergence
`mle`:	vector of MLE of (GPD and threshold, if relevant) parameters
`cov`:	variance-covariance matrix of MLE of parameters
`se`:	vector of standard errors of MLE of parameters
`rate`:	`phiu` to be consistent with `evd`
`nllh`:	minimum negative log-likelihood
`n`:	total sample size
`beta`:	vector of MLE of B-spline coefficients
`lambda`:	Estimated or fixed \(\lambda\)
`u`:	threshold (fixed or MLE)
`sigmau`:	MLE of GPD scale
`xi`:	MLE of GPD shape
`phiu`:	MLE of tail fraction (bulk model or parameterised approach)
`se.phiu`:	standard error of MLE of tail fraction

Acknowledgments

See Acknowledgments in fnormgpd, type help fnormgpd.

The Poisson regression and leave-one-out cross-validation functions are based on the code of Eilers and Marx (1996) available from Brian Marx's website http://statweb.lsu.edu/faculty/marx/, which is gratefully acknowledged.

Details

The extreme value mixture model with P-splines density estimate for bulk and GPD tail is fitted to the entire dataset. A two-stage maximum likelihood inference approach is taken. The first stage consists fitting of the P-spline density estimator, which is acheived by MLE using the fpsden function. The second stage, conditions on the B-spline coefficients, using MLE for the extreme value mixture model (GPD parameters and threshold, if requested). The estimated parameters, variance-covariance matrix and their standard errors are automatically output.

See help for fnormgpd for details of extreme value mixture models, type help fnormgpd. Only the different features are outlined below for brevity.

As the second stage conditions on the Bs-pline coefficients, the full parameter vector is (u, sigmau, xi) if threshold is also estimated and (sigmau, xi) for profile likelihood or fixed threshold approach.

(Penalized) MLE estimation of the B-Spline coefficients is carried out using Poisson regression based on histogram bin counts. See help for fpsden for details, type help fpsden.

References

http://www.math.canterbury.ac.nz/~c.scarrott/evmix

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

http://en.wikipedia.org/wiki/B-spline

http://statweb.lsu.edu/faculty/marx/

Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11(2), 89-121.

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Examples

Run this code

# NOT RUN {
set.seed(1)
par(mfrow = c(1, 1))

x = rnorm(1000)
xx = seq(-4, 4, 0.01)
y = dnorm(xx)

# Plenty of histogram bins (100)
breaks = seq(-4, 4, length.out=101)

# P-spline fitting with cubic B-splines, 2nd order penalty and 10 internal segments
# CV search for penalty coefficient. 
fit = fpsdengpd(x, useq = seq(0, 3, 0.1), fixedu = TRUE,
             lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
             xrange = c(-4, 4), nseg = 10, degree = 3, ord = 2)
             
hist(x, freq = FALSE, breaks = breaks, xlim = c(-6, 6))
lines(xx, y, col = "black") # true density

# P-splines+GPD
with(fit, lines(xx, dpsdengpd(xx, beta, nbinwidth, 
                              u = u, sigmau = sigmau, xi = xi, design = design.knots),
                lwd = 2, col = "red"))
abline(v = fit$u, col = "red", lwd = 2, lty = 3)

# P-splines density estimate
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots),
                lwd = 2, col = "blue", lty = 2))

# vertical lines for all knots
with(fit, abline(v = design.knots, col = "red"))

# internal knots
with(fit, abline(v = design.knots[(degree + 2):(length(design.knots) - degree - 1)], col = "blue"))
  
# boundary knots (support of B-splines)
with(fit, abline(v = design.knots[c(degree + 1, length(design.knots) - degree)], col = "green"))

legend("topright", c("True Density","P-spline density","P-spline+GPD"),
  col=c("black", "blue", "red"), lty = c(1, 2, 1))
legend("topleft", c("Internal Knots", "Boundaries", "Extra Knots", "Threshold"),
  col=c("blue", "green", "red", "red"), lty = c(1, 1, 1, 2))
# }
# NOT RUN {
  
# }

Run the code above in your browser using DataLab