Maximum likelihood estimation for fitting the extreme value mixture model with P-splines density estimate for bulk distribution upto the threshold and conditional GPD above threshold. With options for profile likelihood estimation for threshold and fixed threshold approach.
fpsdengpd(x, phiu = TRUE, useq = NULL, fixedu = FALSE,
pvector = NULL, lambdaseq = NULL, breaks = NULL, xrange = NULL,
nseg = 10, degree = 3, design.knots = NULL, ord = 2,
std.err = TRUE, method = "BFGS", control = list(maxit = 10000),
finitelik = TRUE, ...)lpsdengpd(x, psdenx, u = NULL, sigmau = NULL, xi = 0, phiu = TRUE,
bsplinefit = NULL, phib = NULL, log = TRUE)
nlpsdengpd(pvector, x, psdenx, phiu = TRUE, bsplinefit, phib = NULL,
finitelik = FALSE)
proflupsdengpd(u, pvector, x, psdenx, phiu = TRUE, bsplinefit,
method = "BFGS", control = list(maxit = 10000), finitelik = TRUE,
...)
nlupsdengpd(pvector, u, x, psdenx, phiu = TRUE,
bsplinefit = bsplinefit, phib = NULL, finitelik = FALSE)
vector of sample data
probability of being above threshold fnormgpd
vector of thresholds (or scalar) to be considered in profile likelihood or
NULL
for no profile likelihood
logical, should threshold be fixed (at either scalar value in useq
,
or estimated from maximum of profile likelihood evaluated at
sequence of thresholds in useq
)
vector of initial values of parameters or NULL
for default
values, see below
vector of
histogram breaks (as in hist
function)
vector of minimum and maximum of B-spline (support of density)
number of segments between knots
degree of B-splines (0 is constant, 1 is linear, etc.)
spline knots for splineDesign function
order of difference used in the penalty term
logical, should standard errors be calculated
optimisation method (see optim
)
optimisation control list (see optim
)
logical, should log-likelihood return finite value for invalid parameters
optional inputs passed to optim
P-splines based density estimate for each datapoint in x
scalar threshold value
scalar scale parameter (positive)
scalar shape parameter
list output from P-splines density fitting fpsden
function
renormalisation constant for bulk model density 1-phiu
logical, if TRUE
then log-likelihood rather than likelihood is output
Log-likelihood is given by lpsdengpd
and it's
wrappers for negative log-likelihood from nlpsdengpd
and nlupsdengpd
. Profile likelihood for single
threshold given by proflupsdengpd
. Fitting function
fpsdengpd
returns a simple list with the
following elements
call : |
optim call |
x : |
data vector x |
init : |
pvector |
fixedu : |
fixed threshold, logical |
useq : |
threshold vector for profile likelihood or scalar for fixed threshold |
nllhuseq : |
profile negative log-likelihood at each threshold in useq |
bsplinefit : |
complete fpsden output |
psdenx : |
P-splines based density estimate for each datapoint in x |
xrange : |
range of support of B-splines |
degree : |
degree of B-splines |
nseg : |
number of internal segments |
design.knots : |
knots used in splineDesign |
nbinwidth : |
scaling factor to convert counts to density |
optim : |
complete optim output |
conv : |
indicator for "possible" convergence |
mle : |
vector of MLE of (GPD and threshold, if relevant) parameters |
cov : |
variance-covariance matrix of MLE of parameters |
se : |
vector of standard errors of MLE of parameters |
rate : |
phiu to be consistent with evd |
nllh : |
minimum negative log-likelihood |
n : |
total sample size |
beta : |
vector of MLE of B-spline coefficients |
lambda : |
Estimated or fixed |
u : |
threshold (fixed or MLE) |
sigmau : |
MLE of GPD scale |
xi : |
MLE of GPD shape |
phiu : |
MLE of tail fraction (bulk model or parameterised approach) |
se.phiu : |
standard error of MLE of tail fraction |
See Acknowledgments in
fnormgpd
, type help fnormgpd
.
The Poisson regression and leave-one-out cross-validation functions are based on the code of Eilers and Marx (1996) available from Brian Marx's website http://statweb.lsu.edu/faculty/marx/, which is gratefully acknowledged.
The extreme value mixture model with P-splines density estimate for bulk and GPD tail is
fitted to the entire dataset. A two-stage maximum likelihood inference approach is taken. The first
stage consists fitting of the P-spline density estimator, which is acheived by MLE using the
fpsden
function. The second stage, conditions on the B-spline coefficients,
using MLE for the extreme value mixture model (GPD parameters and threshold, if requested). The estimated
parameters, variance-covariance matrix and their standard errors are automatically
output.
See help for fnormgpd
for details of extreme value mixture models,
type help fnormgpd
. Only the different features are outlined below for brevity.
As the second stage conditions on the Bs-pline coefficients, the full parameter vector is
(u
, sigmau
, xi
) if threshold is also estimated and
(sigmau
, xi
) for profile likelihood or fixed threshold approach.
(Penalized) MLE estimation of the B-Spline coefficients is carried out using Poisson regression
based on histogram bin counts. See help for fpsden
for details,
type help fpsden
.
http://www.math.canterbury.ac.nz/~c.scarrott/evmix
http://en.wikipedia.org/wiki/Generalized_Pareto_distribution
http://en.wikipedia.org/wiki/Cross-validation_(statistics)
http://en.wikipedia.org/wiki/B-spline
http://statweb.lsu.edu/faculty/marx/
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11(2), 89-121.
Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf
fpsden
, fnormgpd
,
fgpd
and gpd
Other psden: fpsden
, psdengpd
,
psden
Other psdengpd: psdengpd
, psden
Other fpsdengpd: psdengpd
# NOT RUN {
set.seed(1)
par(mfrow = c(1, 1))
x = rnorm(1000)
xx = seq(-4, 4, 0.01)
y = dnorm(xx)
# Plenty of histogram bins (100)
breaks = seq(-4, 4, length.out=101)
# P-spline fitting with cubic B-splines, 2nd order penalty and 10 internal segments
# CV search for penalty coefficient.
fit = fpsdengpd(x, useq = seq(0, 3, 0.1), fixedu = TRUE,
lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
xrange = c(-4, 4), nseg = 10, degree = 3, ord = 2)
hist(x, freq = FALSE, breaks = breaks, xlim = c(-6, 6))
lines(xx, y, col = "black") # true density
# P-splines+GPD
with(fit, lines(xx, dpsdengpd(xx, beta, nbinwidth,
u = u, sigmau = sigmau, xi = xi, design = design.knots),
lwd = 2, col = "red"))
abline(v = fit$u, col = "red", lwd = 2, lty = 3)
# P-splines density estimate
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots),
lwd = 2, col = "blue", lty = 2))
# vertical lines for all knots
with(fit, abline(v = design.knots, col = "red"))
# internal knots
with(fit, abline(v = design.knots[(degree + 2):(length(design.knots) - degree - 1)], col = "blue"))
# boundary knots (support of B-splines)
with(fit, abline(v = design.knots[c(degree + 1, length(design.knots) - degree)], col = "green"))
legend("topright", c("True Density","P-spline density","P-spline+GPD"),
col=c("black", "blue", "red"), lty = c(1, 2, 1))
legend("topleft", c("Internal Knots", "Boundaries", "Extra Knots", "Threshold"),
col=c("blue", "green", "red", "red"), lty = c(1, 1, 1, 2))
# }
# NOT RUN {
# }
Run the code above in your browser using DataLab