psdengpd: P-Splines Density Estimate and GPD Tail Extreme Value Mixture Model

Description

Density, cumulative distribution function, quantile function and random number generation for the extreme value mixture model with a P-splines density estimate for the bulk distribution up to the threshold and a conditional GPD above the threshold. The parameters are the B-spline coefficients beta (and associated features), threshold u, GPD scale sigmau and shape xi, and tail fraction phiu.

Usage

dpsdengpd(x, beta = NULL, nbinwidth = NULL, xrange = NULL,
  nseg = 10, degree = 3, u = NULL, sigmau = NULL, xi = 0,
  phiu = TRUE, design.knots = NULL, log = FALSE)
ppsdengpd(q, beta = NULL, nbinwidth = NULL, xrange = NULL,
  nseg = 10, degree = 3, u = NULL, sigmau = NULL, xi = 0,
  phiu = TRUE, design.knots = NULL, lower.tail = TRUE)
qpsdengpd(p, beta = NULL, nbinwidth = NULL, xrange = NULL,
  nseg = 10, degree = 3, u = NULL, sigmau = NULL, xi = 0,
  phiu = TRUE, design.knots = NULL, lower.tail = TRUE)
rpsdengpd(n = 1, beta = NULL, nbinwidth = NULL, xrange = NULL,
  nseg = 10, degree = 3, u = NULL, sigmau = NULL, xi = 0,
  phiu = TRUE, design.knots = NULL)

Arguments

x

quantiles

beta

vector of B-spline coefficients (required)

nbinwidth

scaling to convert count frequency into proper density

xrange

vector of minimum and maximum of B-spline (support of density)

nseg

number of segments between knots

degree

degree of B-splines (0 is constant, 1 is linear, etc.)

u

threshold

sigmau

scale parameter (positive)

xi

shape parameter

phiu

probability of being above the threshold, in $[0, 1]$, or TRUE to use the tail fraction of the bulk model

design.knots

spline knots for splineDesign function

log

logical, if TRUE then log density

q

quantiles

lower.tail

logical, if FALSE then upper tail probabilities

p

cumulative probabilities

n

sample size (positive integer)
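The roles of xrange, nseg, degree and design.knots can be illustrated with a small base R sketch. It assumes the equally spaced knot layout of Eilers and Marx (1996), with the knots extended by `degree` segments beyond each end of xrange; whether the package builds design.knots in exactly this way is an assumption, and the variable names below are illustrative only.

```r
library(splines)  # distributed with base R; provides splineDesign

# Illustrative settings matching the defaults in the usage above
xrange <- c(-4, 4)
nseg   <- 10
degree <- 3

# Assumed knot layout: equally spaced, extended `degree` segments past each end
dx    <- diff(xrange) / nseg
knots <- seq(xrange[1] - degree * dx, xrange[2] + degree * dx,
             length.out = nseg + 2 * degree + 1)

# Evaluate the B-spline basis on a grid strictly inside the support
xgrid <- seq(xrange[1] + 0.01, xrange[2] - 0.01, length.out = 201)
B     <- splineDesign(knots, xgrid, ord = degree + 1)

ncol(B)            # number of basis functions, nseg + degree
range(rowSums(B))  # the basis functions sum to one inside the support
```

Under this layout the coefficient vector beta would need one entry per column of `B`, that is nseg + degree entries.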

Value

dpsdengpd gives the density, ppsdengpd gives the cumulative distribution function, qpsdengpd gives the quantile function and rpsdengpd gives a random sample.

Details

Extreme value mixture model combining P-splines density estimate for the bulk below the threshold and GPD for upper tail.

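The user can pre-specify phiu, permitting a parameterised value for the tail fraction $\phi_u$. Alternatively, when phiu=TRUE the tail fraction is estimated as the tail fraction from the P-splines density estimate used as the bulk model, i.e. the probability that the bulk model places above the threshold.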

The cumulative distribution function with tail fraction $\phi_u$ defined by the upper tail fraction of the P-splines density estimate (phiu=TRUE) is, up to the threshold $x \le u$, given by: $$F(x) = H(x)$$ and above the threshold $x > u$: $$F(x) = H(u) + [1 - H(u)] G(x)$$ where $H(x)$ and $G(x)$ are the P-splines density estimate and conditional GPD cumulative distribution functions respectively.

The cumulative distribution function for pre-specified $\phi_u$, up to the threshold $x \le u$, is given by: $$F(x) = (1 - \phi_u) H(x)/H(u)$$ and above the threshold $x > u$: $$F(x) = (1 - \phi_u) + \phi_u G(x)$$ so that $F$ is continuous at the threshold, with $F(u) = 1 - \phi_u$. Notice that these definitions are equivalent when $\phi_u = 1 - H(u)$.
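The two parameterisations can be compared numerically with a minimal base R sketch. It does not call the package: `pgpd_sketch` and `pmix_sketch` are hypothetical helpers, and `pnorm` stands in for the P-splines bulk CDF $H(x)$ purely for illustration. The branch above the threshold is written as $(1 - \phi_u) + \phi_u G(x)$, which keeps $F$ continuous at $u$ and reduces to $H(u) + [1 - H(u)] G(x)$ when $\phi_u = 1 - H(u)$.

```r
# Conditional GPD CDF above the threshold u (hypothetical helper, base R only)
pgpd_sketch <- function(x, u, sigmau, xi) {
  z <- pmax(x - u, 0) / sigmau
  if (abs(xi) < 1e-8) 1 - exp(-z) else 1 - pmax(1 + xi * z, 0)^(-1 / xi)
}

# Mixture CDF; H is any bulk CDF (pnorm is a stand-in for the P-splines estimate)
pmix_sketch <- function(x, u, sigmau, xi, phiu = TRUE, H = pnorm) {
  if (isTRUE(phiu)) phiu <- 1 - H(u)   # tail fraction taken from the bulk model
  ifelse(x <= u,
         (1 - phiu) * H(x) / H(u),     # rescaled bulk CDF up to the threshold
         (1 - phiu) + phiu * pgpd_sketch(x, u, sigmau, xi))
}

x <- seq(-3, 6, by = 0.5)
F_bulk  <- pmix_sketch(x, u = 1.5, sigmau = 0.5, xi = 0.1)  # phiu = TRUE
F_equal <- pmix_sketch(x, u = 1.5, sigmau = 0.5, xi = 0.1, phiu = 1 - pnorm(1.5))
F_fixed <- pmix_sketch(x, u = 1.5, sigmau = 0.5, xi = 0.1, phiu = 0.1)

all.equal(F_bulk, F_equal)  # compares the two definitions when phiu = 1 - H(u)
```

With a pre-specified value such as `phiu = 0.1`, the bulk part is rescaled so that exactly 90% of the probability lies at or below the threshold, whatever the bulk model itself implies.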

See gpd for details of GPD upper tail component. The specification of the underlying B-splines and the P-splines density estimator are discussed in the psden function help.
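For reference, the conditional GPD cumulative distribution function in its standard form, written with the threshold $u$, scale $\sigma_u > 0$ and shape $\xi$ used here, is for $x > u$: $$G(x) = 1 - \left[1 + \xi \frac{x - u}{\sigma_u}\right]^{-1/\xi} \quad (\xi \ne 0), \qquad G(x) = 1 - \exp\left(-\frac{x - u}{\sigma_u}\right) \quad (\xi = 0)$$ defined where $1 + \xi (x - u)/\sigma_u > 0$. This is the textbook parameterisation; the gpd help remains the authoritative description of how the package implements it.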

References

http://en.wikipedia.org/wiki/B-spline

http://statweb.lsu.edu/faculty/marx/

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11(2), 89-121.

Examples


# NOT RUN {
set.seed(1)
par(mfrow = c(1, 1))

x = rnorm(1000)
xx = seq(-6, 6, 0.01)
y = dnorm(xx)

# Plenty of histogram bins (100)
breaks = seq(-4, 4, length.out=101)

# P-spline fitting with cubic B-splines, 2nd order penalty and 10 segments (nseg = 10)
# CV search for penalty coefficient.
fit = fpsdengpd(x, lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
             xrange = c(-4, 4), nseg = 10, degree = 3, ord = 2)
hist(x, freq = FALSE, breaks = breaks, xlim = c(-6, 6))

# True density (the black line referred to in the legend)
lines(xx, y)

# P-splines only
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design.knots = design.knots),
   lwd = 2, col = "blue"))

# P-splines+GPD
with(fit, lines(xx, dpsdengpd(xx, beta, nbinwidth, design.knots = design.knots,
   u = u, sigmau = sigmau, xi = xi, phiu = phiu), lwd = 2, col = "red"))
abline(v = fit$u, col = "red")

legend("topleft", c("True Density","P-spline density", "P-spline+GPD"),
 col=c("black", "blue", "red"), lty = 1)
# }