gamGPDfitboot: Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation

Description

gamGPDfit() fits the parameters of a generalized Pareto distribution (GPD) depending on covariates in a non- or semiparametric way. gamGPDboot() fits and bootstraps the parameters of a GPD distribution depending on covariates in a non- or semiparametric way. Applies the post-blackend bootstrap of Chavez-Demoulin and Davison (2005).

Usage

gamGPDfit(x, threshold, nexceed=NULL, datvar, xiFrhs, nuFrhs,
    init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1],
    niter=128, epsxi=1e-05, epsnu=1e-05,
    progress=TRUE, verbose=FALSE, ...)
  gamGPDboot(x, B, threshold, nexceed=NULL, datvar, xiFrhs, nuFrhs,
             init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1],
             niter=128, epsxi=1e-5, epsnu=1e-5,
             boot.progress=TRUE, progress=FALSE, verbose=FALSE, debug=FALSE, ...)

Arguments

data.frame containing the losses (in some component; can be specified with the argument datvar; the other components contain the covariates).

number of bootstrap replications.

threshold

threshold of the peaks-over-threshold (POT) method. Above this value, loss (exceedances) are considered for the model.

nexceed

number of exceedances. This can be used to determine

datvar

name of the data column in x which contains the the data to be modeled.

xiFrhs

right-hand side of the formula for $\xi$ in the gam() call for fitting $\xi$.

nuFrhs

right-hand side of the formula for $\nu$ in the gam() call for fitting $\nu$.

init

bivariate vector containing initial values for $(\xi, \beta)$.

niter

maximal number of iterations in the backfitting algorithm.

epsxi

epsilon for stop criterion for $\xi$.

epsnu

epsilon for stop criterion for $\nu$.

boot.progress

logical indicating whether progress information about gamGPDboot() is displayed.

progress

logical indicating whether progress information about gamGPDfit() is displayed. For gamGPDboot(), progress is only passed to gamGPDfit() if

verbose

logical indicating whether additional information (in case of undesired behavior) is printed. For gamGPDboot(), progress is only passed to gamGPDfit() if

debug

logical indicating whether initial fit (before the bootstrap is initiated) is saved.

...

additional arguments passed to gam() (which is called internally; see the source code of gamGPDfitUp()).

Value

gamGPDfit() returns a list with the components [object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object] gamGPDboot() returns a list with components names as for gamGPDfit(). However, all components now contain the results of the initial fit via gamGPDfit() in their first components and the results for each replication of the post-blackend bootstrap in the remaining components. So, for example, xi is now a matrix whose first column contains the fitted $\xi$, and the remaining columns contain the bootstrapped ones. Note that in contrast to gamGPDfit(), return values are now vectors, vectors are now matrices, and so on. Since the covariates remain the same, covar is just the same as for gamGPDfit().

Details

The function gamGPDfit() fits the parameters $\xi$ and $\beta$ of the generalized Pareto distribution $\mathrm{GPD}(\xi,\beta)$ depending on covariates in a non- or semiparametric way. The distribution function is given by $$G_{\xi,\beta}(x)=1-(1+\xi x/\beta)^{-1/\xi},\quad x\ge0,$$ for $\xi>0$ (which is what we assume) and $\beta>0$. Note that $\beta$ is also denoted by $\sigma$ in this package. Estimation of $\xi$ and $\beta$ by gamGPDfit() is done via penalized maximum likelihood estimation, where the estimators are computed with a backfitting algorithm. In order to guarantee convergence of this algorithm, a reparameterization of $\beta$ in terms of the parameter $\nu$ is done via $$\beta=\exp(\nu)/(1+\xi).$$ The parameters $\xi$ and $\nu$ (and thus $\beta$) are allowed to depend on covariates (including time) in a non- or semiparametric way, for example: $$\xi=\xi(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\xi}+h_{\xi}(t),$$ $$\nu=\nu(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\nu}+h_{\nu}(t),$$ where $\bm{x}$ denotes the vector of covariates, $\bm{\alpha}_{\xi}$, $\bm{\alpha}_{\nu}$ are parameter vectors and $h_{\xi}$, $h_{\nu}$ are regression splines. For more details, see the references and the source code. The function gamGPDboot() first fits the GPD parameters via gamGPDfit(). It then conducts the post-blackend bootstrap of Chavez-Demoulin and Davison (2005). To this end, it computes the residuals, resamples them (B times), reconstructs the corresponding exceedances, and refits the GPD parameters via gamGPDfit() again.

References

Chavez-Demoulin, V., and Davison, A. C. (2005), Generalized additive models for sample extremes, Applied Statistics 54(1), 207--222. Chavez-Demoulin, V., and Hofert, M. (to be submitted), Smooth extremal models fitted by penalized maximum likelihood estimation.

Examples

Run this code

### Example 1: fitting capability ##############################################

## generate an example data set
years <- 2003:2012 # years
nyears <- length(years)
n <- 250 # sample size for each (different) xi
u <- 200 # threshold
rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD

set.seed(17) # setting seed
xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
xi.true.B <- xi.true.A^2 # true xi for group "B"
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
covar <- rep(c("A","B"), each=n*nyears)
value <- c(lossA, lossB)
x <- data.frame(covar=covar, time=time, value=value)

## fit
eps <- 1e-3 # to decrease the run time for this example
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
                 nuFrhs=~covar+s(time)-1, epsxi=eps, epsnu=eps)
## note: choosing s(..., bs="cr") will fit cubic splines

## grab the fitted values per group and year
xi.fit <- fitted(fit$xiObj)
xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year

## plot the fitted values of xi and the true ones we simulated from
par(mfrow=c(1,2))
plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
     main="Group A", xlab="Year", ylab=expression(xi))
points(years, xi.fit.A, type="l", col="red")
legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
       legend=c("true", "fitted"), bty="n")
plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
     main="Group B", xlab="Year", ylab=expression(xi))
points(years, xi.fit.B, type="l", col="blue")
legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
       legend=c("true", "fitted"), bty="n")

### Example 2: Comparison of (the more general) gamGPDfit() with gpd.fit() ########

set.seed(17) # setting seed
xi.true.A <- rep(0.4, length=nyears)
xi.true.B <- rep(0.8, length=nyears)
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
x <- data.frame(covar=covar, time=time, value=c(lossA, lossB))

## fit with gpd.fit
fit.coles <- gpd.fit(x$value, threshold=u, shl=1, sigl=1, ydat=x)
xi.fit.coles.A <- fit.coles$mle[3]+1*fit.coles$mle[4]
xi.fit.coles.B <- fit.coles$mle[3]+2*fit.coles$mle[4]

## fit with gamGPDfit()
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar, nuFrhs=~covar,
                 epsxi=eps, epsnu=eps)
xi.fit <- fitted(fit$xiObj)
xi.fit.A <- as.numeric(xi.fit[1]) # fit for group "A"
xi.fit.B <- as.numeric(xi.fit[nyears*n+1]) # fit for group "B"

## comparison
xi.fit.A-xi.fit.coles.A
xi.fit.B-xi.fit.coles.B # dontrun

Run the code above in your browser using DataLab