
Renext (version 1.0-0)

fRenouv: Fit a renewal ("Renouvellement") model

Description

Fit a renewal ("Renouvellement") model to data

Usage

fRenouv(x.OT,
           sumw.BOT = 1.0,
           z.H = NULL,
           block.H = NULL,
           w.BH = NULL,
           x.U = NULL,
           w.U = NULL,
           distname.y = "exponential",
           fixed.par.y = NULL,
           start.par.y = NULL,
           force.start.H = FALSE,
           numDeriv = TRUE,
           threshold,
           trans.y = NULL,
           conf.pct = c(95, 70),
           prob = NULL,
           prob.max = 0.9995,
           pred.period = NULL,
           suspend.warnings = TRUE,
           control = list(maxit = 300, fnscale = -1),
           control.H = list(maxit = 300, fnscale = -1),
           trace = 0,
           plot = TRUE,
           main = "",
           ylim = NULL,
           ...)

Arguments

x.OT
Vector giving the observations for the "Over the Threshold" part of the data. These are the true observations, and not the exceedances y = x - threshold.
sumw.BOT
Effective duration (total duration of the observation periods) for the OT data.
z.H
Vector of MAX data or historical data. These should be defined in accordance with the block.H argument: the values in z.H must represent the maxima observed during each block. Each H-block must contain at least one observation.
block.H
Factor (or integer vector) of length length(z.H) giving the blocks in correspondence with the elements of z.H. Each element of block.H will usually describe wide periods ("middle age", "XVIth century", etc.).
w.BH
Vector of time lengths (durations) for the blocks in block.H. Must be in the same unit as sumw.BOT.
x.U
A vector of block Upper or Unobserved levels. Each value corresponds to a time period with known duration given in w.U. This argument provides the information that the level in x.U was never exceeded during the corresponding period.
w.U
A vector of block durations for Upper or Unobserved levels. Must be of the same length as x.U.
distname.y
Distribution name to fit to the threshold exceedances y.
fixed.par.y
Named list of known (or fixed) parameter values for the y-distribution. This is still EXPERIMENTAL.
start.par.y
Named list of parameter initial values for the y-distribution. Only used when the distribution does not belong to the list of special distributions.
force.start.H
Logical. When TRUE the values in start.par.y (which must then be correct) will also be used as starting values in the maximisation of the global likelihood: OT data and historical data. This is useful e.g. when the first-stage estimates (obtained without historical data) fall outside the support of the distribution and cannot be used as starting values.
numDeriv
Logical: should the hessian be computed using the numDeriv package (value TRUE) or should it be taken from the results of optim?
threshold
The threshold above which exceedances will be fitted. Exceedances are given (up to missing values problems) by x.OT - threshold.
trans.y
Transformation of the levels before thresholding (if not NULL). This is only possible when distname.y is "exponential". The two allowed choices are "square" and "log".
conf.pct
Numeric vector specifying the percentages for the (two-sided) confidence limits on quantiles.
prob
Vector of probabilities.
prob.max
Max value of probability for return level and confidence limit evaluations. This argument 'shortens' the default prob vector: values > prob.max in the default prob vector are omitted. Ignored when a prob vector is given.
pred.period
A vector of "pretty" periods at which return level and probability will be evaluated and returned in the pred data.frame.
suspend.warnings
If TRUE, the warnings will be suspended during optimisation steps. This is useful when the parameters are subject to constraints as is usually the case.
control
A named list used in optim for the no-history stage (if any). Note that fnscale = -1 says that maximisation is required (not minimisation) and must not be changed!
control.H
A named list used in optim for the historical stage (if any).
trace
Level of verbosity. Value 0 prints nothing.
plot
Draw a return level plot?
main
Main title for the return level plot (defaults to empty title).
ylim
Limits for the y axis (defaults to values computed from the data).
...
Further arguments to be passed to RLplot. These may be removed in future versions.
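The relation between x.OT, threshold and the exceedances (see the x.OT and threshold entries above) can be sketched in base R with hypothetical numbers; fRenouv is not called here:

```r
## Hypothetical "Over the Threshold" observations and a threshold.
## Exceedances are the amounts by which observations above the
## threshold exceed it: y = x - threshold.
x.OT <- c(45, 52, 61, 38, 70)
threshold <- 50
y <- x.OT[x.OT > threshold] - threshold
y   ## 2 11 20
```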

Value

  • A list containing the results.
  • est.N: Estimate(s) for the count "N" part. This estimate does not use the historical data, even if available.
  • est.y: Estimate(s) for the exceedance "y" part. This estimate does not use the historical data, even if available.
  • cov.N, cov.y: (Co-)variances for the estimates above.
  • estimate: Estimate(s) for the whole set of parameters based on OT data and on historical data if available.
  • ks.test: Kolmogorov-Smirnov goodness-of-fit test.
  • ret.lev: A data.frame containing return levels and confidence limits. The corresponding probabilities are either provided by the user or taken as default values.
  • pred: A data.frame similar to ret.lev, but with "pretty" return periods. These are the values of pred.period if provided, or else are chosen as "round" multiples of the time unit (taken from sumw.BOT), covering return periods from 1/10 to 10 time units.
  • Other results are available; use names(result) to see their list. Except in the special case where distname.y is "exponential" and no historical data are used, the inference on quantiles is obtained with the delta method and numerical derivatives. Confidence limits are unreliable for return periods much greater than the total observation and historical duration. Because parameters are estimated, the Kolmogorov-Smirnov test is unreliable when fewer than 30 observations are available.

Warning

With some distributions, or in the presence of historical data, the estimation can fail because of problems during the optimization, or because the determination of the (numerical) hessian is impossible. This can happen if one or more parameters are too small to compute a finite-difference approximation of the gradient. For instance the 'rate' parameter of the exponential distribution (= inverse mean) will be small when the mean of the exceedances is large.

A possible solution is then to rescale the data, e.g. dividing them by 10 or 100. As a rule of thumb, an acceptable scaling leads to data (exceedances) ranging from a few units to a few hundreds; an order of magnitude of thousands or more should be avoided and reduced by scaling. Rescaling is especially recommended for the square exponential distribution (obtained with trans.y = "square") since the observations are squared.
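The effect of rescaling on the 'rate' parameter can be illustrated with hypothetical exceedance data (fRenouv itself is not called here): a large exceedance mean gives a tiny rate, and dividing by 100 brings both data and rate to a comfortable order of magnitude.

```r
## Hypothetical exceedances with a large mean: for the exponential
## distribution the ML 'rate' estimate is 1/mean, hence very small here
y <- c(1200, 3400, 800, 2600, 5000)
rate <- 1 / mean(y)                  ## about 3.8e-04

## Rescale so that exceedances range from a few units to a few tens
y.scaled <- y / 100
rate.scaled <- 1 / mean(y.scaled)    ## about 3.8e-02
```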

Another possible way to solve the problem is to change the numDeriv value.

This problem will be solved in a future version.

Details

Some distributions are handled in a special manner: "exponential", "weibull" and "gpd" (the latter possible only through the use of the evd package). For these distributions, which will be called "special" distributions, it is not necessary to give starting values, and the parameter names are unambiguous. Other distributions are possible, such as "gamma". Since their probability functions are then used in a "black-box" fashion, these distributions must respect the following formal requirements:
  1. The names of the density, distribution and quantile functions must obey the classical "prefixing convention": the prefixes must respectively be "d", "p" and "q". This rule applies to the distributions of the stats package and to those of many other packages such as evd.
  2. The first (main) argument must be vectorisable in all three functions, i.e. a vector of x, q or p must be accepted by the density, distribution and quantile functions.
  3. The density must have a log formal argument. When log is TRUE, the log-density is returned instead of the density.
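The three requirements can be checked on the gamma family from the stats package; the checks below are purely illustrative and not part of fRenouv:

```r
## 1. "prefixing convention": dgamma, pgamma and qgamma all exist
stopifnot(exists("dgamma"), exists("pgamma"), exists("qgamma"))

## 2. the first (main) argument is vectorisable
p <- pgamma(c(0.5, 1, 2), shape = 2, rate = 1)
stopifnot(length(p) == 3L)

## 3. the density has a 'log' formal argument giving the log-density
stopifnot("log" %in% names(formals(dgamma)))
stopifnot(isTRUE(all.equal(dgamma(1, shape = 2, log = TRUE),
                           log(dgamma(1, shape = 2)))))
```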

For such a distribution it is necessary to give the argument names in start.par.y. The argument list must have exactly the required number of parameters for the family (e.g. 2 for gamma). In a future version, some parameters will be allowed to be fixed (known), and the parameter set will be the union of those in start.par.y and those in fixed.par.y. In any case, at least one parameter must be unknown for the y part of the model. Mathematical requirements must also hold for a correct use of ML; they are referred to as "regularity conditions" in ML theory. Note that the support of the distribution must be the set of non-negative real numbers. The estimation procedure differs according to the existence of historical (MAX) data.

  1. When no historical data are given, the whole set of parameters splits into orthogonal subsets: a "point process" part concerning the process of events, and an "observation" part concerning the threshold exceedances. The parameters can in this case be estimated separately. The rate of the Poisson process is estimated by the empirical rate, i.e. the number of events divided by the total duration given in sumw.BOT. The "Over the Threshold" parameters are estimated from the exceedances computed as x.OT minus the threshold.
  2. When historical data are given, the two parameter vectors must be estimated jointly by maximising the global likelihood. In this case, the estimation first ignores the historical data, and the resulting estimates are then used as starting values for the maximisation of the global likelihood. In some circumstances, the first-stage estimates cannot be used with historical data because some of the historical values fall outside the support of the fitted distribution. This can happen e.g. with distname.y = "gpd" when historical data exceed threshold - scale/shape for the values of shape and scale computed in the first stage.
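The first (no-history) stage described above can be sketched in base R with hypothetical numbers; the Poisson rate is the event count divided by sumw.BOT, and the y-parameters are fitted to x.OT minus the threshold:

```r
## Hypothetical OT data observed over an effective duration of 25 time units
x.OT <- c(55, 62, 51, 80, 58, 66, 73)
sumw.BOT <- 25
threshold <- 50

## Empirical rate of the Poisson process: number of events per time unit
lambda.hat <- length(x.OT) / sumw.BOT    ## 0.28

## Exceedances used to fit the y-distribution; for "exponential"
## the ML estimate of the rate is 1/mean of the exceedances
y <- x.OT - threshold
rate.hat <- 1 / mean(y)
```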

References

Miquel J. (1984) Guide pratique d'estimation des probabilités de crues, Eyrolles (coll. EDF DER).

See Also

rRenouv to simulate "renouvellement" data, RLplot for the return level plot. See optim for the tuning of optimisation.

Examples

## artificial data
test2 <- rRenouv(nb = 100,
                 threshold = 40,
                 par.N = list(lambda = 2),
                 densfun.y = "weibull",
                 par.y = mom2par("weibull", mean = 30, sd = 36))  

## exponential exceedances/no historical data/wrong threshold
fit.expon <- fRenouv(x.OT = test2$x, 
                     distname.y = "exponential",
                     threshold = 50,
                     conf.pct = c(70, 95, 99),
                     main = "\"exponential\" distr. (special)")

## The same but treating the exponential distribution as an
## arbitrary one (not in the list of special distributions)
fit.exp <- fRenouv(x.OT = test2$x, 
                  distname.y = "exp",
                  start.par.y = list(rate = 1),
                  threshold = 50,
                  main = "\"exp\" distr. (non-special)")

## The same but with another non-special distribution
## (thus requiring initial values in start.par.y)
fit.gamma <- fRenouv(x.OT = test2$x,
                     distname.y = "gamma",
                     start.par.y = list(shape = 1, rate = 1),
                     threshold = 50,
                     main = "\"gamma\" distr. (non-special)")


## Weibull exceedances/no historical data/true threshold
fit.weibull <- fRenouv(x.OT = test2$x,
                       distname.y = "weibull",
                       threshold = 50,
                       main = "\"weibull\" distr. (special). NO history")

## Add historical data: 400 "years"
fit.weibull2  <-
   fRenouv(x.OT = test2$x,
           z.H = 300, block.H = 1, w.BH = 400,  ## HISTORY
           distname.y = "weibull",
           threshold = 40,
           main = "\"weibull\" distr. (special). History: 1 block of 400 years")
## comparison...
comp <- cbind(c(fit.weibull$est.N, fit.weibull$est.y), fit.weibull2$estimate)
colnames(comp) <- c("No hist.", "With hist.")
print(comp)

## gpd exceedances, like POT, but with history
require(evd)
fit.gpd <-
    fRenouv(x.OT = test2$x, 
            z.H = 300, block.H = 1, w.BH = 400,     ## HISTORY
            distname.y = "gpd",
            threshold = 40,
            main = "\"gpd\" distr. (special). History: 1 block of 400 years")

## Coles' book rain example. Blocks are guessed and might not be exact
## due to missing values

library(ismev)
data(rain)
  
block <- 1:length(rain)
block <- 1914 + floor(block/365)
  
fit.rain <- fRenouv(x.OT = rain, 
                  distname.y = "gpd",
                  threshold = 30,
                  main = "\"gpd\" distr. (target) on ismev 'rain' dataset")
  
fit.rain.evd <- fpot(x = rain, threshold = 30, model = "gpd")
