regtst: Test statistics for regional frequency analysis

Description

Computes discordancy, heterogeneity and goodness-of-fit measures for regional frequency analysis. These are the statistics $D_i$, $H$, and $Z^{\rm DIST}$ defined respectively in sections 3.2.3, 4.3.3, and 5.2.3 of Hosking and Wallis (1997).

Usage

regtst(regdata, nsim=1000)

Arguments

regdata

Object of class regdata containing the input data. It should be a data frame, each of whose rows contains data for one site. The first seven columns should contain respectively the site name,

nsim

Number of simulations to use in the calculation of the heterogeneity and goodness-of-fit measures. If less than 2, only the discordancy measure will be calculated.

Value

An object of class "regtst", which is a list with elements as follows.

data

The input data frame regdata.

nsim

Number of simulations, i.e. the argument nsim.

D

Vector containing the discordancy measures for each site.

Dcrit

Vector of length 2 containing critical values of the discordancy measure corresponding to significance levels of 10 and 5 per cent, except that the values never exceed 3 and 4 respectively. See Hosking and Wallis (1997), section 3.2.4.

rmom

Vector of length 5 containing the regional weighted average $L$-moment ratios (weights proportional to record lengths).

rpara

Vector of length 4 containing the parameters of a kappa distribution fitted to the regional weighted average $L$-moment ratios.

vobs

Vector of length 3 containing the observed values of the three measures of between-site dispersion of $L$-moment ratios.

vbar

Vector of length 3 containing the mean of the simulated values of the three dispersion measures.

vsd

Vector of length 3 containing the standard deviation of the simulated values of the three dispersion measures.

H

Vector of length 3 containing the three measures of regional heterogeneity.

para

List of length 6 containing the parameters of the five candidate distributions and the Wakeby distribution (3-letter abbreviation "wak") fitted to the regional weighted average $L$-moment ratios.

t4fit

Vector of length 5 containing the $L$-kurtosis of the five candidate distributions fitted to the regional weighted average $L$-moment ratios.

Z

Vector of length 5 containing the goodness-of-fit measures for each of the five candidate distributions.
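The elements vobs, vbar and vsd are tied to H by the standardization in equation (4.5) of Hosking and Wallis (1997): each heterogeneity measure is the observed dispersion minus the simulated mean, divided by the simulated standard deviation. The base-R sketch below illustrates that relationship only; the list `res` and all of its numbers are hypothetical stand-ins for a "regtst" result, not output from real data.

```r
# Hypothetical stand-in for three elements of a "regtst" result.
# The numbers are invented for illustration.
res <- list(
  vobs = c(0.0250, 0.0600, 0.0800),  # observed dispersion measures
  vbar = c(0.0200, 0.0550, 0.0780),  # mean of simulated dispersion measures
  vsd  = c(0.0040, 0.0100, 0.0120)   # standard deviation of simulated measures
)

# Standardize each observed dispersion measure against the simulations.
H <- (res$vobs - res$vbar) / res$vsd
print(round(H, 3))
```

With a real result, the same arithmetic applied to its vobs, vbar and vsd elements should reproduce its H element.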

Synopsis

regtst(regdata, nsim=1000)
print.regtst(x, ...)

Details

The discordancy measure $D_i$ indicates, for site $i$, the discordancy between the site's $L$-moment ratios and the (unweighted) regional average $L$-moment ratios. Large values might be used as a flag to indicate potential errors in the data at the site. "Large" might be 3 for regions with 15 or more sites, but less (exact values in list element Dcrit) for smaller regions.
Three heterogeneity measures are calculated, each based on a different measure of between-site dispersion of $L$-moment ratios: [1] weighted standard deviation of $L$-CVs; [2] average of $L$-CV/$L$-skew distances; [3] average of $L$-skew/$L$-kurtosis distances. These dispersion measures are the quantities $V$, $V_2$, and $V_3$ defined respectively in equations (4.4), (4.6), and (4.7) of Hosking and Wallis (1997). The heterogeneity measures are calculated from them as in equation (4.5) of Hosking and Wallis (1997). In practice H[1] is probably sufficient. A value greater than (say) 1.0 suggests that further subdivision of the region should be considered as it might improve the accuracy of quantile estimates.
Goodness of fit is evaluated for five candidate distributions: generalized logistic, generalized extreme value, generalized normal (lognormal), Pearson type III (3-parameter gamma), and generalized Pareto. In the output the distributions are referred to by 3-letter abbreviations, respectively glo, gev, gno, pe3, and gpa. If the region is homogeneous and data at different sites are statistically independent, then if one of the distributions is the true distribution for the region its goodness-of-fit measure should have approximately a standard normal distribution. Provided that the region is acceptably close to homogeneous, the fit may be judged acceptable at the 10 per cent significance level if the $Z$ value is less than 1.645 (i.e., qnorm(0.95)) in absolute value.
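The acceptance rule for the goodness-of-fit measure can be sketched in base R as follows. The vector `Z` holds hypothetical values, named with the 3-letter abbreviations used in the output; it is not a result computed from real data.

```r
# Hypothetical goodness-of-fit values, for illustration only.
Z <- c(glo = 3.10, gev = 1.20, gno = 0.40, pe3 = -1.00, gpa = -4.20)

# Two-sided criterion at the 10 per cent level: |Z| below qnorm(0.95),
# which is approximately 1.645.
zcrit <- qnorm(0.95)

# Names of the candidate distributions whose fit would be judged acceptable.
acceptable <- names(Z)[abs(Z) < zcrit]
print(acceptable)
```

For these made-up values the distributions retained are gev, gno and pe3. The rule is meaningful only when the region is acceptably close to homogeneous.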
Calculation of heterogeneity and goodness-of-fit measures involves the sampling variability of $L$-moment ratios in a homogeneous region whose record lengths and average $L$-moment ratios match those of the data. The sampling variability is estimated by Monte Carlo simulation using nsim replications of the region. Results will vary between invocations of regtst with different seeds for the random-number generator.

References

Hosking, J. R. M. (1996). Fortran routines for use with the method of $L$-moments, Version 3. Research Report RC20525, IBM Research Division, Yorktown Heights, N.Y.
Hosking, J. R. M., and Wallis, J. R. (1997). Regional frequency analysis: an approach based on $L$-moments. Cambridge University Press.

Examples

# An example from Hosking (1996).  Compare the output with
# the file 'cascades.out' in the LMOMENTS Fortran package at
# http://lib.stat.cmu.edu/lmoments/general (results will not
# be identical, because random-number generators are different).
data(Cascades)
summary(regtst(Cascades, nsim=500))

# Output from 'regsamlmu' can be fed straight into 'regtst'
data(Maxwind)
regtst(regsamlmu(Maxwind))
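
The result can also be stored and its documented elements inspected one at a time. The sketch below assumes that regtst and the Cascades data set come from the lmomRFA package, which this page does not name, and that the package is installed. The variable name `rt` is arbitrary.

```r
library(lmomRFA)   # assumption: the package providing regtst and Cascades
data(Cascades)

# Keep the result instead of printing it straight away.
rt <- regtst(Cascades, nsim=500)

rt$D       # discordancy measure for each site
rt$Dcrit   # critical values of the discordancy measure
rt$H       # the three heterogeneity measures
rt$Z       # goodness-of-fit measures for the five candidate distributions
```

Because the heterogeneity and goodness-of-fit measures rest on simulation, the values of H and Z will differ from run to run unless the random-number seed is fixed.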