paraep4: Estimate the Parameters of the 4-p Asymmetric Exponential Power Distribution

Description

This function estimates the parameters of the 4-parameter Asymmetric Exponential Power distribution given the L-moments of the data in an L-moment object such as that returned by lmoms. The relation between distribution parameters and L-moments is seen under lmomaep4. The optimization needed to extract the parameters from the L-moments is conceptually straightforward but numerically difficult to achieve.

Delicado and Goria (2008) argue for numerical methods to use the following objective function

$$\epsilon(\alpha, \kappa, h) = \log(1 + \sum_{r=2}^4 (\hat\lambda_r - \lambda_r)^2)$$

and subsequently solve directly for $\xi$.
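The objective above can be sketched in base R. This is only an illustration, not the code inside paraep4: the names dg_objective, theo_fun, and toy_theo are hypothetical, and toy_theo is a stand-in that does not implement the AEP4 relations documented under lmomaep4.

```r
# Sketch of the Delicado-Goria objective: log(1 + sum of squared L-moment errors).
# lam_hat  : sample (lambda_2, lambda_3, lambda_4)
# theo_fun : HYPOTHETICAL function mapping c(alpha, kappa, h) to the
#            theoretical (lambda_2, lambda_3, lambda_4); not part of lmomco
dg_objective <- function(par, lam_hat, theo_fun) {
  lam <- theo_fun(par)
  log(1 + sum((lam_hat - lam)^2))
}

# Toy stand-in for the theoretical L-moments (NOT the AEP4 relations),
# used only so the sketch runs end to end.
toy_theo <- function(par) c(par[1], par[1] * par[2], par[1] * par[3])

lam_hat <- toy_theo(c(2, 0.1, 0.2))   # pretend these are sample values
fit <- optim(c(1, 0.5, 0.5), dg_objective,
             lam_hat = lam_hat, theo_fun = toy_theo)
fit$value                             # small when the minimum is found
```

The logarithm compresses large squared errors, which is one way to read the motivation given for this form of the objective.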

This objective function was chosen by Delicado and Goria because the solution surface can become quite flat far away from the minimum. The author of lmomco agrees with the findings of those authors, based on limited exploratory analysis and the development of the algorithms used here under the rubric of the DG method. This exploration resulted in an alternative algorithm using tabulated initial guesses, described below. An evident drawback of the Delicado-Goria algorithm is that precision in $\alpha$ may be lost, given that this parameter can be analytically computed from $\lambda_2$, $\kappa$, and $h$.

It is established practice in L-moment theory of four (and similarly three) parameter distributions to see expressions for $\tau_3$ and $\tau_4$ used for numerical optimization to obtain the two higher parameters ($\kappa$ and $h$) first, and then to see analytical expressions directly compute the two lower parameters ($\xi$ and $\alpha$). The author made various exploratory studies by optimizing on $\tau_3$ and $\tau_4$ through a least squares objective function. Such a practice seems to perform acceptably when compared to that recommended by Delicado and Goria (2008), provided the initial guesses for the parameters are drawn from pretabulation of the relation between $\{\kappa, h\}$ and $\{\tau_3, \tau_4\}$.

Another optimization, referred to here as the A method, is available for parameter estimation using the following objective function

$$\epsilon(\kappa, h) = \sqrt{(\hat\tau_3 - \tau_3)^2 + (\hat\tau_4 - \tau_4)^2}$$

and subsequently solves directly for $\alpha$ and then $\xi$. The A method appears to perform slightly better in $\kappa$ and $h$ estimation and quite a bit better in $\alpha$ and $\xi$, as might be expected because these last two are analytically computed. The objective function of the A method defaults to use of the square root, but this can be removed by setting sqrt.t3t4=FALSE.
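A minimal base R sketch of the A-method objective and the effect of the sqrt.t3t4 switch follows. The function a_objective and its ratio_fun argument are hypothetical names for illustration, and const_ratios is a dummy that returns fixed L-moment ratios rather than the AEP4 relations.

```r
# Sketch of the A-method objective on tau_3 and tau_4.
# ratio_fun : HYPOTHETICAL function mapping c(kappa, h) to c(tau3, tau4)
a_objective <- function(par, t3_hat, t4_hat, ratio_fun, sqrt.t3t4 = TRUE) {
  tau <- ratio_fun(par)
  sse <- (t3_hat - tau[1])^2 + (t4_hat - tau[2])^2
  if (sqrt.t3t4) sqrt(sse) else sse   # square root on by default
}

# Dummy ratios (NOT the AEP4 relations): always tau3 = 0.10, tau4 = 0.20
const_ratios <- function(par) c(0.10, 0.20)

e_sqrt  <- a_objective(c(1, 1), 0.13, 0.24, const_ratios)
e_plain <- a_objective(c(1, 1), 0.13, 0.24, const_ratios, sqrt.t3t4 = FALSE)
c(e_sqrt, e_plain)   # about 0.05 and 0.0025
```

With errors of 0.03 and 0.04 in the two ratios, the square-root form reports a distance of about 0.05 while the plain sum of squares reports about 0.0025, so the two settings differ in scale near the minimum.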

The initial guesses for the $\kappa$ and $h$ parameters derive from a hashed environment in file sysdata.rda (.lmomcohash$AEPkh2lmrTable). The guess is the $\{\kappa, h\}$ pair having the minimum $\epsilon(\kappa, h)$, in which $\tau_3$ and $\tau_4$ derive from the table as well. The file SysDataBuilder.R provides additional technical details on how the AEPkh2lmrTable was generated.
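The table lookup can be pictured as a nearest-neighbour search in the $\{\tau_3, \tau_4\}$ plane. The sketch below assumes a data frame with hypothetical column names K, H, T3, and T4; the actual layout of AEPkh2lmrTable is not described here, and the numbers in toy_tab are illustrative placeholders rather than entries from that table.

```r
# Pick the tabulated (kappa, h) pair whose (tau3, tau4) is closest to the sample.
# Column names K, H, T3, T4 are ASSUMED for this sketch.
nearest_guess <- function(tab, t3_hat, t4_hat) {
  eps <- sqrt((t3_hat - tab$T3)^2 + (t4_hat - tab$T4)^2)
  tab[which.min(eps), c("K", "H")]
}

# Placeholder table (NOT real AEPkh2lmrTable entries)
toy_tab <- data.frame(K  = c(0.5, 1.0, 2.0),
                      H  = c(1.0, 2.0, 1.0),
                      T3 = c(0.30, 0.00, -0.30),
                      T4 = c(0.20, 0.12, 0.20))

guess <- nearest_guess(toy_tab, t3_hat = 0.05, t4_hat = 0.13)
guess   # the second row is nearest for these inputs
```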

The table represents a systematic double-loop sweep through lmomaep4 for

$$-3 \le \log(\kappa) \le 3, \Delta=0.05$$

and

$$-3 \le \log(h) \le 3, \Delta=0.05$$
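The size of this sweep is easy to check in base R. The sketch only builds the grid of log values; it does not call lmomaep4, and whether every pair is retained in the stored table is not stated here.

```r
# Grid of log(kappa) and log(h) values implied by the stated limits and step
log_kappa <- seq(-3, 3, by = 0.05)
log_h     <- seq(-3, 3, by = 0.05)

# Every combination visited by a double loop over the two axes
grid <- expand.grid(log_kappa = log_kappa, log_h = log_h)

length(log_kappa)   # 121 values per axis
nrow(grid)          # 121 * 121 = 14641 pairs
```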

The function will not return parameters if the following lower bound of $\tau_4$ is not met: $\tau_4 \ge 0.77555(|\tau_3|) - 3.3355(|\tau_3|)^2 + 14.196(|\tau_3|)^3 - 29.909(|\tau_3|)^4 + 37.214(|\tau_3|)^5 - 24.741(|\tau_3|)^6 + 6.7998(|\tau_3|)^7$. For this polynomial, the residual standard error is RSE = 0.0003125 and the maximum absolute error for $\tau_3 \in [0,1]$ is less than 0.0015. The actual coefficients in paraep4 have additional significant figures.
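The bound can be evaluated directly from the rounded coefficients printed above. The helper names tau4_lower_bound and within_bounds are hypothetical, and because paraep4 carries more significant figures, results very close to the boundary may differ from what the function itself decides.

```r
# Lower bound of tau4 as a 7th-degree polynomial in |tau3| (no constant term),
# using the ROUNDED coefficients from the text above.
tau4_lower_bound <- function(tau3) {
  a <- abs(tau3)
  coefs <- c(0.77555, -3.3355, 14.196, -29.909, 37.214, -24.741, 6.7998)
  # outer() builds |tau3|^1 ... |tau3|^7, one row per input value
  as.vector(outer(a, 1:7, `^`) %*% coefs)
}

# TRUE when the (tau3, tau4) pair is on or above the bound
within_bounds <- function(tau3, tau4) tau4 >= tau4_lower_bound(tau3)

tau4_lower_bound(c(0, 0.5, -0.5, 1))
within_bounds(0, 0.12)
```

The bound is symmetric in the sign of $\tau_3$, equals zero at $\tau_3 = 0$, and comes out close to 1 at $|\tau_3| = 1$ with these rounded coefficients.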

Usage

paraep4(lmom, checklmom=TRUE,
        method=c("A", "DG", "ADG"),
        sqrt.t3t4=TRUE, eps=1e-4,
        checkbounds=TRUE, kapapproved=TRUE,
        A.guess=NULL, K.guess=NULL, H.guess=NULL)

Arguments

lmom

An L-moment object created by lmoms or pwm2lmom.

checklmom

Should the L-moments be checked for validity using the are.lmom.valid function? Normally this should be left as the default, and it is very unlikely that the L-moments will not be viable (particularly in the $\tau_4$ and $\tau_3$ inequality).

method

Which method for parameter estimation should be used: the A or DG method. The ADG method runs both methods and retains the salient optimization results of each, but the official parameters in para are those from the A method.

sqrt.t3t4

If true and the method is A, then the square root of the sum of square errors in $\tau_3$ and $\tau_4$ is used instead of the sum of square differences alone.

eps

A small term or threshold against which the square root of the sum of square errors in $\tau_3$ and $\tau_4$ is compared to judge whether the fit is good enough. The algorithm uses this comparison to set the ifail on return, in addition to the convergence flags coming from the optim function.

checkbounds

Should the lower bounds of $\tau_4$ be verified? If the sample $\hat\tau_3$ and $\hat\tau_4$ are outside of these bounds, then NA values are returned for the solutions.

kapapproved

Should the Kappa distribution be fit by parkap if $\hat\tau_4$ is below the lower bounds of $\tau_4$? This fitting is only possible if checkbounds is true. The Kappa and AEP4 overlap partially.

A.guess

A user-specified guess of the $\alpha$ parameter to provide to the optimization of any of the methods. This argument simply supersedes the initial guess of $\alpha = 1$.

K.guess

A user-specified guess of the $\kappa$ parameter to supersede that derived from the .lmomcohash$AEPkh2lmrTable in file sysdata.rda.

H.guess

A user-specified guess of the $h$ parameter to supersede that derived from the .lmomcohash$AEPkh2lmrTable in file sysdata.rda.

Value

An R list is returned.

type

The type of distribution: aep4.

para

The parameters of the distribution.

source

The source of the parameters: paraep4.

ifail

A numeric failure code.

ifailtext

A text message for the failure code.

method

The method as specified by the method argument.

L234

Optional and dependent on method DG or ADG. Another R list containing the optimization details by the DG method along with the estimated parameters in para_L234. The _L234 is to signify that optimization is being conducted using $\lambda_2$, $\lambda_3$, and $\lambda_4$. The parameter values in para are those of the DG method only when the DG method is used.

T34

Optional and dependent on method A or ADG. Another R list containing the optimization details by the A method along with the estimated parameters in para_T34. The _T34 is to signify that optimization is being conducted using $\tau_3$ and $\tau_4$ only. The parameter values in para are those by the A method.

The values for ifail are produced by several mechanisms. First, the convergence number emanating from the optim function itself. Second, the integer 1 is used when the failure is attributable to the optim function. Third, the integer 2 is a general attempt to flag a single failure by some type of eps check outside of optim. Fourth, the integer 3 is used to show that the parameters fail against a parameter validity check in are.paraep4.valid. Fifth, the integer 4 is used to show that the sample L-moments are below the lower bounds of the $\tau_4$ polynomial shown above.

Additional and self-explanatory elements on the returned list will be present if the Kappa distribution was fit instead.
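A short sketch of inspecting the returned list follows. The helper summarize_fit is hypothetical, the mock list only mimics the element names listed above with placeholder contents, and treating ifail equal to 0 as success is an assumption based on the convention that optim reports 0 on convergence. The call to paraep4 is attempted only if lmomco is installed.

```r
# Summarize the status elements of a paraep4-style result.
# ASSUMPTION: ifail == 0 is treated as "no failure".
summarize_fit <- function(fit) {
  ok <- isTRUE(fit$ifail == 0)
  sprintf("%s via %s: ifail=%s (%s)", fit$type, fit$method, fit$ifail,
          if (ok) "treated as success" else fit$ifailtext)
}

# Mock result with the documented element names (placeholder contents)
mock <- list(type = "aep4", para = c(100, 1000, 0.7, 1.4),
             source = "paraep4", ifail = 3,
             ifailtext = "placeholder message", method = "A")
msg <- summarize_fit(mock)
print(msg)

# Real fit, run only when the lmomco package is available
if (requireNamespace("lmomco", quietly = TRUE)) {
  PAR <- list(para = c(100, 1000, 0.7, 1.4), type = "aep4")
  fit <- lmomco::paraep4(lmomco::lmomaep4(PAR), method = "A")
  print(summarize_fit(fit))
}
```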

References

Delicado, P., and Goria, M.N., 2008, A small sample comparison of maximum likelihood, moments and L-moments methods for the asymmetric exponential power distribution: Computational Statistics and Data Analysis, v. 52, no. 3, pp. 1661-1673.

Asquith, W.H., 2012, Parameter Estimation for the 4-Parameter Asymmetric Exponential Power Distribution by the Method of L-moments using R, A Journal Article in Progress.

Examples


# As of version 1.6.2, it is felt that, in the spirit of CRAN CPU
# reduction, the intensive operations of paraep4() should
# be kept at bay. However, the following examples are useful.
# WHA would like these to be turned on, but this Rd file consumes
# about 50 percent of the CPU cycles of R CMD check.
PAR <- list(para=c(100, 1000, 1.7, 1.4), type="aep4")
lmr <- lmomaep4(PAR)
aep4 <- paraep4(lmr, method="ADG")
print(aep4)

PARdg  <- paraep4(lmr, method="DG")
PARasq <- paraep4(lmr, method="A")

print(PARdg)
print(PARasq)

F <- c(0.001, 0.005, seq(0.01,0.99, by=0.01), 0.995, 0.999)
qF <- qnorm(F)
ylim <- range( quaaep4(F, PAR), quaaep4(F, PARdg), quaaep4(F, PARasq) )
plot(qF, quaaep4(F, PARdg), type="n", ylim=ylim,
     xlab="STANDARD NORMAL VARIATE", ylab="QUANTILE")
lines(qF, quaaep4(F, PAR), col=8, lwd=10) # the true curve
lines(qF, quaaep4(F, PARdg),  col=2, lwd=3)
lines(qF, quaaep4(F, PARasq), col=3, lwd=2, lty=2)
# See how the red curve deviates: Delicado-Goria failed,
# and the ifail element in PARdg should be nonzero.

print(PAR$para)
print(PARdg$para)
print(PARasq$para)

ePAR1dg <- abs((PAR$para[1] - PARdg$para[1])/PAR$para[1])
ePAR2dg <- abs((PAR$para[2] - PARdg$para[2])/PAR$para[2])
ePAR3dg <- abs((PAR$para[3] - PARdg$para[3])/PAR$para[3])
ePAR4dg <- abs((PAR$para[4] - PARdg$para[4])/PAR$para[4])

ePAR1asq <- abs((PAR$para[1] - PARasq$para[1])/PAR$para[1])
ePAR2asq <- abs((PAR$para[2] - PARasq$para[2])/PAR$para[2])
ePAR3asq <- abs((PAR$para[3] - PARasq$para[3])/PAR$para[3])
ePAR4asq <- abs((PAR$para[4] - PARasq$para[4])/PAR$para[4])

# mean() averages only its first argument, so combine the errors with c()
MADdg  <- mean(c(ePAR1dg,  ePAR2dg,  ePAR3dg,  ePAR4dg))
MADasq <- mean(c(ePAR1asq, ePAR2asq, ePAR3asq, ePAR4asq))

# We see that the Asquith method performs better for the example
# parameters in PAR and inspection of the graphic will show that
# the Delicado-Goria solution is obviously off.
print(MADdg)
print(MADasq)

# Repeat the above with this change in parameter to
# PAR <- list(para=c(100, 1000, .7, 1.4), type="aep4")
# and the user will see that all three methods converged on the
# correct values.
