energyGOF (version 0.1)

energyGOF-package: energyGOF: Goodness-of-Fit Tests via the Energy of Data

Description

Conduct one- and two-sample goodness-of-fit tests for univariate data. In the one-sample case, normal, uniform, exponential, Bernoulli, binomial, geometric, beta, Poisson, lognormal, Laplace, asymmetric Laplace, inverse Gaussian, half-normal, chi-squared, gamma, F, Weibull, Cauchy, and Pareto distributions are supported. egof.test() can also test goodness-of-fit to any distribution with a continuous distribution function. A subset of the available distributions can be tested for the composite goodness-of-fit hypothesis, that is, one can test for distribution fit with unknown parameters. P-values are calculated via parametric bootstrap.

Getting Started

The main entry point is energyGOF.test(). The only documentation pages you need to read are those for energyGOF.test() and energyGOF-package.

Here is a simple example to get you going:

x <- rnorm(10)

## Composite energy goodness-of-fit test (test for Normality with unknown
## parameters)

energyGOF.test(x, "normal", nsim = 1e5)

## Simple energy goodness-of-fit test (test for Normality with known
## parameters). egof.test is an alias for energyGOF.test.

egof.test(x, "normal", nsim = 1e5, mean = 0, sd = 1)

## Two-sample test
y <- rt(10, 1)
egof.test(x, y, nsim = 1e5)

## Test against any distribution function by transforming data to uniform
egof.test(y, pt, nsim = 1e5)
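The last call in the example tests against a user-supplied distribution function by transforming the data to the unit interval. The base-R sketch below illustrates that idea and is not the package's implementation. It applies the probability integral transform and then computes an energy statistic (with exponent 1) against Uniform(0, 1), using the standard facts that E|u - U| = u^2 - u + 1/2 and E|U - U'| = 1/3 for U, U' independent Uniform(0, 1). The helper name energy_stat_unif01 and the choice df = 1 are assumptions made for illustration.

```r
# Hypothetical helper (not part of energyGOF): n times the one-sample energy
# of values u in [0, 1] against the Uniform(0, 1) distribution, exponent s = 1.
energy_stat_unif01 <- function(u) {
  stopifnot(is.numeric(u), all(u >= 0 & u <= 1))
  n <- length(u)
  e_uU <- u^2 - u + 1 / 2                 # E|u_i - U| for U ~ Uniform(0, 1)
  e_UU <- 1 / 3                           # E|U - U'|
  e_uu <- mean(abs(outer(u, u, "-")))     # (1/n^2) * sum_i sum_j |u_i - u_j|
  n * (2 * mean(e_uU) - e_UU - e_uu)
}

set.seed(3)
y <- rt(1000, df = 1)

u_right <- pt(y, df = 1)   # transform with the distribution that generated y
u_wrong <- pnorm(y)        # transform with a mismatched distribution

q_right <- energy_stat_unif01(u_right)
q_wrong <- energy_stat_unif01(u_wrong)
```

When the transforming distribution function matches the data, the transformed values are approximately uniform and the statistic stays small. A mismatched distribution function yields a visibly larger value.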

You may alternatively use the energyGOFdist() function, which offers a different interface based on S3 objects but returns the same result. This package contains a lot of documentation for the various S3 constructors that energyGOFdist() needs. If you just want to run some tests with the standard interface, you can probably ignore all of that and read only the page for energyGOF.test().

Distributions Supported

The following distributions are supported.

Distribution       | Function                | Parameters            | Composite_Test
-------------------|-------------------------|-----------------------|---------------
Asymmetric Laplace | alaplace_dist           | location, scale, skew | TRUE
Asymmetric Laplace | asymmetric_laplace_dist | location, scale, skew | TRUE
Bernoulli          | bernoulli_dist          | prob                  | FALSE
Beta               | beta_dist               | shape1, shape2        | TRUE
Binomial           | binomial_dist           | size, prob            | FALSE
Cauchy             | cauchy_dist             | location, scale, pow  | TRUE
Chi-Squared        | chisq_dist              | df                    | FALSE
Exponential        | exp_dist                | rate                  | TRUE
Exponential        | exponential_dist        | rate                  | TRUE
F                  | f_dist                  | df1, df2              | FALSE
Gamma              | gamma_dist              | shape, rate           | FALSE
Geometric          | geometric_dist          | prob                  | FALSE
Half-Normal        | halfnormal_dist         | scale                 | TRUE
Inverse Gaussian   | inverse_gaussian_dist   | mean, shape           | TRUE
Inverse Gaussian   | invgauss_dist           | mean, shape           | TRUE
Laplace            | laplace_dist            | location, scale       | TRUE
Lognormal          | lognormal_dist          | meanlog, sdlog        | TRUE
Normal             | normal_dist             | mean, sd              | TRUE
Pareto (Type I)    | pareto_dist             | scale, shape, pow     | TRUE
Poisson            | poisson_dist            | lambda                | TRUE
Uniform            | uniform_dist            | min, max              | FALSE
Weibull            | weibull_dist            | shape, scale          | TRUE

Simple and Composite Testing

There are two types of goodness-of-fit tests covered by the energyGOF package, simple and composite. It is important to know the difference because they yield different results.

A simple GOF test tests the data x against a specific distribution with known parameters, which you must pass to energyGOF.test() in the ellipsis argument (...). Use a simple test for a hypothesis like "my data are Normal with mean 1 and sd 2".

energyGOF.test() can also conduct some composite GOF tests. A composite test is performed if no parameters are passed in the ellipsis argument (...). Use a composite test if your hypothesis is "my data are Normal, but I don't know what the parameters are." This composite question is much more common in practice. Composite tests are available only for the distributions marked TRUE in the Composite_Test column of the table above.

All the composite tests in energyGOF assume that none of the parameters are known. For example, a statistical test of Normality with known mean and unknown sd exists, but it is not implemented in the energyGOF package. So, either pass all the distribution parameters or none of them. (In the special case of the Normal distribution, you can use the energy package to test the GOF hypothesis with any combination of known and unknown parameters.)

For each test, energyGOF.test() calculates the test statistic and a p-value. In all cases the p-value is calculated via parametric bootstrap with nsim simulated samples. With a large nsim, the p-values should be reasonably accurate even in fairly small samples. You may need to perform a sensitivity study, rerunning the test with increasing values of nsim until the p-value stabilizes, to find a reasonable nsim for your particular testing problem.
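To make the parametric bootstrap concrete, here is a minimal base-R sketch for a composite test of Normality. It is an illustration under stated assumptions and not the package's code. The function names are hypothetical, parameters are estimated with mean() and sd(), and the statistic is the exponent-1 energy statistic computed on standardized data, using E|z - Z| = 2*dnorm(z) + z*(2*pnorm(z) - 1) and E|Z - Z'| = 2/sqrt(pi) for standard Normal Z.

```r
# Hypothetical helpers, not part of energyGOF.

# n times the one-sample energy of z against N(0, 1), exponent s = 1.
energy_q_std_normal <- function(z) {
  n <- length(z)
  e_zY <- 2 * dnorm(z) + z * (2 * pnorm(z) - 1)   # E|z_i - Z|
  e_YY <- 2 / sqrt(pi)                            # E|Z - Z'|
  e_zz <- mean(abs(outer(z, z, "-")))             # mean of all |z_i - z_j|
  n * (2 * mean(e_zY) - e_YY - e_zz)
}

# Composite statistic: estimate the parameters, standardize, then compute Q.
composite_q <- function(x) {
  energy_q_std_normal((x - mean(x)) / sd(x))
}

# Parametric bootstrap: simulate from the fitted Normal, re-estimate the
# parameters in every replicate, and compare with the observed statistic.
boot_pvalue <- function(x, nsim = 999) {
  n <- length(x)
  mu_hat <- mean(x)
  sd_hat <- sd(x)
  q_obs <- composite_q(x)
  q_sim <- replicate(nsim, composite_q(rnorm(n, mean = mu_hat, sd = sd_hat)))
  (1 + sum(q_sim >= q_obs)) / (nsim + 1)
}

set.seed(4)
p_normal <- boot_pvalue(rnorm(50), nsim = 499)   # data consistent with H0
p_expo   <- boot_pvalue(rexp(50), nsim = 499)    # skewed data, H0 is false
```

The smallest p-value this scheme can return is 1 / (nsim + 1), which is one reason a larger nsim gives finer resolution near small significance levels.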

Power Analyses

Please see the repository https://github.com/jthaman/energyGOF-power for examples of how to conduct power analyses with energyGOF, and for preliminary performance data against alternative methods.

About Energy

Székely and Rizzo (2023) provide the motivation:

"Data energy is a real number (typically a non-negative number) that depends on the distances between data. This concept is based on the notion of Newton’s gravitational potential energy, which is also a function of the distance between bodies. The idea of data energy or energy statistics is to consider statistical observations (data) as heavenly bodies governed by the potential energy of data, which is zero if and only if an underlying statistical hypothesis is true."

The notation \(X'\) indicates that \(X'\) is an independent and identically distributed copy of \(X\).

If \(X\) and \(Y\) are independent and \(E(|X|^s + |Y|^s)\) is finite, then for \(0 < s < 2\),

$$2E|X-Y|^s - E|X-X'|^s - E|Y-Y'|^s \ge 0.$$

Equality holds if and only if \(X\) and \(Y\) are identically distributed. The left side of the inequality is the energy between \(X\) and \(Y\). Energy can be generalized to multivariate data and even more exotic data types, but in this R package, we only treat univariate data.
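The inequality above has a direct sample analogue in which each expectation is replaced by an average over all pairs of observations. The base-R sketch below shows that plug-in estimate. The function name energy_between is hypothetical and is not part of energyGOF.

```r
# Hypothetical helper: plug-in (V-statistic) estimate of the s-energy
# between two univariate samples x and y, for 0 < s < 2.
energy_between <- function(x, y, s = 1) {
  stopifnot(is.numeric(x), is.numeric(y), s > 0, s < 2)
  d_xy <- abs(outer(x, y, "-"))^s   # all |x_i - y_j|^s
  d_xx <- abs(outer(x, x, "-"))^s   # all |x_i - x_j|^s
  d_yy <- abs(outer(y, y, "-"))^s   # all |y_i - y_j|^s
  2 * mean(d_xy) - mean(d_xx) - mean(d_yy)
}

set.seed(1)
x      <- rnorm(200)
y_same <- rnorm(200)             # same distribution as x
y_diff <- rnorm(200, mean = 2)   # shifted distribution

e_same <- energy_between(x, y_same)
e_diff <- energy_between(x, y_diff)
```

The estimate is close to zero when both samples come from the same distribution and clearly positive when they do not.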

The concept of data energy between two random variables can be adapted to the one-sample goodness-of-fit problem. The one-sample \(s\)-energy is

$$E^* = \frac{2}{n} \sum_i E|x_i - Y|^s - E|Y-Y'|^s - \frac{1}{n^2} \sum_i \sum_j |x_i - x_j|^s,$$

when \(0 < s < 2\) and \(E|X|^s, E|Y|^s < \infty.\)
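As a worked instance of the one-sample formula, the base-R sketch below evaluates \(E^*\) with \(s = 1\) for a fully specified Normal hypothesis. It works on the standardized data and uses the closed forms E|z - Z| = 2*dnorm(z) + z*(2*pnorm(z) - 1) and E|Z - Z'| = 2/sqrt(pi) for standard Normal Z. The function name energy_stat_normal is hypothetical, and this is a sketch, not the package's implementation.

```r
# Hypothetical helper: one-sample energy (s = 1) of x against N(mu, sigma),
# computed on the standardized scale z = (x - mu) / sigma.
energy_stat_normal <- function(x, mu = 0, sigma = 1) {
  stopifnot(is.numeric(x), sigma > 0)
  n <- length(x)
  z <- (x - mu) / sigma
  e_zY <- 2 * dnorm(z) + z * (2 * pnorm(z) - 1)   # E|z_i - Z|, Z ~ N(0, 1)
  e_YY <- 2 / sqrt(pi)                            # E|Z - Z'|
  e_zz <- mean(abs(outer(z, z, "-")))             # (1/n^2) sum_i sum_j |z_i - z_j|
  e_star <- 2 * mean(e_zY) - e_YY - e_zz
  c(E_star = e_star, Q = n * e_star)
}

set.seed(2)
stat_null <- energy_stat_normal(rnorm(100))               # hypothesis true
stat_alt  <- energy_stat_normal(rnorm(100, mean = 1.5))   # hypothesis false
```

The second element, Q, is n times the one-sample energy and is much larger for the shifted sample than for the sample drawn from the hypothesized distribution.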

In most tests in the energyGOF package \(s = 1\). In some cases (Pareto and Cauchy), \(E|Y|\) is not finite, so we need to use an \(s < 1\). This is done by passing pow in the ellipsis argument (...), although a default pow is provided in all of these tests. These tests are called generalized energy goodness-of-fit tests in this package as well as in Székely and Rizzo (2023).

To connect energy back to GOF testing: in the one-sample goodness-of-fit regime, we test whether a sample \(x_1, \ldots, x_n \sim X\) (where the distribution of \(X\) is hidden) follows the same distribution as \(Y\), which is specified. If \(X\) and \(Y\) have the same distribution, then the limiting distribution of \(Q = nE^*\) as \(n \to \infty\) is that of a quadratic form of centered Gaussian random variables with expected value \(E|Y-Y'|^s\). If \(X\) and \(Y\) differ, then \(Q \to \infty\) with \(n\). So large values of \(Q\) are evidence against the hypothesis, and the asymptotic theory of V-statistics can be applied to prove that tests based on \(Q\) are statistically consistent goodness-of-fit tests, even in some situations where \(E|Y|\) is not finite.

Author

John T. Haman

References

Székely, G. J., & Rizzo, M. L. (2023). The energy of data and distance correlation. Chapman and Hall/CRC.

Székely, G. J., & Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. Journal of statistical planning and inference, 143(8), 1249-1272.

Li, Y. (2015). Goodness-of-fit tests for Dirichlet distributions with applications. Bowling Green State University.

Rizzo, M. L. (2002). A new rotation invariant goodness-of-fit test (Doctoral dissertation, Bowling Green State University).

Haman, J. T. (2018). The energy goodness-of-fit test and EM type estimator for asymmetric Laplace distributions (Doctoral dissertation, Bowling Green State University).

Ofosuhene, P. (2020). The energy goodness-of-fit test for the inverse Gaussian distribution (Doctoral dissertation, Bowling Green State University).

Rizzo, M. L. (2009). New goodness-of-fit tests for Pareto distributions. ASTIN Bulletin: The Journal of the IAA, 39(2), 691-715.

Yang, G. (2012). The Energy Goodness-of-fit Test for Univariate Stable Distributions (Doctoral dissertation, Bowling Green State University).
