
verification (version 1.45)

powerdiverger: Power-Divergence Statistic

Description

Calculates the power-divergence statistic and gives the p-value for testing whether at least one category has a proportion that differs from the others.

Usage

powerdiverger(x, y = NULL, p = NULL, lambda = c(-2, -1, -1/2, 0, 2/3, 1),
    alternative = c("two.sided", "less", "greater"), df = NULL,
    conf.level = 0.95, correct = FALSE )

Value

A list object of class “htest” if the length of lambda is 1. Otherwise, a list of “htest” objects of length equal to the length of lambda. The “htest” list has components:

statistic

the power-divergence statistic.

parameter

the degrees of freedom of the approximate chi-squared distribution of the test statistic.

p.value

the estimated p-value of the test.

estimate

a vector with the sample proportions x/n.

null.value

the value of p under the null hypothesis, if specified, or NULL otherwise.

conf.int

NULL; no confidence interval is computed.

alternative

a character string describing the alternative.

method

a character string naming the specific test when it corresponds to one of the named statistics described in the Details section.

data.name

a character string giving the names of the data.

If the first-order moment correction is applied then two additional values are returned:

mu.lambda, sigma.lambda

the first-order moment correction terms.

Arguments

x

a numeric vector or matrix; x and y can also both be factors.

y

a numeric vector; ignored if x is a matrix. If x is a factor, y should be a factor of the same length.

p

a vector of probabilities of success. The length of p must be the same as the number of groups specified by x, and its elements must be greater than 0 and less than 1.

lambda

User-chosen parameter that defines which statistic is represented by the power-divergence family (see details below).

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".

df

The degrees of freedom for the test. The default is the number of categories minus one when p is NULL, and the number of categories otherwise.

conf.level

Not used.

correct

logical, if TRUE the first-order moment correction is applied to the statistic. Provides more accurate results for smaller samples.

Author

Eric Gilleland

Details

The user-chosen parameter lambda determines which member of the power-divergence family of statistics is computed. Asymptotically, all members of the family are equivalent and follow a chi-square distribution with degrees of freedom equal to one less than the number of categories.

Values of 0 and -1 are defined by continuity and are equal to the likelihood-ratio (Neyman 1949) and Kullback-Leibler (Kullback and Leibler 1951) statistics, respectively. The Pearson chi-square statistic results from lambda = 1 (Pearson 1900). If lambda = -2, then it is the Neyman modification of the Pearson chi-square (Neyman 1949). Other named statistics that can be obtained are the Freeman-Tukey statistic (lambda = -0.5, Freeman and Tukey 1950) and the Cressie-Read statistic (lambda = 2/3, Cressie and Read 1984).
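The family can be sketched directly in base R. The helper pd_stat below is hypothetical (it is not part of the verification package); it implements the Cressie-Read formula 2/(lambda*(lambda+1)) * sum(O * ((O/E)^lambda - 1)) for observed counts O and expected counts E, with the lambda = 0 and lambda = -1 cases taken as their continuity limits. At lambda = 1 it reduces to the familiar Pearson chi-square.

```r
## Hypothetical helper for illustration only (not in the verification package):
## the power-divergence statistic of Cressie and Read (1984).
pd_stat <- function(O, E, lambda) {
  if (lambda == 0) {
    2 * sum(O * log(O / E))        # likelihood-ratio statistic (continuity limit)
  } else if (lambda == -1) {
    2 * sum(E * log(E / O))        # Kullback-Leibler statistic (continuity limit)
  } else {
    2 / (lambda * (lambda + 1)) * sum(O * ((O / E)^lambda - 1))
  }
}

O <- c(104, 95, 66, 63, 62, 58, 60, 87)  # dog-race counts used in the Examples
E <- rep(74.375, 8)

## lambda = 1 reproduces the Pearson chi-square statistic.
all.equal(pd_stat(O, E, lambda = 1), sum((O - E)^2 / E))  # TRUE
```

All six default lambda values give statistics that are referred to the same chi-squared distribution, here with 8 - 1 = 7 degrees of freedom.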

Note that no continuity correction is (yet) available, which is important for small samples.
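As a point of reference for what a continuity correction does, base R's chisq.test offers the Yates correction for 2 x 2 tables; the toy counts below are chosen purely for illustration.

```r
## Effect of the Yates continuity correction on a small 2 x 2 table.
## powerdiverger itself offers no such correction; this uses stats::chisq.test.
tab <- matrix(c(7, 3, 2, 8), nrow = 2)
uncorrected <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
corrected   <- suppressWarnings(chisq.test(tab, correct = TRUE))$statistic
c(uncorrected, corrected)  # the correction shrinks the statistic
```

The shrunken statistic yields a larger, more conservative p-value, which is why the lack of a correction matters most for small samples.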

For more information about these statistics, see Cressie and Read (1984), Read and Cressie (1988), or, for a concise description, the appendix of Gilleland et al. (2023); note, however, that the power-divergence family of statistics is twice the power-divergence measure given in Eq (A1) of that appendix.

A print method function is available.

References

Cressie, N., and T. R. C. Read (1984). Multinomial goodness-of-fit tests. J. Roy. Stat. Soc., 46, 440--464.

Freeman, M. F., and J. W. Tukey (1950). Transformations related to the angular and the square root. Ann. Math. Stat., 21, 607--611, doi: 10.1214/aoms/1177729756.

Gilleland, E., D. Munoz-Esparza, and D. D. Turner (2023). Competing forecast verification: Using the power-divergence statistic for testing the frequency of better.

Kullback, S., and R. A. Leibler (1951). On information and sufficiency. Ann. Math. Stat., 22, 79--86, doi: 10.1214/aoms/1177729694.

Neyman, J. (1949). Contribution to the theory of the χ² test. Proc. First Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, CA, University of California, 239--273.

Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag., 50, 157--175, doi: 10.1080/14786440009463897.

Read, T. R. C., and N. A. C. Cressie (1988). Goodness-of-Fit Statistics for Discrete Multivariate Data. 1st ed. Springer-Verlag, New York, 212 pp.

Examples


## Table 4.1 of Read and Cressie (1988).
# Goodness-of-fit test
dograce <- data.frame( dog = 1:8, 
                       obs = c( 104, 95, 66, 63, 62, 58, 60, 87 ),
                       mod = rep( 74.375, 8 ) )

(res <- powerdiverger( x = dograce$obs, p = dograce$mod/(8*74.375) ) )

# Pearson chi-square test (lambda = 1, the last element of the default lambda).
res$results[[6]]
# cf. with 'chisq.test'
chisq.test( x = dograce$obs, p = dograce$mod/(8*74.375), correct = FALSE )

# Test for independence (contingency table).
# From 'chisq.test' help file
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
(powerdiverger( M ))

# cf. with
(chisq.test(M))
