homogeneity_test_pv: Conditional Two-Sample Homogeneity Test for Binomial Experiments

Description

Performs an exact or approximate conditional test about the homogeneity of two binomial samples, i.e. regarding the respective probabilities of success. It is vectorised, only calculates p-values and offers a normal approximation of their computation. Furthermore, it is capable of returning the discrete p-value supports, i.e. all observable p-values under a null hypothesis. Multiple tests can be evaluated simultaneously. In two-sided tests, several procedures of obtaining the respective p-values are implemented.

Usage

homogeneity_test_pv(
  x,
  n,
  alternative = "two.sided",
  ts_method = "minlike",
  exact = TRUE,
  correct = TRUE,
  simple_output = FALSE
)

Value

If simple_output = TRUE, a vector of computed p-values is returned. Otherwise, the output is a DiscreteTestResults R6 class object, which also includes the p-value supports and testing parameters. These can be accessed via public methods, e.g. $get_pvalues().

Arguments

x: integer vector with two elements or a matrix with two columns or a data frame with two columns giving the number of successes for the two experiments.
n: integer vector with two elements or a matrix with two columns or a data frame with two columns giving the number of trials for the two experiments.
alternative: character vector that indicates the alternative hypotheses; each value must be one of "two.sided" (the default), "less" or "greater".
ts_method: single character string that indicates the two-sided p-value computation method (if any value in alternative equals "two.sided") and must be one of "minlike" (the default), "blaker", "absdist" or "central" (see Details). Ignored if exact = FALSE.
exact: logical value that indicates whether p-values are to be calculated by exact computation (TRUE; the default) or by a continuous approximation (FALSE).
correct: logical value that indicates if a continuity correction is to be applied (TRUE; the default) or not (FALSE). Ignored if exact = TRUE.
simple_output: logical value that indicates whether only a vector of p-values is to be returned (TRUE) or an R6 class object that also includes the tests' parameters and support sets, i.e. all observable p-values under each null hypothesis (FALSE; the default). See Value.

Details

The parameters x, n and alternative are vectorised. They are replicated automatically, so that the number of x's rows equals the length of alternative. This allows multiple null hypotheses to be tested simultaneously. Since x and n are coerced (if necessary) to matrices with two columns, they are replicated row-wise.
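The row-wise replication can be pictured with the following minimal base R sketch. The helper recycle_inputs() is hypothetical and not part of the package. It assumes that a plain vector of length two is read as a single row, and that all three inputs are recycled with rep_len() to the largest of their lengths.

```r
# Hypothetical helper: coerce a length-2 vector, matrix or data frame
# to a two-column matrix (a plain vector becomes a single row)
to_two_col <- function(v) {
  if (is.null(dim(v))) matrix(v, ncol = 2, byrow = TRUE) else as.matrix(v)
}

# Hypothetical helper: recycle x, n (row-wise) and alternative
# to a common length (assumed rep_len()-style recycling)
recycle_inputs <- function(x, n, alternative) {
  x <- to_two_col(x)
  n <- to_two_col(n)
  len <- max(nrow(x), nrow(n), length(alternative))
  list(
    x = x[rep_len(seq_len(nrow(x)), len), , drop = FALSE],
    n = n[rep_len(seq_len(nrow(n)), len), , drop = FALSE],
    alternative = rep_len(alternative, len)
  )
}

# One pair of experiments tested against three alternatives
r <- recycle_inputs(c(3, 15), c(10, 25), c("two.sided", "less", "greater"))
r$x
r$alternative
```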

It can be shown that this test is a special case of Fisher's exact test, because it is conditional on the numbers of trials and the sums of successes and failures. Therefore, its computations are handled by fisher_test_pv().
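The reduction to Fisher's exact test can be illustrated in base R. Under H0: p1 = p2, and conditional on n1, n2 and the total number of successes k = x1 + x2, the number of successes x1 in the first experiment is hypergeometric. The sketch below uses illustrative counts and is not package code. It computes the one-sided p-values directly with phyper() and compares them with stats::fisher.test() applied to the corresponding 2 x 2 table.

```r
# Illustrative counts (assumed): x1 successes in n1 trials, x2 in n2 trials
x1 <- 3;  n1 <- 10
x2 <- 15; n2 <- 25
k  <- x1 + x2  # total number of successes, fixed by conditioning

# Under H0, x1 given k is hypergeometric: k draws from an urn
# holding n1 "experiment 1" balls and n2 "experiment 2" balls
p_less    <- phyper(x1, m = n1, n = n2, k = k)                          # P(X1 <= x1)
p_greater <- phyper(x1 - 1, m = n1, n = n2, k = k, lower.tail = FALSE)  # P(X1 >= x1)

# Fisher's exact test on the 2 x 2 table
# (columns: experiments; rows: successes and failures)
tab <- matrix(c(x1, n1 - x1, x2, n2 - x2), nrow = 2,
              dimnames = list(c("success", "failure"), c("exp1", "exp2")))
c(less    = p_less,    fisher_less    = fisher.test(tab, alternative = "less")$p.value,
  greater = p_greater, fisher_greater = fisher.test(tab, alternative = "greater")$p.value)
```

With this table orientation, "less" in stats::fisher.test() corresponds to an odds ratio below 1, i.e. p1 < p2.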

For exact computation, various procedures of determining two-sided p-values are implemented.

"minlike": The standard approach in stats::fisher.test() and stats::binom.test(). The probabilities of all outcomes whose probability is less than or equal to that of the observed outcome are summed up. In Hirji (2006), it is referred to as the Probability-based approach.
"blaker": The minimum of the observation's lower and upper tail probabilities is combined with the largest probability of the opposite tail that does not exceed this minimum. More details can be found in Blaker (2000) or Hirji (2006), where it is referred to as the Combined Tails method.
"absdist": The probabilities of all outcomes whose absolute distance from the expected value is greater than or equal to that of the observed outcome are summed up. In Hirji (2006), it is referred to as the Distance from Center approach.
"central": The smaller of the observation's lower and upper tail probabilities is simply doubled. In Hirji (2006), it is referred to as the Twice the Smaller Tail method.
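The four procedures can be sketched in base R for a single pair of experiments. The sketch assumes that the null distribution is the hypergeometric distribution of x1 given the total number of successes. The function two_sided_pv() is hypothetical and not part of the package. Its small tie tolerance is an assumption modelled on stats::fisher.test().

```r
# Hypothetical sketch: exact two-sided p-value of the conditional
# homogeneity test from the hypergeometric null distribution of x1
two_sided_pv <- function(x1, x2, n1, n2,
                         method = c("minlike", "blaker", "absdist", "central")) {
  method <- match.arg(method)
  k <- x1 + x2
  supp  <- max(0, k - n2):min(n1, k)                        # support of X1
  d     <- dhyper(supp, n1, n2, k)                          # P(X1 = s)
  lower <- phyper(supp, n1, n2, k)                          # P(X1 <= s)
  upper <- phyper(supp - 1, n1, n2, k, lower.tail = FALSE)  # P(X1 >= s)
  obs <- which(supp == x1)
  tol <- 1e-7                                               # guards against rounding in ties
  p <- switch(method,
    # sum of all point probabilities not greater than the observed one
    minlike = sum(d[d <= d[obs] * (1 + tol)]),
    # Blaker: outcomes whose smaller tail is not greater than the observed one
    blaker = {
      g <- pmin(lower, upper)
      sum(d[g <= g[obs] * (1 + tol)])
    },
    # outcomes at least as far from the expected value as the observation
    absdist = {
      dist <- abs(supp - n1 * k / (n1 + n2))
      sum(d[dist >= dist[obs] - tol])
    },
    # twice the smaller tail probability
    central = 2 * min(lower[obs], upper[obs])
  )
  min(1, p)
}

methods <- c("minlike", "blaker", "absdist", "central")
sapply(methods, function(m) two_sided_pv(3, 15, 10, 25, m))
```

For "minlike", the result should agree with stats::fisher.test() applied to the corresponding 2 x 2 table. A Blaker p-value never exceeds the "central" one, because each of its two tails contributes at most the observed smaller tail probability.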

For non-exact (i.e. continuous approximation) approaches, ts_method is ignored, since all of its methods would yield the same p-values. Because the approximating normal distribution is symmetric, they all reduce to the doubling approach of ts_method = "central".
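The following sketch illustrates this remark for one possible normal approximation. It replaces the conditional hypergeometric distribution of x1 with a normal distribution that has the same mean and variance, and it optionally applies a continuity correction of 0.5. This is an assumption for illustration only; the approximation actually used by fisher_test_pv() may differ in detail. The function approx_pv() is hypothetical.

```r
# Hypothetical sketch: normal approximation of the conditional
# (hypergeometric) null distribution of x1
approx_pv <- function(x1, x2, n1, n2, correct = TRUE) {
  N <- n1 + n2
  k <- x1 + x2
  mu    <- n1 * k / N                                    # hypergeometric mean
  sigma <- sqrt(n1 * n2 * k * (N - k) / (N^2 * (N - 1))) # hypergeometric sd
  cc <- if (correct) 0.5 else 0                          # continuity correction
  less    <- pnorm((x1 + cc - mu) / sigma)
  greater <- pnorm((x1 - cc - mu) / sigma, lower.tail = FALSE)
  # two-sided: double the smaller tail, as with ts_method = "central"
  c(less = less, greater = greater, two.sided = min(1, 2 * min(less, greater)))
}

approx_pv(3, 15, 10, 25)                   # with continuity correction
approx_pv(3, 15, 10, 25, correct = FALSE)  # without continuity correction
```

Without continuity correction, the squared z statistic of this sketch equals (N - 1)/N times Pearson's chi-squared statistic for the 2 x 2 table. Its two-sided p-value is therefore close to that of stats::chisq.test(correct = FALSE) when samples are large.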

References

Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society Series A, 98, pp. 39-54. doi:10.2307/2342435

Agresti, A. (2002). Categorical data analysis. Second Edition. New York: John Wiley & Sons. pp. 91-97. doi:10.1002/0471249688

Blaker, H. (2000). Confidence curves and improved exact confidence intervals for discrete distributions. Canadian Journal of Statistics, 28(4), pp. 783-798. doi:10.2307/3315916

Hirji, K. F. (2006). Exact analysis of discrete data. New York: Chapman and Hall/CRC. pp. 55-83. doi:10.1201/9781420036190

Examples


# Construct random numbers of successes for three pairs of binomial experiments
set.seed(3)
p1 <- c(0.25, 0.5, 0.75)
p2 <- c(0.15, 0.5, 0.60)
n1 <- c(10, 20, 50)
n2 <- c(25, 75, 200)
x1 <- rbinom(3, n1, p1)
x2 <- rbinom(3, n2, p2)
x  <- cbind(x1 = x1, x2 = x2)
n  <- cbind(n1 = n1, n2 = n2)

# Exact two-sided p-values ("blaker") and their supports
results_ex <- homogeneity_test_pv(x, n, ts_method = "blaker")
print(results_ex)
results_ex$get_pvalues()
results_ex$get_pvalue_supports()

# Normal-approximated one-sided p-values ("less") and their supports
results_ap <- homogeneity_test_pv(x, n, "less", exact = FALSE)
print(results_ap)
results_ap$get_pvalues()
results_ap$get_pvalue_supports()
