hint (version 0.1-0)

Hyperintersection: The Hypergeometric Intersection Family of Distributions

Description

Density, distribution function, quantile function and random generation for the hypergeometric intersection distribution.

Usage

dhint(n, a, b, q = 0, range = NULL, log = FALSE, verbose = TRUE)
phint(n, a, b, q = 0, vals, upper.tail = TRUE, log.p = FALSE)
qhint(p, n, a, b, q = 0, upper.tail = TRUE, log.p = FALSE)
rhint(num = 5, n, a, b, q = 0)

Arguments

n
An integer specifying the number of categories in the urns.
a
An integer specifying the number of balls drawn from the first urn.
b
An integer specifying the number of balls drawn from the second urn.
q
An integer specifying the number of categories in the second urn which have duplicate members. If q is 0 (default) then the symmetrical, singleton case is computed, otherwise the asymmetrical, duplicates case is computed (see Details).
p
A probability between 0 and 1.
num
An integer specifying the number of random numbers to generate. Defaults to 5.
range, vals
A vector of integers specifying the intersection sizes for which probabilities (dhint) or cumulative probabilities (phint) should be computed (can be a single number). If range is NULL (default) then probabilities wil
log, log.p
Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE.
verbose
Logical. If TRUE, progress of calculation in the asymmetric, duplicates case is printed to the screen.
upper.tail
Logical. If TRUE, probabilities are P(X >= v), else P(X

Value

dhint, phint, and qhint return a data frame with two columns: v, the intersection size, and p, the associated p-values.

rhint returns an integer vector of random samples based on the hypergeometric intersection distribution.

Details

The hypergeometric intersection distributions describe the distribution of intersection sizes when sampling without replacement from two separate urns in which reside balls belonging to the same n object categories.

In the simplest case when there is exactly one ball in each category in each urn (symmetrical, singleton case), then the distribution is hypergeometric:

$$P(X=v)=\frac{{a \choose v}{n-a \choose b-v}}{{n \choose b}}$$

If, however, we allow duplicates in $q \leq n$ of the categories in the second urn, then the distribution of intersection sizes is described by the following variant of the hypergeometric:

$$P(X=v) = \frac{\sum_{m=0}^{\alpha} \sum_{l=0}^{\beta} \sum_{j=0}^{l} {n-q \choose v-l} {q \choose l} {q-l \choose m} {n-v-q+l \choose a-v-m} {l \choose j} {n+q-a-m-j \choose b-v}}{{n \choose a}{n+q \choose b}}$$
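The symmetrical, singleton formula above can be checked with base R alone. The following is a minimal sketch, not the package's implementation: `hint_pmf_sketch` is a hypothetical helper introduced only for illustration, and the range of intersection sizes it uses (from max(0, a + b - n) to min(a, b)) is an assumption about which values are feasible, not a statement about what dhint returns by default.

```r
# Symmetrical, singleton case (q = 0), written with base R only.
# hint_pmf_sketch is a hypothetical helper, not part of the hint package.
hint_pmf_sketch <- function(n, a, b) {
  # Assumed feasible intersection sizes: at least a + b - n, at most min(a, b).
  v <- max(0, a + b - n):min(a, b)
  # P(X = v) = choose(a, v) * choose(n - a, b - v) / choose(n, b)
  p <- choose(a, v) * choose(n - a, b - v) / choose(n, b)
  data.frame(v = v, p = p)
}

d <- hint_pmf_sketch(20, 10, 12)

# The probabilities sum to one and agree with stats::dhyper().
stopifnot(isTRUE(all.equal(sum(d$p), 1)))
stopifnot(isTRUE(all.equal(d$p, dhyper(d$v, m = 10, n = 10, k = 12))))

# Upper-tail probability P(X >= 5) as a plain sum of the density.
p_upper <- sum(d$p[d$v >= 5])
```

The agreement with `dhyper` follows from matching its arguments to the formula: `m = a` balls of interest, `n - a` others, and `k = b` draws. The asymmetrical, duplicates case is not sketched here because the summation limits are not defined in this section.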

References

Kalinka, A.T. (2013). The probability of drawing intersections: extending the hypergeometric distribution. arXiv:1305.0717. http://arxiv.org/abs/1305.0717

See Also

hint.test, hint.dist.test, plotDistr, Hyperdistinct.

Examples

## Generate the distribution of intersection sizes without duplicates:
dd <- dhint(20, 10, 12)
## Restrict the range of intersections.
dd <- dhint(20, 10, 12, range = 0:5)
## Allow duplicates in q of the categories in the second urn:
dd <- dhint(35, 15, 11, 22, verbose = FALSE)
## Generate cumulative probabilities.
pp <- phint(29, 15, 8, vals = 5)
pp <- phint(29, 15, 8, vals = 2, upper.tail = FALSE)
pp <- phint(29, 15, 8, 23, vals = 2)
## Extract quantiles:
qq <- qhint(0.15, 23, 12, 10)
qq <- qhint(0.15, 23, 12, 10, 18)
## Generate random samples from Hypergeometric intersection distributions.
rr <- rhint(num = 10, 18, 9, 14)
rr <- rhint(num = 10, 22, 11, 17, 12)
