hint (version 0.1-1)

Hyperintersection: The Hypergeometric Intersection Family of Distributions

Description

Density, distribution function, quantile function and random generation for the hypergeometric intersection distribution.

Usage

dhint(n, A, q = 0, range = NULL, approx = FALSE, log = FALSE, verbose = TRUE)
phint(n, A, q = 0, vals, upper.tail = TRUE, log.p = FALSE)
qhint(p, n, A, q = 0, upper.tail = TRUE, log.p = FALSE)
rhint(num = 5, n, A, q = 0)

Arguments

n
An integer specifying the number of categories in the urns.
A
A vector of integers specifying the numbers of balls drawn from each urn. The length of the vector equals the number of urns.
q
An integer specifying the number of categories in the second urn which have duplicate members. If q is 0 (default) then the symmetrical, singleton case is computed, otherwise the asymmetrical, duplicates case is computed (see Details).
p
A probability between 0 and 1.
num
An integer specifying the number of random numbers to generate. Defaults to 5.
range, vals
A vector of integers specifying the intersection sizes for which probabilities (dhint) or cumulative probabilities (phint) should be computed (can be a single number). If range is NULL (default) then probabilities will be returned over the entire range of possible values.
approx
Logical. If TRUE, a binomial approximation will be used to generate the distribution.
log, log.p
Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE.
verbose
Logical. If TRUE, progress of calculation in the asymmetric, duplicates case is printed to the screen.
upper.tail
Logical. If TRUE, probabilities are P(X >= v); otherwise lower-tail probabilities are returned.

Value

dhint, phint, and qhint return a data frame with two columns: v, the intersection size, and p, the associated p-values. rhint returns an integer vector of random samples based on the hypergeometric intersection distribution.
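To make the v/p layout and the upper-tail convention P(X >= v) concrete, here is a minimal base-R sketch for the symmetrical, singleton two-urn case, where Details states that the distribution is hypergeometric. It relies on `dhyper` from base R rather than on hint itself, and the data frame it builds only mimics the two-column shape described above; it is not a reproduction of the package's internals.

```r
# Base-R illustration only; this is not how hint computes its results.
n <- 29; a <- 15; b <- 8           # sizes borrowed from the phint(29, c(15, 8), ...) example
v <- 0:min(a, b)                   # candidate intersection sizes
dens <- dhyper(v, a, n - a, b)     # P(X = v) in the symmetrical, singleton case

# Upper tail P(X >= v): reverse cumulative sum of the density
upper <- rev(cumsum(rev(dens)))

# Two columns, named as in the Value section: v (intersection size) and p (probability)
res <- data.frame(v = v, p = upper)
res[res$v == 5, ]
```

The row for v = 5 should be comparable to the output of `phint(29, c(15, 8), vals = 5)` if phint follows the hypergeometric formula in this case, which is an assumption worth checking against the installed package.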

Details

The hypergeometric intersection distributions describe the distribution of intersection sizes when sampling without replacement from two separate urns that contain balls belonging to the same n object categories. In the simplest case, when each urn holds exactly one ball per category (the symmetrical, singleton case), the distribution is hypergeometric. Writing a and b for the numbers of balls drawn from the first and second urns (the entries of A):

$$P(X=v)=\frac{{a \choose v}{n-a \choose b-v}}{{n \choose b}}$$

When there are three urns, with c balls drawn from the third, the distribution is given by

$$P(X=v) = \frac{ {a \choose v} \sum_{i} {a-v \choose i} {n-a \choose b-v-i} {n-v-i \choose c-v} }{ {n \choose b} {n \choose c} }$$

If, however, we allow duplicates in $q \le n$ of the categories in the second urn, then the distribution of intersection sizes is described by the following variant of the hypergeometric:

$$P(X=v) = \frac{ \sum_{m=0}^{\alpha} \sum_{l=0}^{\beta} \sum_{j=0}^{l} {n-q \choose v-l} {q \choose l} {q-l \choose m} {n-v-q+l \choose a-v-m} {l \choose j} {n+q-a-m-j \choose b-v} }{ {n \choose a}{n+q \choose b} }$$
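The two-urn and three-urn singleton formulas can be evaluated directly with base R's `choose`. The sketch below is only an illustration of the formulas: the function names `two_urn_pmf` and `three_urn_pmf` and the argument name `cc` (standing in for c) are hypothetical and not part of hint. The duplicates case is left out because the summation limits are not spelled out here.

```r
# Illustrative helpers (hypothetical names, not part of the hint package).

# Two urns, singleton case: C(a, v) * C(n - a, b - v) / C(n, b)
two_urn_pmf <- function(v, n, a, b) {
  choose(a, v) * choose(n - a, b - v) / choose(n, b)
}

# Three urns, singleton case; cc plays the role of c in the formula.
# v must be a single value between 0 and min(a, b, cc).
three_urn_pmf <- function(v, n, a, b, cc) {
  i <- 0:(a - v)  # choose() returns 0 for out-of-range lower indices, so extra terms vanish
  num <- choose(a, v) *
    sum(choose(a - v, i) * choose(n - a, b - v - i) * choose(n - v - i, cc - v))
  num / (choose(n, b) * choose(n, cc))
}

# Two-urn case agrees with base R's hypergeometric density
p2 <- two_urn_pmf(0:10, n = 20, a = 10, b = 12)
all.equal(p2, dhyper(0:10, 10, 10, 12))

# Three-urn probabilities over the full range of v should sum to 1
p3 <- sapply(0:8, three_urn_pmf, n = 20, a = 10, b = 12, cc = 8)
sum(p3)
```

Comparing `p2` with `dhyper` is a quick sanity check that the symbols a, b and n are being read the same way as in the formula.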

References

Kalinka, A.T. (2013). The probability of drawing intersections: extending the hypergeometric distribution. arXiv:1305.0717

See Also

Binomialintersection, hint.test, hint.dist.test, plotDistr, Hyperdistinct.

Examples

## Generate the distribution of intersection sizes without duplicates:
dd <- dhint(20, c(10, 12))
## Restrict the range of intersections.
dd <- dhint(20, c(10, 12), range = 0:5)
## Allow duplicates in q of the categories in the second urn:
dd <- dhint(35, c(15, 11), 22, verbose = FALSE)
## Generate cumulative probabilities.
pp <- phint(29, c(15, 8), vals = 5)
pp <- phint(29, c(15, 8), vals = 2, upper.tail = FALSE)
pp <- phint(29, c(15, 8), 23, vals = 2)
## Extract quantiles:
qq <- qhint(0.15, 23, c(12, 10))
qq <- qhint(0.15, 23, c(12, 10), 18)
## Generate random samples from hypergeometric intersection distributions.
rr <- rhint(num = 10, 18, c(9, 14))
rr <- rhint(num = 10, 22, c(11, 17), 12)
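As a rough intuition check for what these samples represent, the symmetrical, singleton two-urn case can be simulated in base R by drawing categories without replacement from each urn and counting the overlap. This is a sketch under the assumption that each urn holds exactly one ball per category; `draw_intersection` is a hypothetical helper, not a hint function, and the seed is arbitrary.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 18; a <- 9; b <- 14           # sizes borrowed from the first rhint example

# Hypothetical helper: draw a and b distinct categories from 1..n, one draw per urn,
# and return the number of categories that appear in both draws.
draw_intersection <- function(n, a, b) {
  length(intersect(sample.int(n, a), sample.int(n, b)))
}

sims <- replicate(10000, draw_intersection(n, a, b))

# Compare empirical frequencies with the hypergeometric density from Details
v <- 0:min(a, b)
empirical <- as.numeric(table(factor(sims, levels = v))) / length(sims)
round(cbind(v = v, empirical = empirical, exact = dhyper(v, a, n - a, b)), 3)
```

The empirical and exact columns should be close for a simulation of this size; intersection sizes below a + b - n cannot occur, so those rows have probability zero.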