hint (version 0.1-1)

Hyperdistinct: Drawing Distinct Categories from a Single Urn

Description

Density, distribution function, quantile function and random generation for the distribution of distinct categories drawn from a single urn in which there are duplicates in q of the categories.
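To make the urn model concrete, the following base-R sketch simulates it directly rather than calling the package. It assumes that a category "with duplicates" contributes exactly two balls, so the urn holds n + q balls in total (consistent with the denominator of the formula under Details). The helper name `simulate_distinct` is hypothetical and not part of hint.

```r
# Hypothetical helper (not part of the hint package): draw `a` balls without
# replacement and count how many distinct categories were drawn.
simulate_distinct <- function(n, a, q) {
  stopifnot(q <= n, a <= n + q)
  # Assumption: categories 1..q appear twice in the urn, the rest appear once.
  urn <- c(seq_len(n), seq_len(q))
  drawn <- sample(urn, size = a, replace = FALSE)
  length(unique(drawn))
}

set.seed(1)
sims <- replicate(10000, simulate_distinct(n = 20, a = 10, q = 12))

# Empirical distribution of the number of distinct categories drawn
round(table(sims) / length(sims), 3)
```

The empirical frequencies can be compared by eye with the probabilities returned by `dhydist(20, 10, 12)`.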

Usage

dhydist(n, a, q, range = NULL, log = FALSE)
phydist(n, a, q, vals, upper.tail = TRUE, log.p = FALSE)
qhydist(p, n, a, q, upper.tail = TRUE, log.p = FALSE)
rhydist(num = 5, n, a, q)

Arguments

n
An integer specifying the number of categories in the urn.
a
An integer specifying the number of balls drawn from the urn.
q
An integer specifying the number of categories in the urn which have duplicate members.
p
A probability between 0 and 1.
num
An integer specifying the number of random numbers to generate. Defaults to 5.
range, vals
A vector of integers specifying the intersection sizes for which probabilities (dhydist) or cumulative probabilities (phydist) should be computed (can be a single number). If range is NULL (default) then probabilities will be returned over the entire range of possible values.
log, log.p
Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE.
upper.tail
Logical. If TRUE, upper-tail probabilities P(X >= c) are used; otherwise lower-tail probabilities are used. Defaults to TRUE.

Value

dhydist, phydist, and qhydist return a data frame with two columns: c, the number of distinct categories drawn, and p, the associated probabilities. rhydist returns an integer vector of random samples from the distribution of distinct categories when sampling from a single urn containing duplicates in $q$ of its $n$ categories.

Details

The distribution of the number of distinct categories drawn when sampling without replacement from a single urn containing duplicates in $q$ of its $n$ categories is given by

$$P(X=c) = \binom{q}{a-c} \sum_{j=0}^{q} \binom{q-a+c}{j} \binom{n-a+c-j}{2c-a-j} \Big/ \binom{n+q}{a}$$

When all of the $n$ categories contain duplicates, this can be expressed in a closed form:

$$P(X=c) = \binom{n}{c} \binom{c}{a-c} 2^{2c-a} \Big/ \binom{2n}{a}$$
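The two formulas above can be transcribed into base R as a sanity check. This is a minimal sketch, not the package's implementation: the function names `hydist_pmf` and `hydist_closed` are hypothetical, and the argument `k` stands for $c$ in the formulas (renamed to avoid masking R's `c()`).

```r
# Hypothetical transcription of the general formula; k plays the role of c.
hydist_pmf <- function(k, n, a, q) {
  j <- 0:q
  # choose() is 0 for a negative lower index, which zeroes impossible terms.
  terms <- choose(q - a + k, j) * choose(n - a + k - j, 2 * k - a - j)
  choose(q, a - k) * sum(terms) / choose(n + q, a)
}

# Hypothetical transcription of the closed form (every category duplicated).
hydist_closed <- function(k, n, a) {
  choose(n, k) * choose(k, a - k) * 2^(2 * k - a) / choose(2 * n, a)
}

# Probabilities over every conceivable value of k for n = 20, a = 10, q = 12;
# values outside the support come out as exactly 0.
support <- 0:10
p <- vapply(support, hydist_pmf, numeric(1), n = 20, a = 10, q = 12)
sum(p)  # expected to be 1 up to floating-point rounding

# With q = n the general formula should agree with the closed form.
p_general <- vapply(support, hydist_pmf, numeric(1), n = 20, a = 10, q = 20)
p_closed <- vapply(support, hydist_closed, numeric(1), n = 20, a = 10)
all.equal(p_general, p_closed)
```

Evaluating over `0:a` rather than the exact support keeps the sketch short; the binomial coefficients take care of the impossible values.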

References

Kalinka, A.T. (2013). The probability of drawing intersections: extending the hypergeometric distribution. arXiv:1305.0717

See Also

Hyperintersection, plotDistr.

Examples

## Generate the distribution of distinct categories drawn from a single urn.
dd <- dhydist(20, 10, 12)
## Restrict the range of intersections.
dd <- dhydist(20, 10, 12, range = 5:10)
## Generate cumulative probabilities.
pp <- phydist(29, 15, 8, vals = 5)
pp <- phydist(29, 15, 8, vals = 2, upper.tail = FALSE)
## Extract quantiles:
qq <- qhydist(0.15, 23, 12, 10)
## Generate random samples based on this distribution.
rr <- rhydist(num = 10, 18, 9, 12)
