stats (version 3.5.2)

# Hypergeometric: The Hypergeometric Distribution

## Description

Density, distribution function, quantile function and random generation for the hypergeometric distribution.

## Usage

dhyper(x, m, n, k, log = FALSE)
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)
rhyper(nn, m, n, k)

## Arguments

x, q

vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.

m

the number of white balls in the urn.

n

the number of black balls in the urn.

k

the number of balls drawn from the urn.

p

probability; it must be between 0 and 1.

nn

number of observations. If length(nn) > 1, the length is taken to be the number required.

log, log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are $$P[X \le x]$$, otherwise, $$P[X > x]$$.
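As a minimal sketch of how the `log`, `log.p` and `lower.tail` flags relate to the default calls, the following checks each flag against its plain counterpart. The urn sizes (`m = 10`, `n = 7`, `k = 8`) are borrowed from the Examples section, and the probability vector `p` is an arbitrary illustration.

```r
m <- 10; n <- 7; k <- 8   # illustrative urn: 10 white, 7 black, 8 drawn
q <- 0:k

# lower.tail = FALSE gives P[X > q], the complement of P[X <= q].
upper <- phyper(q, m, n, k, lower.tail = FALSE)
lower <- phyper(q, m, n, k)
tail_ok <- isTRUE(all.equal(upper, 1 - lower))

# log = TRUE returns the logarithm of the density.
x <- 1:8   # values with non-zero probability for this urn
log_ok <- isTRUE(all.equal(dhyper(x, m, n, k, log = TRUE),
                           log(dhyper(x, m, n, k))))

# log.p = TRUE means the probabilities passed to qhyper are given as log(p).
p <- c(0.1, 0.5, 0.9)
logp_ok <- isTRUE(all.equal(qhyper(log(p), m, n, k, log.p = TRUE),
                            qhyper(p, m, n, k)))

c(tail_ok, log_ok, logp_ok)
```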

## Value

dhyper gives the density, phyper gives the distribution function, qhyper gives the quantile function, and rhyper generates random deviates.

Invalid arguments will result in return value NaN, with a warning.
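For illustration, one way to trigger this behaviour is to ask for more balls than the urn holds. The values below are arbitrary; `k = 20` exceeds `m + n = 17`, which is assumed here to count as an invalid argument.

```r
# Drawing k = 20 balls from an urn of m + n = 17 is impossible.
bad <- suppressWarnings(dhyper(1, m = 10, n = 7, k = 20))
is.nan(bad)

# Catch the warning instead of suppressing it, to confirm one is raised.
warned <- tryCatch({
  dhyper(1, m = 10, n = 7, k = 20)
  FALSE
}, warning = function(w) TRUE)
warned
```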

The length of the result is determined by nn for rhyper, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.
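The length and recycling rules can be checked with a short sketch. The urn sizes are taken from the Examples section; the vectors passed as the first argument and as `k` are arbitrary illustrations.

```r
m <- 10; n <- 7; k <- 8

len_r  <- length(rhyper(5, m, n, k))            # a single count: 5 deviates
len_rv <- length(rhyper(c(2, 4, 6), m, n, k))   # a length-3 vector: 3 deviates
len_d  <- length(dhyper(0:3, m, n, c(8, 9)))    # longest numerical argument has length 4

# k = c(8, 9) is recycled to c(8, 9, 8, 9) to match x = 0:3.
recycled <- dhyper(0:3, m, n, c(8, 9))
by_hand  <- c(dhyper(0, m, n, 8), dhyper(1, m, n, 9),
              dhyper(2, m, n, 8), dhyper(3, m, n, 9))
recycle_ok <- isTRUE(all.equal(recycled, by_hand))

c(len_r, len_rv, len_d)
```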

## Details

The hypergeometric distribution is used for sampling without replacement. The density of this distribution with parameters m, n and k (named $$Np$$, $$N-Np$$, and $$n$$, respectively in the reference below) is given by $$p(x) = \left. {m \choose x}{n \choose k-x} \right/ {m+n \choose k}$$ for $$x = 0, \ldots, k$$.
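As a quick check, the density formula can be evaluated directly with `choose()` and compared with `dhyper`. This is only a sketch using the urn sizes from the Examples section; it is not a description of how `dhyper` is computed internally.

```r
m <- 10; n <- 7; k <- 8
x <- 0:k

# p(x) = choose(m, x) * choose(n, k - x) / choose(m + n, k)
manual <- choose(m, x) * choose(n, k - x) / choose(m + n, k)

density_ok <- isTRUE(all.equal(manual, dhyper(x, m, n, k)))
density_ok
```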

Note that $$p(x)$$ is non-zero only for $$\max(0, k-n) \le x \le \min(k, m)$$.
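A small sketch of the support bounds, again with the urn sizes from the Examples section: with 7 black balls and 8 draws, at least one white ball must be drawn, so the density at 0 vanishes.

```r
m <- 10; n <- 7; k <- 8
x <- 0:k

# Values of x where the density is strictly positive.
support <- x[dhyper(x, m, n, k) > 0]

lo <- max(0, k - n)   # cannot draw more black balls than exist
hi <- min(k, m)       # cannot draw more white balls than exist or than k

support_ok <- isTRUE(all.equal(support, lo:hi))
range(support)
```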

With $$p := m/(m+n)$$ (hence $$Np = N \times p$$ in the reference's notation), the first two moments are mean $$E[X] = \mu = k p$$ and variance $$\mbox{Var}(X) = k p (1 - p) \frac{m+n-k}{m+n-1},$$ which shows the closeness to the Binomial$$(k,p)$$ (where the hypergeometric has smaller variance unless $$k = 1$$).
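The moment formulas can be verified numerically by summing over the support. The sketch below uses the urn sizes from the Examples section and also compares the result with the Binomial$$(k,p)$$ variance.

```r
m <- 10; n <- 7; k <- 8
p <- m / (m + n)
x <- 0:k
d <- dhyper(x, m, n, k)

# Moments computed directly from the density.
mean_sum <- sum(x * d)
var_sum  <- sum((x - mean_sum)^2 * d)

# Closed-form moments from the text.
mean_formula <- k * p
var_formula  <- k * p * (1 - p) * (m + n - k) / (m + n - 1)

# Variance of the Binomial(k, p), which omits the finite-population factor.
var_binom <- k * p * (1 - p)

c(mean_sum, mean_formula, var_sum, var_formula, var_binom)
```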

The quantile is defined as the smallest value $$x$$ such that $$F(x) \ge p$$, where $$F$$ is the distribution function.
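To illustrate the definition, the quantile can be recovered by searching the distribution function for the smallest $$x$$ with $$F(x) \ge p$$. The probabilities below are arbitrary and chosen away from the jump points of $$F$$; the helper vector `manual_q` is hypothetical and exists only for this comparison.

```r
m <- 10; n <- 7; k <- 8
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)

x   <- 0:k
cdf <- phyper(x, m, n, k)

# Smallest x whose cumulative probability reaches each target probability.
manual_q <- sapply(p, function(prob) min(x[cdf >= prob]))

quantile_ok <- isTRUE(all.equal(manual_q, qhyper(p, m, n, k)))
manual_q
```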

If one of $$m, n, k$$ exceeds .Machine\$integer.max, rhyper currently uses the equivalent of qhyper(runif(nn), m, n, k), although a binomial approximation may be considerably more efficient in that case.
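The inversion approach mentioned here can be sketched for ordinary parameter sizes: uniform draws are mapped through the quantile function. The seed, sample size and urn sizes are arbitrary choices for illustration.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
m <- 10; n <- 7; k <- 8
nn <- 1000

# Inversion: uniform random numbers passed through the quantile function.
draws <- qhyper(runif(nn), m, n, k)

# Every draw should lie in the support of the distribution.
in_support <- all(draws >= max(0, k - n) & draws <= min(k, m))

# The sample mean should be close to the theoretical mean k * m / (m + n).
c(sample_mean = mean(draws), theoretical_mean = k * m / (m + n))
```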

## References

Johnson, N. L., Kotz, S., and Kemp, A. W. (1992) Univariate Discrete Distributions, Second Edition. New York: Wiley.

## See Also

Distributions for other standard distributions.

## Examples

m <- 10; n <- 7; k <- 8
x <- 0:(k+1)
rbind(phyper(x, m, n, k), dhyper(x, m, n, k))
all(phyper(x, m, n, k) == cumsum(dhyper(x, m, n, k)))  # FALSE
## but error is very small:
signif(phyper(x, m, n, k) - cumsum(dhyper(x, m, n, k)), digits = 3)

