Hypergeometric
The Hypergeometric Distribution
Density, distribution function, quantile function and random generation for the hypergeometric distribution.
- Keywords
- distribution
Usage
dhyper(x, m, n, k, log = FALSE)
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)
rhyper(nn, m, n, k)
Arguments
- x, q
vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.
- m
the number of white balls in the urn.
- n
the number of black balls in the urn.
- k
the number of balls drawn from the urn.
- p
probability; it must be between 0 and 1.
- nn
number of observations. If length(nn) > 1, the length is taken to be the number required.
- log, log.p
logical; if TRUE, probabilities p are given as log(p).
- lower.tail
logical; if TRUE (default), probabilities are \(P[X \le x]\), otherwise, \(P[X > x]\).
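As a small illustration of the lower.tail, log.p and log arguments described above, the following sketch checks that the two tails sum to one and that the log-scale results match the log of the natural-scale ones. The urn sizes and the value q = 4 are arbitrary choices made for this example.

```r
# Illustrative urn: 10 white balls, 7 black balls, 8 balls drawn (arbitrary values)
m <- 10; n <- 7; k <- 8
q <- 4

lower <- phyper(q, m, n, k)                      # P[X <= 4]
upper <- phyper(q, m, n, k, lower.tail = FALSE)  # P[X >  4]
all.equal(lower + upper, 1)

# log.p = TRUE returns the probability on the log scale
all.equal(phyper(q, m, n, k, log.p = TRUE), log(lower))

# log = TRUE does the same for the density
all.equal(dhyper(q, m, n, k, log = TRUE), log(dhyper(q, m, n, k)))
```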
Details
The hypergeometric distribution is used for sampling without
replacement. The density of this distribution with parameters
m, n and k (named \(Np\), \(N-Np\), and
\(n\), respectively in the reference below) is given by
$$
p(x) = \left. {m \choose x}{n \choose k-x} \right/ {m+n \choose k}
$$
for \(x = 0, \ldots, k\).
Note that \(p(x)\) is non-zero only for \(\max(0, k-n) \le x \le \min(k, m)\).
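The density formula and its support can be checked numerically. The sketch below evaluates the ratio of binomial coefficients directly with choose() and compares it with dhyper. The parameter values are arbitrary, and the direct formula is shown only for illustration; it is not how dhyper is implemented internally.

```r
# Illustrative parameters (arbitrary): 10 white, 7 black, 8 drawn
m <- 10; n <- 7; k <- 8
x <- 0:k

# Direct evaluation of p(x) = choose(m, x) * choose(n, k - x) / choose(m + n, k)
manual <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
all.equal(manual, dhyper(x, m, n, k))

# Support: p(x) is non-zero only for max(0, k - n) <= x <= min(k, m)
lo <- max(0, k - n)
hi <- min(k, m)
support <- x[dhyper(x, m, n, k) > 0]
all.equal(support, lo:hi)
```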
With \(p := m/(m+n)\) (hence \(Np = N \times p\) in the reference's notation), the first two moments are the mean
$$
E[X] = \mu = k p
$$
and the variance
$$
\mbox{Var}(X) = k p (1 - p) \frac{m+n-k}{m+n-1},
$$
which shows the closeness to the Binomial\((k,p)\) distribution (the hypergeometric has the smaller variance unless \(k = 1\)).
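The moment formulas can be verified by summing over the support. This is a minimal check with arbitrary parameter values; the variable names are chosen for this example only.

```r
# Illustrative parameters (arbitrary): 10 white, 7 black, 8 drawn
m <- 10; n <- 7; k <- 8
x <- 0:k
px <- dhyper(x, m, n, k)
p <- m / (m + n)

# Moments computed directly from the density
mean_num <- sum(x * px)
var_num  <- sum((x - mean_num)^2 * px)

# Closed-form expressions from the text
mean_formula <- k * p
var_formula  <- k * p * (1 - p) * (m + n - k) / (m + n - 1)

all.equal(mean_num, mean_formula)
all.equal(var_num, var_formula)

# For k > 1 the variance is smaller than that of Binomial(k, p)
var_formula < k * p * (1 - p)
```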
The quantile is defined as the smallest value \(x\) such that \(F(x) \ge p\), where \(F\) is the distribution function.
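The quantile definition can be reproduced by searching the distribution function for the smallest \(x\) with \(F(x) \ge p\). The sketch below does this for a few arbitrary probabilities that do not coincide with jump points of \(F\); the helper name smallest_x is hypothetical and introduced only for this example.

```r
# Illustrative parameters (arbitrary): 10 white, 7 black, 8 drawn
m <- 10; n <- 7; k <- 8
x <- 0:k
Fx <- phyper(x, m, n, k)

# Hypothetical helper: smallest x such that F(x) >= pr
smallest_x <- function(pr) x[which(Fx >= pr)[1]]

probs <- c(0.05, 0.5, 0.95)
manual <- sapply(probs, smallest_x)
all(manual == qhyper(probs, m, n, k))
```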
For rhyper, if any of \(m, n, k\) exceeds .Machine$integer.max,
the equivalent of qhyper(runif(nn), m, n, k) is currently used,
even though a binomial approximation may be considerably more efficient.
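The inversion approach named above, qhyper(runif(nn), m, n, k), can be tried directly for small parameters. This is only a sketch of the idea with arbitrary values and an arbitrary seed; it does not reproduce the internal code path of rhyper.

```r
set.seed(1)  # arbitrary seed, for reproducibility of this sketch only

# Illustrative parameters (arbitrary): 10 white, 7 black, 8 drawn
m <- 10; n <- 7; k <- 8
nn <- 1000

# Inversion sampling: push uniform draws through the quantile function
draws <- qhyper(runif(nn), m, n, k)

# All draws lie in the support, and the sample mean is near k * m / (m + n)
range(draws)
mean(draws)
k * m / (m + n)
```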
Value
dhyper gives the density,
phyper gives the distribution function,
qhyper gives the quantile function, and
rhyper generates random deviates.
Invalid arguments will result in return value NaN, with a warning.
The length of the result is determined by nn for
rhyper, and is the maximum of the lengths of the
numerical arguments for the other functions.
The numerical arguments other than nn are recycled to the
length of the result. Only the first elements of the logical
arguments are used.
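The length and recycling rules can be seen in a short example. The parameter values below are arbitrary, and suppressWarnings() is used only to keep the illustration quiet when an invalid argument is passed on purpose.

```r
# Illustrative parameters (arbitrary): 10 white, 7 black, 8 drawn
m <- 10; n <- 7; k <- 8

# rhyper: the result length comes from nn
len_scalar <- length(rhyper(5, m, n, k))               # nn = 5
len_vector <- length(rhyper(c(10, 20, 30), m, n, k))   # length(nn) = 3

# Other functions: the shorter numerical arguments are recycled
recycled <- dhyper(0:3, m, n, c(8, 9))                 # k is used as 8, 9, 8, 9
explicit <- c(dhyper(0, m, n, 8), dhyper(1, m, n, 9),
              dhyper(2, m, n, 8), dhyper(3, m, n, 9))
all.equal(recycled, explicit)

# Invalid arguments (here a negative m) give NaN, with a warning
bad <- suppressWarnings(dhyper(1, -1, n, k))
is.nan(bad)
```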
References
Johnson, N. L., Kotz, S., and Kemp, A. W. (1992) Univariate Discrete Distributions, Second Edition. New York: Wiley.
See Also
Distributions for other standard distributions.
Examples
m <- 10; n <- 7; k <- 8
x <- 0:(k+1)
## distribution function (first row) and density (second row)
rbind(phyper(x, m, n, k), dhyper(x, m, n, k))
## phyper is not exactly equal to the cumulative sum of dhyper
all(phyper(x, m, n, k) == cumsum(dhyper(x, m, n, k))) # FALSE
## but error is very small:
signif(phyper(x, m, n, k) - cumsum(dhyper(x, m, n, k)), digits = 3)