distributional (version 0.6.0)

dist_categorical: The Categorical distribution

Description

[Stable]

Categorical distributions are used to represent events with multiple outcomes, such as the number that appears on the roll of a die. This is also referred to as the 'generalised Bernoulli' or 'multinoulli' distribution. The Categorical distribution is a special case of the Multinomial() distribution with n = 1.

Usage

dist_categorical(prob, outcomes = NULL)

Arguments

prob

A list of probability vectors. Each vector gives the probabilities of observing each outcome category for one distribution.

outcomes

A list of vectors in which each value names the outcome that corresponds to the probability at the same position in prob.

Details

We recommend reading this documentation on pkgdown, which renders the math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_categorical.html

In the following, let \(X\) be a Categorical random variable with probability parameters prob = \(\{p_1, p_2, \ldots, p_k\}\).

The Categorical probability distribution is widely used to model the occurrence of one of several possible events. A simple example is the roll of a die, where \(p = \{1/6, 1/6, 1/6, 1/6, 1/6, 1/6\}\) gives an equal chance of observing each number on a 6-sided die.

Support: \(\{1, \ldots, k\}\)

Mean: Not defined for unordered categories. For ordered categories with integer outcomes \(\{1, 2, \ldots, k\}\), the mean is:

$$ E(X) = \sum_{i=1}^{k} i \cdot p_i $$

Variance: Not defined for unordered categories. For ordered categories with integer outcomes \(\{1, 2, \ldots, k\}\), the variance is:

$$ \text{Var}(X) = \sum_{i=1}^{k} i^2 \cdot p_i - \left(\sum_{i=1}^{k} i \cdot p_i\right)^2 $$
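
As a worked check of the two formulas above, the following base R sketch evaluates them for the fair six-sided die from the earlier example. It uses plain arithmetic rather than the package's own mean() and variance() methods, and the names `x`, `p`, `mean_x` and `var_x` are illustrative only.

```r
# Fair six-sided die: integer outcomes 1..6, each with probability 1/6
x <- 1:6
p <- rep(1 / 6, 6)

# E(X) = sum_i i * p_i
mean_x <- sum(x * p)

# Var(X) = sum_i i^2 * p_i - (sum_i i * p_i)^2
var_x <- sum(x^2 * p) - mean_x^2

mean_x  # 3.5
var_x   # 35/12, approximately 2.9167
```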

Probability mass function (p.m.f):

$$ P(X = i) = p_i $$
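
The link to the Multinomial distribution noted in the Description can be illustrated with base R alone. This is a minimal sketch, not the package's generate() method: a single Multinomial draw with size = 1 yields a one-hot vector whose non-zero position is the sampled category, and sample() with a prob argument draws category indices directly. The probability vector `p` is borrowed from the Examples, and the seed and sample size are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
p <- c(0.05, 0.5, 0.15, 0.2, 0.1)

# One Multinomial(size = 1) draw: a k x 1 one-hot matrix
one_hot <- stats::rmultinom(n = 1, size = 1, prob = p)
category <- which(one_hot[, 1] == 1)

# Equivalent in distribution: sample category indices directly
draws <- sample(seq_along(p), size = 10000, replace = TRUE, prob = p)

# Empirical frequencies should be close to the p.m.f. values p_i
freq <- tabulate(draws, nbins = length(p)) / length(draws)
round(freq, 2)
```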

Cumulative distribution function (c.d.f):

The c.d.f is undefined for unordered categories. For ordered categories with outcomes \(x_1 < x_2 < \ldots < x_k\), the c.d.f is:

$$ P(X \le x_j) = \sum_{i=1}^{j} p_i $$
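
The running sum in this formula can be sketched in base R with cumsum(). The quantile step below assumes the usual generalised-inverse convention (the smallest outcome whose c.d.f. reaches the requested probability); that convention is an assumption for illustration and is not taken from the package source. The names `x`, `p`, `cum_p`, `q` and `q_outcome` are hypothetical.

```r
# Ordered outcomes x_1 < ... < x_5 with their probabilities
x <- ordered(letters[1:5])
p <- c(0.05, 0.5, 0.15, 0.2, 0.1)

# P(X <= x_j) is the running sum of p_1, ..., p_j
cum_p <- cumsum(p)
names(cum_p) <- as.character(x)

# c.d.f. evaluated at "b": p_1 + p_2 = 0.55
cum_p[["b"]]

# Assumed quantile convention: smallest outcome whose c.d.f. reaches q
q <- 0.6
q_outcome <- x[which(cum_p >= q)[1]]
q_outcome  # "c", since the c.d.f. first reaches 0.6 at the third outcome
```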

Moment generating function (m.g.f):

$$ E(e^{tX}) = \sum_{i=1}^{k} e^{tx_i} \cdot p_i $$
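
As a brief consistency check, differentiating the m.g.f. with respect to \(t\) and evaluating at \(t = 0\) recovers the mean given earlier, under the same assumption of integer outcomes \(x_i = i\):

$$ \frac{d}{dt} E(e^{tX}) \Big|_{t=0} = \sum_{i=1}^{k} x_i e^{t x_i} \cdot p_i \Big|_{t=0} = \sum_{i=1}^{k} x_i \cdot p_i = \sum_{i=1}^{k} i \cdot p_i = E(X) $$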

Skewness: Approximated numerically for ordered categories.

Kurtosis: Approximated numerically for ordered categories.

See Also

stats::Multinomial

Examples

dist <- dist_categorical(prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)))

dist

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

# The outcomes aren't ordered, so many statistics are not applicable.
cdf(dist, 0.6)
quantile(dist, 0.7)
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

# Some of these statistics are meaningful for ordered outcomes
dist <- dist_categorical(list(rpois(26, 3)), list(ordered(letters)))
dist
cdf(dist, "m")
quantile(dist, 0.5)

dist <- dist_categorical(
  prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)),
  outcomes = list(letters[1:5], letters[24:26])
)

generate(dist, 10)

density(dist, "a")
density(dist, "z", log = TRUE)
