dist_categorical: The Categorical distribution

Description

Categorical distributions are used to represent events with multiple outcomes, such as the number that appears on the roll of a die. This is also referred to as the 'generalised Bernoulli' or 'multinoulli' distribution. The Categorical distribution is a special case of the Multinomial() distribution with n = 1.

Usage

dist_categorical(prob, outcomes = NULL)

Arguments

prob: A list of probabilities of observing each outcome category.
outcomes: A list of vectors, where each value is the label of the corresponding outcome category in prob.

Details

We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_categorical.html

In the following, let $X$ be a Categorical random variable with probability parameters prob = $\{p_1, p_2, \ldots, p_k\}$.

The Categorical probability distribution is widely used to model the occurrence of one of several possible events. A simple example is the roll of a die, where $p = \{1/6, 1/6, 1/6, 1/6, 1/6, 1/6\}$ gives an equal chance of observing each number on a 6-sided die.

Support: $\{1, \ldots, k\}$

Mean: Not defined for unordered categories. For ordered categories with integer outcomes $\{1, 2, \ldots, k\}$, the mean is:

$$ E(X) = \sum_{i=1}^{k} i \cdot p_i $$

Variance: Not defined for unordered categories. For ordered categories with integer outcomes $\{1, 2, \ldots, k\}$, the variance is:

$$ \text{Var}(X) = \sum_{i=1}^{k} i^2 \cdot p_i - \left(\sum_{i=1}^{k} i \cdot p_i\right)^2 $$
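
As a worked check of both formulas, assume the fair 6-sided die from the example above, so that $p_i = 1/6$ for $i = 1, \ldots, 6$:

$$ E(X) = \frac{1 + 2 + \cdots + 6}{6} = \frac{21}{6} = 3.5, \qquad \text{Var}(X) = \frac{1^2 + 2^2 + \cdots + 6^2}{6} - 3.5^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.92 $$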

Probability mass function (p.m.f):

$$ P(X = i) = p_i $$

Cumulative distribution function (c.d.f):

The c.d.f is undefined for unordered categories. For ordered categories with outcomes $x_1 < x_2 < \ldots < x_k$, the c.d.f is:

$$ P(X \le x_j) = \sum_{i=1}^{j} p_i $$
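
For instance, treating the faces of a fair 6-sided die as ordered outcomes $1 < 2 < \ldots < 6$ with $p_i = 1/6$, the probability of rolling at most 4 is:

$$ P(X \le 4) = \sum_{i=1}^{4} \frac{1}{6} = \frac{4}{6} = \frac{2}{3} $$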

Moment generating function (m.g.f):

$$ E(e^{tX}) = \sum_{i=1}^{k} e^{tx_i} \cdot p_i $$
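
As an illustration, again assuming a fair 6-sided die with numeric outcomes $x_i = i$ and $p_i = 1/6$, the m.g.f. and its first derivative at $t = 0$ (which recovers the mean) are:

$$ E(e^{tX}) = \frac{1}{6} \sum_{i=1}^{6} e^{ti}, \qquad \left. \frac{d}{dt} E(e^{tX}) \right|_{t=0} = \frac{1}{6} \sum_{i=1}^{6} i = 3.5 $$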

Skewness: Approximated numerically for ordered categories.

Kurtosis: Approximated numerically for ordered categories.
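
The moment formulas above can be checked by hand in base R. The sketch below is not the package's implementation: it assumes ordered integer outcomes $1, \ldots, k$, borrows the first probability vector from the Examples section, and uses illustrative variable names (`mu`, `sigma2`, `skew`, `kurt`, `cdf_vals`). Reporting excess kurtosis is also an assumption about convention.

```r
# Probabilities for ordered integer outcomes 1..k (assumed for illustration)
p <- c(0.05, 0.5, 0.15, 0.2, 0.1)
x <- seq_along(p)

# Mean and variance, following the formulas in this section
mu <- sum(x * p)
sigma2 <- sum(x^2 * p) - mu^2

# Standardised third and fourth central moments, computed directly from sums
skew <- sum((x - mu)^3 * p) / sigma2^1.5
kurt <- sum((x - mu)^4 * p) / sigma2^2 - 3  # excess kurtosis (assumed convention)

# Cumulative probabilities P(X <= x_j) for j = 1..k
cdf_vals <- cumsum(p)
```

For these probabilities the mean is 2.8 and the variance is 1.26. None of these quantities are meaningful when the categories have no order.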

Examples


dist <- dist_categorical(prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)))

dist

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

# The outcomes aren't ordered, so many statistics are not applicable.
cdf(dist, 0.6)
quantile(dist, 0.7)
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

# Some of these statistics are meaningful for ordered outcomes
dist <- dist_categorical(list(rpois(26, 3)), list(ordered(letters)))
dist
cdf(dist, "m")
quantile(dist, 0.5)

dist <- dist_categorical(
  prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)),
  outcomes = list(letters[1:5], letters[24:26])
)

generate(dist, 10)

density(dist, "a")
density(dist, "z", log = TRUE)
