ForeCA (version 0.2.7)

discrete_entropy: Shannon entropy for discrete pmf

Description

Computes the Shannon entropy \(\mathcal{H}(p) = -\sum_{i=1}^{n} p_i \log p_i\) of a discrete random variable \(X\) taking values in \(\lbrace x_1, \ldots, x_n \rbrace\) with probability mass function (pmf) \(P(X = x_i) = p_i\), where \(p_i \geq 0\) for all \(i\) and \(\sum_{i=1}^{n} p_i = 1\).
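For example, for the (made-up) pmf \(p = (0.5, 0.25, 0.25)\) the entropy in bits is \(0.5 \cdot 1 + 0.25 \cdot 2 + 0.25 \cdot 2 = 1.5\). A minimal sketch of checking this in R (assuming the ForeCA package is installed):

library(ForeCA)

p <- c(0.5, 0.25, 0.25)
-sum(p * log2(p))    # 1.5 bits, directly from the definition
discrete_entropy(p)  # should also return 1.5 (base = 2 by default)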

Usage

discrete_entropy(
  probs,
  base = 2,
  method = c("MLE"),
  threshold = 0,
  prior.probs = NULL,
  prior.weight = 0
)

Arguments

probs

numeric; probabilities (empirical frequencies). Must be non-negative and add up to \(1\).

base

logarithm base; entropy is measured in "nats" if base = exp(1) and in "bits" if base = 2 (default). See the short conversion sketch at the end of this argument list.

method

string; method to estimate entropy; see Details below.

threshold

numeric; frequencies below threshold are set to \(0\); default threshold = 0, i.e., no thresholding. If prior.weight > 0, thresholding is done before smoothing.

prior.probs

optional; only used if prior.weight > 0. Prior probability distribution that is mixed with probs. By default a uniform distribution, which puts equal probability on each outcome.

prior.weight

numeric; weight of the prior distribution in the mixture of the empirical distribution (probs) and the prior. Must be between 0 and 1. Default: 0 (no prior).
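Relating the two base choices above: entropy in nats equals entropy in bits times \(\log 2\). A small sketch with a made-up pmf:

p <- c(0.5, 0.25, 0.25)
discrete_entropy(p, base = 2)           # entropy in bits
discrete_entropy(p, base = exp(1))      # entropy in nats
log(2) * discrete_entropy(p, base = 2)  # should equal the value in nats above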

Value

numeric; non-negative real value.

Details

discrete_entropy uses a plug-in estimator (method = "MLE"): $$ \widehat{\mathcal{H}}(p) = - \sum_{i=1}^{n} \widehat{p}_i \log \widehat{p}_i. $$
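As an illustrative sketch (the counts below are made up), the plug-in estimate is simply the entropy of the empirical proportions:

counts <- c(40, 30, 20, 10)    # hypothetical observed counts
p.hat <- counts / sum(counts)  # empirical proportions
-sum(p.hat * log2(p.hat))      # plug-in estimate computed by hand (bits)
discrete_entropy(p.hat)        # should return the same value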

If prior.weight > 0, then it mixes the observed proportions \(\widehat{p}_i\) with a prior distribution $$ \widehat{p}_i \leftarrow (1-\lambda) \cdot \widehat{p}_i + \lambda \cdot \mathrm{prior}_i, \quad i=1, \ldots, n, $$ where \(\lambda \in [0, 1]\) is the prior.weight parameter. By default the prior is a uniform distribution, i.e., \(\mathrm{prior}_i = \frac{1}{n}\) for all \(i\).
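The smoothing step can be reproduced by hand; a sketch with the default uniform prior (lambda = 0.1 and the proportions are arbitrary choices):

p.hat  <- c(0.7, 0.2, 0.1, 0)
lambda <- 0.1
prior  <- rep(1 / length(p.hat), length(p.hat))  # default: uniform prior
p.mix  <- (1 - lambda) * p.hat + lambda * prior  # mix data with the prior
-sum(p.mix * log2(p.mix))                        # entropy of the smoothed pmf
discrete_entropy(p.hat, prior.weight = lambda)   # should match the line above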

Note that this plug-in estimator is biased: on average it underestimates the true entropy. See References for an overview of alternative methods.
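The bias can be seen in a small simulation (a sketch; the true pmf, sample size, and number of replications are arbitrary choices): with short samples, the average plug-in estimate falls below the true entropy.

set.seed(42)
true.pmf <- c(0.4, 0.3, 0.2, 0.1)
true.H <- -sum(true.pmf * log2(true.pmf))  # true entropy, about 1.85 bits
est <- replicate(1000, {
  x <- sample(seq_along(true.pmf), size = 20, replace = TRUE, prob = true.pmf)
  discrete_entropy(tabulate(x, nbins = length(true.pmf)) / 20)
})
mean(est) - true.H  # typically negative: the plug-in estimator underestimates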

References

Archer E., Park I. M., Pillow J. W. (2014). "Bayesian Entropy Estimation for Countable Discrete Distributions". Journal of Machine Learning Research (JMLR), 15, 2833-2868. Available at http://jmlr.org/papers/v15/archer14a.html.

See Also

continuous_entropy

Examples

probs.tmp <- rexp(5)
probs.tmp <- sort(probs.tmp / sum(probs.tmp))

unif.distr <- rep(1/length(probs.tmp), length(probs.tmp))

matplot(cbind(probs.tmp, unif.distr), pch = 19,
        ylab = "P(X = k)", xlab = "k")
matlines(cbind(probs.tmp, unif.distr))
legend("topleft", c("non-uniform", "uniform"), pch = 19,
       lty = 1:2, col = 1:2, box.lty = 0)

discrete_entropy(probs.tmp)
# the uniform distribution has the largest entropy among all pmfs
# on a fixed finite set of outcomes (here log2(5) ~ 2.32 bits, since base = 2)
discrete_entropy(unif.distr)
# no uncertainty if one element occurs with probability 1
discrete_entropy(c(1, 0, 0))
