This function calculates the Kullback-Leibler divergence (KLD) between two probability distributions, and has many uses, such as in lowest posterior loss probability intervals, posterior predictive checks, prior elicitation, reference priors, and Variational Bayes.

`KLD(px, py, base)`

px

This is a required vector of probability densities, considered as \(p(\textbf{x})\). Log-densities are also accepted, in which case both `px` and `py` must be log-densities.

py

This is a required vector of probability densities, considered as \(p(\textbf{y})\). Log-densities are also accepted, in which case both `px` and `py` must be log-densities.

base

This optional argument specifies the logarithmic base, which defaults to `base=exp(1)` (or \(e\)) and represents information in natural units (nats), where `base=2` represents information in binary units (bits).
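Changing `base` only rescales the divergence, since a logarithm in base 2 is the natural logarithm divided by \(\ln 2\). A minimal, language-neutral sketch (here in Python, not part of LaplacesDemon) of the nats-to-bits relationship:

```python
import math

# KLD of two small discrete distributions, in nats (base e) and bits (base 2).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kld_nats = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
kld_bits = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

# Changing the base only rescales the divergence: bits = nats / ln(2).
assert abs(kld_bits - kld_nats / math.log(2)) < 1e-12
```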

`KLD` returns a list with the following components:

KLD.px.py

This is \(\mathrm{KLD}_i[p(\textbf{x}_i) || p(\textbf{y}_i)]\).

KLD.py.px

This is \(\mathrm{KLD}_i[p(\textbf{y}_i) || p(\textbf{x}_i)]\).

mean.KLD

This is the mean of the two components above. This is the expected posterior loss in `LPL.interval`.

sum.KLD.px.py

This is \(\mathrm{KLD}[p(\textbf{x}) || p(\textbf{y})]\). This is a directed divergence.

sum.KLD.py.px

This is \(\mathrm{KLD}[p(\textbf{y}) || p(\textbf{x})]\). This is a directed divergence.

mean.sum.KLD

This is the mean of the two components above.

intrinsic.discrepancy

This is the minimum of the two directed divergences.

The Kullback-Leibler divergence (KLD) is known by many names, some of which are Kullback-Leibler distance, K-L, and logarithmic divergence. KLD is an asymmetric measure of the difference, distance, or directed divergence between two probability distributions \(p(\textbf{y})\) and \(p(\textbf{x})\) (Kullback and Leibler, 1951). Mathematically, however, KLD is not a distance, because of its asymmetry.

Here, \(p(\textbf{y})\) represents the "true" distribution of data, observations, or theoretical distribution, and \(p(\textbf{x})\) represents a theory, model, or approximation of \(p(\textbf{y})\).

For probability distributions \(p(\textbf{y})\) and \(p(\textbf{x})\) that are discrete (whether the underlying distribution is continuous or discrete, the observations themselves are always discrete, indexed \(i=1,\dots,N\)),

$$\mathrm{KLD}[p(\textbf{y}) || p(\textbf{x})] = \sum^N_{i=1} p(\textbf{y}_i) \log\frac{p(\textbf{y}_i)}{p(\textbf{x}_i)}$$
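The asymmetry of this sum, and the intrinsic discrepancy as the minimum of the two directed divergences, can be verified directly from the definition; a minimal Python sketch (illustrative only, not the package's implementation):

```python
import math

def kld(p, q, base=math.e):
    """Discrete KLD[p || q] = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

d_pq = kld(p, q)  # KLD[p || q]
d_qp = kld(q, p)  # KLD[q || p]

assert d_pq != d_qp              # asymmetric: not a true distance
intrinsic = min(d_pq, d_qp)      # intrinsic discrepancy (Bernardo and Rueda, 2002)
assert intrinsic <= (d_pq + d_qp) / 2  # never exceeds the mean divergence
```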

In Bayesian inference, KLD can be used as a measure of the information gain in moving from a prior distribution, \(p(\theta)\), to a posterior distribution, \(p(\theta | \textbf{y})\). As such, KLD is the basis of reference priors and lowest posterior loss intervals (`LPL.interval`), such as in Berger, Bernardo, and Sun (2009) and Bernardo (2005). The intrinsic discrepancy was introduced by Bernardo and Rueda (2002). For more information on the intrinsic discrepancy, see `LPL.interval`.
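As a concrete illustration of information gain, the KLD from a normal posterior to a normal prior has the known closed form \(\mathrm{KLD}[\mathcal{N}(\mu_1,\sigma^2_1) || \mathcal{N}(\mu_2,\sigma^2_2)] = \log(\sigma_2/\sigma_1) + (\sigma^2_1 + (\mu_1-\mu_2)^2)/(2\sigma^2_2) - 1/2\). The Python sketch below (language-neutral, not part of LaplacesDemon, with hypothetical prior and posterior parameters) checks the closed form against a discretization of the defining integral:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kld_normal(mu1, s1, mu2, s2):
    """Closed-form KLD[N(mu1, s1^2) || N(mu2, s2^2)] in nats."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# Information gain from a vague prior N(0, 10^2) to a posterior N(1, 1^2),
# measured as KLD[posterior || prior].
gain = kld_normal(1.0, 1.0, 0.0, 10.0)

# Check against a fine Riemann-sum discretization of the defining integral.
dx = 0.001
approx = sum(
    normal_pdf(x, 1, 1) * math.log(normal_pdf(x, 1, 1) / normal_pdf(x, 0, 10)) * dx
    for x in (i * dx for i in range(-20000, 20001))
)
assert gain > 0
assert abs(gain - approx) < 1e-4
```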

Berger, J.O., Bernardo, J.M., and Sun, D. (2009). "The Formal
Definition of Reference Priors". *The Annals of Statistics*,
37(2), p. 905--938.

Bernardo, J.M. and Rueda, R. (2002). "Bayesian Hypothesis Testing: A
Reference Approach". *International Statistical Review*, 70,
p. 351--372.

Bernardo, J.M. (2005). "Intrinsic Credible Regions: An Objective
Bayesian Approach to Interval Estimation". *Sociedad de
Estadistica e Investigacion Operativa*, 14(2), p. 317--384.

Kullback, S. and Leibler, R.A. (1951). "On Information and
Sufficiency". *The Annals of Mathematical Statistics*, 22(1),
p. 79--86.

```
library(LaplacesDemon)
set.seed(666)  # for a reproducible example
# Densities of two similar normal distributions, evaluated at random points
px <- dnorm(runif(100), 0, 1)
py <- dnorm(runif(100), 0.1, 0.9)
KLD(px, py)
```
