kldest: Kullback-Leibler divergence estimation

The goal of kldest is to estimate Kullback-Leibler (KL) divergence $D_{KL}(P||Q)$ between two probability distributions $P$ and $Q$ based on:

  • a sample $x_1,...,x_n$ from $P$ and the probability density $q$ of $Q$, or
  • samples $x_1,...,x_n$ from $P$ and $y_1,...,y_m$ from $Q$.

The distributions $P$ and $Q$ may be uni- or multivariate, and they may be discrete, continuous or mixed discrete/continuous.
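Recall that for distributions $P$ and $Q$ with densities (or probability mass functions) $p$ and $q$, the KL divergence is defined as

$$D_{KL}(P||Q) = \int p(x) \log\frac{p(x)}{q(x)}\,dx,$$

with the integral replaced by a sum over the support of $P$ in the discrete case.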

For continuous distributions, several estimation algorithms are provided, based either on nearest neighbour density estimation or on kernel density estimation. Confidence intervals for KL divergence can also be computed, either via subsampling (preferred) or via bootstrapping. A quick comparison of the two estimator families is sketched below.
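As an illustration, the following sketch compares the nearest neighbour-based and the 1-D kernel density-based estimators on the same pair of Gaussian samples. It assumes, based on the function index below, that kld_est_kde1 accepts two numeric samples; outputs are omitted since they depend on the random draw.

library(kldest)
set.seed(1)

# Samples from P = N(0, 1) and Q = N(1, 4)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)

kld_est_nn(X, Y)    # nearest neighbour density estimation
kld_est_kde1(X, Y)  # 1-D kernel density estimation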

Installation

You can install kldest from CRAN:

install.packages("kldest")

Alternatively, you can install the development version of kldest from GitHub with:

# install.packages("devtools")
devtools::install_github("niklhart/kldest")

A minimal example for KL divergence estimation

KL divergence estimation based on nearest neighbour density estimates is the most flexible approach.

library(kldest)

Set a seed for reproducibility:

set.seed(0)

KL divergence between 1-D Gaussians

Analytical KL divergence:

kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
#> [1] 0.4431472

Estimate based on two samples from these Gaussians:

X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
kld_est_nn(X, Y)
#> [1] 0.2169136
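The deviation from the analytical value 0.4431 reflects small-sample bias and variability; with larger samples, the nearest neighbour estimate typically moves closer to the truth. A sketch (output omitted, since it varies with the random draw; kld_est_brnn is the bias-reduced variant listed in the function index, assumed here to take the same two-sample arguments):

# Larger samples from the same two Gaussians
X_big <- rnorm(10000)
Y_big <- rnorm(10000, mean = 1, sd = 2)
kld_est_nn(X_big, Y_big)

# Bias-reduced generalized k-nearest-neighbour estimator on the original samples
kld_est_brnn(X, Y)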

Estimate based on a sample from the first Gaussian and the density of the second:

q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_est_nn(X, q = q)
#> [1] 0.6374628

Uncertainty quantification via subsampling:

kld_ci_subsampling(X, q = q)
#> $est
#> [1] 0.6374628
#> 
#> $ci
#>      2.5%     97.5% 
#> 0.2601375 0.9008446
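Subsampling also applies in the two-sample setting; a minimal sketch, assuming kld_ci_subsampling accepts a second sample Y in place of the density q (output omitted, as it varies with the resampling):

kld_ci_subsampling(X, Y)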

KL divergence between 2-D Gaussians

Analytical KL divergence between an uncorrelated and a correlated Gaussian:

kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2),nrow=2))
#> [1] 0.5

Estimate based on two samples from these Gaussians:

X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)
X <- cbind(X1,X2)
Y <- cbind(Y1,Y2)

kld_est_nn(X, Y)
#> [1] 0.3358918
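The one-sample variant also works in the multivariate setting. A sketch using mvtnorm::dmvnorm for the density of the correlated Gaussian (using the mvtnorm package here is an assumption; kldest also provides its own mvdnorm):

# Density of the correlated 2-D Gaussian Q; dmvnorm accepts a single point
# or a matrix with one point per row
q2 <- function(x) mvtnorm::dmvnorm(x, mean = rep(0, 2),
                                   sigma = matrix(c(1, 1, 1, 2), nrow = 2))
kld_est_nn(X, q = q2)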

Package details

  • Version: 1.0.0
  • License: MIT + file LICENSE
  • Maintainer: Niklas Hartung
  • Last published: April 9th, 2024

Functions in kldest (1.0.0)

  • combinations: Combinations of input arguments
  • constDiagMatrix: Constant plus diagonal matrix
  • convergence_rate: Empirical convergence rate of a KL divergence estimator
  • is_two_sample: Detect if a one- or two-sample problem is specified
  • kld_ci_bootstrap: Uncertainty of KL divergence estimate using Efron's bootstrap
  • kld_ci_subsampling: Uncertainty of KL divergence estimate using Politis/Romano's subsampling bootstrap
  • kld_discrete: Analytical KL divergence for two discrete distributions
  • kld_est: Kullback-Leibler divergence estimator for discrete, continuous or mixed data
  • kld_est_brnn: Bias-reduced generalized k-nearest-neighbour KL divergence estimation
  • kld_est_discrete: Plug-in KL divergence estimator for samples from discrete distributions
  • kld_est_kde: Kernel density-based Kullback-Leibler divergence estimation in any dimension
  • kld_est_kde1: 1-D kernel density-based estimation of Kullback-Leibler divergence
  • kld_est_kde2: 2-D kernel density-based estimation of Kullback-Leibler divergence
  • kld_est_nn: k-nearest neighbour KL divergence estimator
  • kld_exponential: Analytical KL divergence for two univariate exponential distributions
  • kld_gaussian: Analytical KL divergence for two uni- or multivariate Gaussian distributions
  • kld_uniform: Analytical KL divergence for two uniform distributions
  • kld_uniform_gaussian: Analytical KL divergence between a uniform and a Gaussian distribution
  • kldest-package: kldest: Sample-Based Estimation of Kullback-Leibler Divergence
  • mvdnorm: Probability density function of multivariate Gaussian distribution
  • to_uniform_scale: Transform samples to uniform scale
  • tr: Matrix trace operator
  • trapz: Trapezoidal integration in 1 or 2 dimensions