entropy (version 1.0.0)

entropy.Dirichlet: Family of Dirichlet entropy estimators

Description

entropy.Dirichlet estimates the Shannon entropy H of the random variable Y from the corresponding observed counts y by plugging in Bayesian estimates of the bin frequencies, computed with the Dirichlet-multinomial pseudocount model.

freqs.Dirichlet computes the Bayesian estimates of the bin frequencies using the Dirichlet-multinomial pseudocount model.

Usage

entropy.Dirichlet(y, a, unit=c("log", "log2", "log10"))
freqs.Dirichlet(y, a)

Arguments

y
vector of counts.
a
pseudocount per bin.
unit
the unit in which entropy is measured.

Value

  • entropy.Dirichlet returns an estimate of the Shannon entropy.

  • freqs.Dirichlet returns the underlying frequencies.

Details

The Dirichlet-multinomial pseudocount entropy estimator is a Bayesian plug-in estimator: in the definition of the Shannon entropy the bin probabilities are replaced by the respective Bayesian estimates of the frequencies, using a model with a Dirichlet prior and a multinomial likelihood.

The parameter a is the parameter of the Dirichlet prior and in effect specifies the pseudocount added to each bin. Popular choices of a are:

  • a=0 : maximum likelihood estimator (see entropy.empirical)

  • a=1/2 : Jeffreys' prior; Krichevsky-Trofimov (1981) entropy estimator

  • a=1 : Laplace's prior

  • a=1/length(y) : Schurmann-Grassberger (1996) entropy estimator

  • a=sqrt(sum(y))/length(y) : minimax prior

The pseudocount a can also be a vector so that for each bin an individual pseudocount is added.
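The plug-in recipe above can be sketched directly: add the pseudocount a to each observed count, normalize to get the Bayesian frequency estimates, and evaluate the Shannon entropy on those estimates. This is a minimal illustration of the idea, not the package's internal code; the helper names below are hypothetical.

```r
# Hypothetical re-implementation of the Dirichlet plug-in estimator,
# for illustration only (the package provides freqs.Dirichlet/entropy.Dirichlet).

# Bayesian frequency estimates: (y_k + a) / (n + a * K)
dirichlet_freqs <- function(y, a) (y + a) / (sum(y) + sum(a + 0 * y))

# Plug the estimated frequencies into the Shannon entropy (in nats)
dirichlet_entropy <- function(y, a) {
  p <- dirichlet_freqs(y, a)
  p <- p[p > 0]          # convention: 0 * log(0) = 0, so drop empty bins
  -sum(p * log(p))
}

y <- c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)
dirichlet_entropy(y, a = 1/2)   # Jeffreys'-prior pseudocount
```

Because a is recycled against y in dirichlet_freqs, the same sketch also covers a vector of per-bin pseudocounts.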

References

Agresti, A., and D. B. Hitchcock. 2005. Bayesian inference for categorical data analysis. Stat. Methods Appl. 14:297-330.

Krichevsky, R. E., and V. K. Trofimov. 1981. The performance of universal encoding. IEEE Trans. Inf. Theory 27:199-207.

Schurmann, T., and P. Grassberger. 1996. Entropy estimation of symbol sequences. Chaos 6:414-427.

See Also

entropy.shrink, entropy.NSB, entropy.ChaoShen, entropy.empirical, entropy.plugin.

Examples

# load entropy library 
library("entropy")

# observed counts for each bin
y <- c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)

# Dirichlet estimate with a=0
entropy.Dirichlet(y, a=0)

# compare to empirical estimate
entropy.empirical(y)

# Dirichlet estimate with a=1/2
entropy.Dirichlet(y, a=1/2)

# Dirichlet estimate with a=1
entropy.Dirichlet(y, a=1)

# Dirichlet estimate with a=1/length(y)
entropy.Dirichlet(y, a=1/length(y))

# Dirichlet estimate with a=sqrt(sum(y))/length(y)
entropy.Dirichlet(y, a=sqrt(sum(y))/length(y))
