philentropy

Similarity and Distance Quantification between Probability Functions

Describe and understand the world through data.

Data collection and data comparison are the foundations of scientific research. Mathematics provides the abstract framework to describe patterns we observe in nature, and Statistics provides the framework to quantify the uncertainty of these patterns. In statistics, natural patterns are described in the form of probability distributions that either follow a fixed pattern (parametric distributions) or more dynamic patterns (non-parametric distributions).

The philentropy package implements fundamental distance and similarity measures for quantifying distances between probability density functions, as well as traditional information theory measures. In this regard, it aims to provide a framework for comparing natural patterns in statistical notation.

This project was born out of my passion for statistics, and I hope that it will be useful to people who share this passion with me.

Installation

# install philentropy version 0.3.0 from CRAN
install.packages("philentropy")

Citation

HG Drost (2018). Philentropy: Information Theory and Distance Quantification with R. Journal of Open Source Software, 3(26), 765. https://doi.org/10.21105/joss.00765
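
The citation entry can also be printed from within R using the standard base-R citation() utility:

# print the citation entry for the installed package
citation("philentropy")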

Examples

library(philentropy)
# retrieve available distance metrics
getDistMethods()
 [1] "euclidean"         "manhattan"         "minkowski"        
 [4] "chebyshev"         "sorensen"          "gower"            
 [7] "soergel"           "kulczynski_d"      "canberra"         
[10] "lorentzian"        "intersection"      "non-intersection" 
[13] "wavehedges"        "czekanowski"       "motyka"           
[16] "kulczynski_s"      "tanimoto"          "ruzicka"          
[19] "inner_product"     "harmonic_mean"     "cosine"           
[22] "hassebrook"        "jaccard"           "dice"             
[25] "fidelity"          "bhattacharyya"     "hellinger"        
[28] "matusita"          "squared_chord"     "squared_euclidean"
[31] "pearson"           "neyman"            "squared_chi"      
[34] "prob_symm"         "divergence"        "clark"            
[37] "additive_symm"     "kullback-leibler"  "jeffreys"         
[40] "k_divergence"      "topsoe"            "jensen-shannon"   
[43] "jensen_difference" "taneja"            "kumar-johnson"    
[46] "avg"
# define a probability density function P
P <- 1:10/sum(1:10)
# define a probability density function Q
Q <- 20:29/sum(20:29)

# combine P and Q as matrix object
x <- rbind(P,Q)

# compute the jensen-shannon distance between
# probability density functions P and Q
distance(x, method = "jensen-shannon")
jensen-shannon using unit 'log'.
jensen-shannon 
    0.02628933
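
The logarithm base used for information-theoretic measures can be changed via the unit argument; for example, the same distance can be recomputed in bits:

# compute the same Jensen-Shannon distance using log2 (bits) instead of log (nats)
distance(x, method = "jensen-shannon", unit = "log2")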

Alternatively, users can compute the values of all available distance/similarity metrics at once using dist.diversity():

dist.diversity(x, p = 2, unit = "log2")
        euclidean         manhattan 
       0.12807130        0.35250464 
        minkowski         chebyshev 
       0.12807130        0.06345083 
         sorensen             gower 
       0.17625232        0.03525046 
          soergel      kulczynski_d 
       0.29968454        0.42792793 
         canberra        lorentzian 
       2.09927095        0.49712136 
     intersection  non-intersection 
       0.82374768        0.17625232 
       wavehedges       czekanowski 
       3.16657887        0.17625232 
           motyka      kulczynski_s 
       0.58812616        2.33684211 
         tanimoto           ruzicka 
       0.29968454        0.70031546 
    inner_product     harmonic_mean 
       0.10612245        0.94948528 
           cosine        hassebrook 
       0.93427641        0.86613103 
          jaccard              dice 
       0.13386897        0.07173611 
         fidelity     bhattacharyya 
       0.97312397        0.03930448 
        hellinger          matusita 
       0.32787819        0.23184489 
    squared_chord squared_euclidean 
       0.05375205        0.01640226 
          pearson            neyman 
       0.16814418        0.36742465 
      squared_chi         prob_symm 
       0.10102943        0.20205886 
       divergence             clark 
       1.49843905        0.86557468 
    additive_symm  kullback-leibler 
       0.53556883        0.13926288 
         jeffreys      k_divergence 
       0.31761069        0.04216273 
           topsoe    jensen-shannon 
       0.07585498        0.03792749 
jensen_difference            taneja 
       0.03792749        0.04147518 
    kumar-johnson               avg 
       0.62779644        0.20797774
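
Note that these computations assume probability vectors as input. For raw count data, a probability vector can first be estimated with estimate.probability() (see Important Functions below); a minimal sketch, using a made-up count vector for illustration:

# hypothetical count vector (for illustration only)
counts <- c(5, 10, 15, 20)
# estimate an empirical probability vector by normalizing the counts
estimate.probability(counts, method = "empirical")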

Install Developer Version

# install.packages("devtools")
# install the development version of philentropy from GitHub
library(devtools)
install_github("HajkD/philentropy", build_vignettes = TRUE, dependencies = TRUE)

NEWS

The current status of the package as well as a detailed history of the functionality of each version of philentropy can be found in the NEWS section.

Important Functions

Distance Measures

  • distance() : Implements 46 fundamental probability distance (or similarity) measures
  • getDistMethods() : Get available method names for 'distance'
  • dist.diversity() : Distance Diversity between Probability Density Functions
  • estimate.probability() : Estimate Probability Vectors From Count Vectors

Information Theory

  • H() : Shannon's Entropy H(X)
  • JE() : Joint-Entropy H(X,Y)
  • CE() : Conditional-Entropy H(X | Y)
  • MI() : Shannon's Mutual Information I(X,Y)
  • KL() : Kullback-Leibler Divergence
  • JSD() : Jensen-Shannon Divergence
  • gJSD() : Generalized Jensen-Shannon Divergence
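
A minimal sketch of how these information-theory functions are called, reusing the probability vectors P and Q from the examples above (the unit argument selects the logarithm base):

library(philentropy)
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
# Shannon's entropy of P, measured in bits
H(P, unit = "log2")
# Kullback-Leibler divergence between P and Q
KL(rbind(P, Q), unit = "log")
# Jensen-Shannon divergence between P and Q
JSD(rbind(P, Q), unit = "log")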

Discussions and Bug Reports

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, in case you find bugs or need additional (more flexible) functionality in parts of this package, please let me know:

https://github.com/HajkD/philentropy/issues

or find me on Twitter: HajkDrost

Package Information

Monthly Downloads: 4,235
Version: 0.3.0
License: GPL-2

Maintainer: Hajk-Georg Drost
Last Published: February 13th, 2019

Functions in philentropy (0.3.0)

  • clark_sq() : Clark squared distance (lowlevel function)
  • hellinger() : Hellinger distance (lowlevel function)
  • dice_dist() : Dice distance (lowlevel function)
  • czekanowski() : Czekanowski distance (lowlevel function)
  • intersection_dist() : Intersection distance (lowlevel function)
  • jeffreys() : Jeffreys distance (lowlevel function)
  • jaccard() : Jaccard distance (lowlevel function)
  • jensen_difference() : Jensen difference (lowlevel function)
  • sorensen() : Sorensen distance (lowlevel function)
  • soergel() : Soergel distance (lowlevel function)
  • inner_product() : Inner product distance (lowlevel function)
  • jensen_shannon() : Jensen-Shannon distance (lowlevel function)
  • cosine_dist() : Cosine distance (lowlevel function)
  • k_divergence() : K-Divergence (lowlevel function)
  • squared_chi_sq() : Squared chi-squared distance (lowlevel function)
  • taneja() : Taneja difference (lowlevel function)
  • neyman_chi_sq() : Neyman chi-squared distance (lowlevel function)
  • pearson_chi_sq() : Pearson chi-squared distance (lowlevel function)
  • squared_euclidean() : Squared Euclidean distance (lowlevel function)
  • squared_chord() : Squared chord distance (lowlevel function)
  • avg() : AVG distance (lowlevel function)
  • dist.diversity() : Distance Diversity between Probability Density Functions
  • divergence_sq() : Divergence squared distance (lowlevel function)
  • kumar_hassebrook() : Kumar-Hassebrook distance (lowlevel function)
  • distance() : Distances between Probability Density Functions
  • kumar_johnson() : Kumar-Johnson distance (lowlevel function)
  • prob_symm_chi_sq() : Probability symmetric chi-squared distance (lowlevel function)
  • gJSD() : Generalized Jensen-Shannon Divergence
  • ruzicka() : Ruzicka distance (lowlevel function)
  • getDistMethods() : Get method names for distance
  • canberra() : Canberra distance (lowlevel function)
  • estimate.probability() : Estimate Probability Vectors From Count Vectors
  • gower() : Gower distance (lowlevel function)
  • manhattan() : Manhattan distance (lowlevel function)
  • chebyshev() : Chebyshev distance (lowlevel function)
  • euclidean() : Euclidean distance (lowlevel function)
  • harmonic_mean_dist() : Harmonic mean distance (lowlevel function)
  • lin.cor() : Linear Correlation
  • lorentzian() : Lorentzian distance (lowlevel function)
  • fidelity() : Fidelity distance (lowlevel function)
  • kulczynski_d() : Kulczynski_d distance (lowlevel function)
  • minkowski() : Minkowski distance (lowlevel function)
  • kullback_leibler_distance() : Kullback-Leibler distance (lowlevel function)
  • matusita() : Matusita distance (lowlevel function)
  • motyka() : Motyka distance (lowlevel function)
  • tanimoto() : Tanimoto distance (lowlevel function)
  • topsoe() : Topsoe distance (lowlevel function)
  • wave_hedges() : Wave Hedges distance (lowlevel function)
  • CE() : Shannon's Conditional-Entropy \(H(X | Y)\)
  • KL() : Kullback-Leibler Divergence
  • H() : Shannon's Entropy \(H(X)\)
  • MI() : Shannon's Mutual Information \(I(X,Y)\)
  • additive_symm_chi_sq() : Additive symmetric chi-squared distance (lowlevel function)
  • bhattacharyya() : Bhattacharyya distance (lowlevel function)
  • JE() : Shannon's Joint-Entropy \(H(X,Y)\)
  • binned.kernel.est() : Kernel Density Estimation
  • JSD() : Jensen-Shannon Divergence