distr6

What is distr6?

distr6 is a unified and clean interface that organises the probability distributions implemented in R into one R6 object-oriented package and adds distributions that are not yet implemented in R. Currently we have 37 probability distributions as well as 11 kernels. By building the package from the ground up and making use of tried and tested design patterns (as per Gamma et al. 1994), distr6 aims to make probability distributions easy to use, understand and analyse.

distr6 extends the work of Peter Ruckdeschel, Matthias Kohl et al. who created the first object-oriented (OO) interface for distributions using S4. Their distr package is currently the gold standard in R for OO distribution handling. Using R6 we aim to take this even further and to create a scalable interface that can continue to grow with the community. Full details of the API and class structure can be found on the distr6 website.

Main Features

distr6 is not intended to replace the base R distribution functions but instead to give an alternative that focuses on distributions as objects that can be manipulated and accessed as required. The main features therefore centre on OOP practices, design patterns and API design. Of particular note:

All distributions in base R introduced as objects with methods for common statistical functions including pdf, cdf, inverse cdf, simulation, mean, variance, skewness and kurtosis

B <- Binomial$new(prob = 0.5, size = 10)
B$pdf(1:10)
#>  [1] 0.0097656250 0.0439453125 0.1171875000 0.2050781250 0.2460937500
#>  [6] 0.2050781250 0.1171875000 0.0439453125 0.0097656250 0.0009765625
B$kurtosis()
#> [1] -0.2
B$rand(5)
#> [1] 7 7 4 7 6
summary(B)
#> Binomial Probability Distribution. Parameterised with:
#>   prob = 0.5, size = 10
#> 
#>   Quick Statistics 
#>  Mean:       5
#>  Variance:   2.5
#>  Skewness:   0
#>  Ex. Kurtosis:   -0.2
#> 
#>  Support: {0,...,10}     Scientific Type: ℕ0 
#> 
#>  Traits: discrete; univariate
#>  Properties: symmetric; platykurtic; no skew
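
The figures in this output can be cross-checked against base R as a quick sanity check. The sketch below uses only the stats package that ships with R; the variable names are illustrative, and the closed-form moment formulas are the standard ones for a Binomial(n, p) distribution rather than code taken from distr6.

```r
# Cross-check of the Binomial(size = 10, prob = 0.5) values using base R only.
n <- 10
p <- 0.5
q <- 1 - p

# Probability mass at 1, ..., 10; should agree with B$pdf(1:10)
pmf <- dbinom(1:10, size = n, prob = p)

# Standard closed-form moments of a Binomial(n, p)
binom_mean     <- n * p                          # 5
binom_variance <- n * p * q                      # 2.5
binom_skewness <- (q - p) / sqrt(n * p * q)      # 0
binom_exkurt   <- (1 - 6 * p * q) / (n * p * q)  # -0.2

print(pmf)
print(c(mean = binom_mean, variance = binom_variance,
        skewness = binom_skewness, ex_kurtosis = binom_exkurt))
```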

Flexible construction of distributions for common parameterisations

Exponential$new(rate = 2)
#> Exp(rate = 2)
Exponential$new(scale = 2)
#> Exp(scale = 2)
Normal$new(mean = 0, prec = 2)
#> Norm(mean = 0, prec = 2)
Normal$new(mean = 0, sd = 3)$parameters()
#>      id     value support                                 description
#> 1: mean         0       ℝ                   Mean - Location Parameter
#> 2:  var         9      ℝ+          Variance - Squared Scale Parameter
#> 3:   sd         3      ℝ+        Standard Deviation - Scale Parameter
#> 4: prec 0.1111111      ℝ+ Precision - Inverse Squared Scale Parameter
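
The parameter table above lists several values that all describe the same Normal distribution. As a rough illustration of how those alternative parameterisations relate to one another, the following base R sketch recomputes them by hand. It does not use distr6, and the variable names are only illustrative.

```r
# Relationships between the parameterisations shown above (base R only).
sd_value   <- 3
var_value  <- sd_value^2      # variance is the squared scale: 9
prec_value <- 1 / var_value   # precision is the inverse variance: 0.1111111

# For the exponential distribution, scale is the reciprocal of rate,
# so rate = 2 and scale = 0.5 describe the same distribution.
rate  <- 2
scale <- 1 / rate
x <- c(0.1, 0.5, 1)
same_density <- all.equal(dexp(x, rate = rate), exp(-x / scale) / scale)

print(c(var = var_value, prec = prec_value))
print(same_density)
```

Note that `Exponential$new(rate = 2)` and `Exponential$new(scale = 2)` in the example above are therefore two different distributions, not two spellings of the same one.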

Decorators for extending functionality of distributions to more complex modelling methods

B <- Binomial$new()
decorate(B, ExoticStatistics)
#> B is now decorated with ExoticStatistics
#> Binom(prob = 0.5, size = 10)
B$survival(2)
#> [1] 0.9453125
decorate(B, CoreStatistics)
#> B is now decorated with CoreStatistics
#> Binom(prob = 0.5, size = 10)
B$kthmoment(6)
#> Results from numeric calculations are approximate only. Better results may be available.
#> [1] 190
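
Both decorator results can be reproduced from first principles with base R. The sketch below assumes that `survival` is the usual survival function P(X > x) and that `kthmoment(6)` is the sixth central moment; the second assumption is an inference from the fact that the numbers agree, not something stated in the text above.

```r
# Base R cross-check of the decorator outputs for Binomial(size = 10, prob = 0.5).
n <- 10
p <- 0.5
support <- 0:n
pmf <- dbinom(support, size = n, prob = p)

# Survival function: P(X > 2) = 1 - P(X <= 2)
survival_at_2 <- 1 - pbinom(2, size = n, prob = p)

# Sixth central moment: sum over the support of (x - mean)^6 * P(X = x)
mu <- sum(support * pmf)
sixth_central_moment <- sum((support - mu)^6 * pmf)

print(survival_at_2)
print(sixth_central_moment)
```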

S3 compatibility to make the interface more flexible for users who are less familiar with OOP

B <- Binomial$new()
mean(B) # B$mean()
#> [1] 5
variance(B) # B$variance()
#> [1] 2.5
cdf(B, 2:5) # B$cdf(2:5)
#> [1] 0.0546875 0.1718750 0.3769531 0.6230469
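
One way such a functional layer can sit on top of object methods is for each S3 generic to forward to the method of the same name on the object. The toy sketch below shows that idea using base R only. `ToyDistribution` and `toy_binomial` are hypothetical names, and this is not distr6's actual implementation.

```r
# Toy illustration of S3 generics forwarding to methods stored on an object.
# "ToyDistribution" is a hypothetical class, not part of distr6.
toy_binomial <- structure(
  list(
    cdf      = function(q) pbinom(q, size = 10, prob = 0.5),
    variance = function() 10 * 0.5 * 0.5
  ),
  class = "ToyDistribution"
)

# Generics dispatch on the class of their first argument
cdf <- function(object, ...) UseMethod("cdf")
variance <- function(object, ...) UseMethod("variance")

# Each S3 method simply calls the method of the same name on the object
cdf.ToyDistribution <- function(object, ...) object$cdf(...)
variance.ToyDistribution <- function(object, ...) object$variance(...)

print(cdf(toy_binomial, 2:5))   # same values as toy_binomial$cdf(2:5)
print(variance(toy_binomial))   # same value as toy_binomial$variance()
```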

Wrappers including truncation, huberization and product distributions for manipulation and composition of distributions.

B <- Binomial$new()
TruncatedDistribution$new(B, lower = 2, upper = 5) #Or: truncate(B,2,5)
#> TruncBinom(Binom_prob = 0.5, Binom_size = 10)
N <- Normal$new()
MixtureDistribution$new(list(B,N), weights = c(0.1, 0.9))
#> BinomMixNorm(Binom_prob = 0.5, Binom_size = 10, Norm_mean = 0, Norm_var = 1)
ProductDistribution$new(list(B,N))
#> BinomXNorm(Binom_prob = 0.5, Binom_size = 10, Norm_mean = 0, Norm_var = 1)
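
For readers unfamiliar with these compositions, the sketch below writes out the textbook formulas for truncation and mixtures in base R. It assumes that truncation keeps both end points, which may differ from how distr6 treats its `lower` and `upper` bounds, and `mixture_cdf` is a hypothetical helper rather than a distr6 function.

```r
# Textbook truncation and mixture formulas, computed with base R only.
n <- 10
p <- 0.5
lower <- 2
upper <- 5

# Truncation: keep the mass on lower..upper and rescale it to sum to one.
# Assumption: both end points are included; distr6 may treat bounds differently.
kept <- lower:upper
truncated_pmf <- dbinom(kept, n, p) / sum(dbinom(kept, n, p))

# Mixture: the cdf is the weighted sum of the component cdfs.
weights <- c(0.1, 0.9)
mixture_cdf <- function(q) {
  weights[1] * pbinom(q, n, p) + weights[2] * pnorm(q, mean = 0, sd = 1)
}

print(truncated_pmf)
print(mixture_cdf(0))
```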

Additionally we introduce a SetSymbol class for a purely symbolic representation of sets for Distribution typing

Binomial$new()$type()
#> [1] "ℕ0"
Binomial$new()$support()
#> [1] "{0,...,10}"
Set$new(1:5)
#> [1] "{1,...,5}"
Interval$new(1,5)
#> [1] "[1,5]"
PosReals$new()
#> [1] "ℝ+"

Usage

distr6 has three primary use-cases:

  1. Upgrading base: Extending the base R distribution functions to classes so that each distribution additionally has basic statistical methods, including expectation and variance, and properties/traits including discrete/continuous, univariate/multivariate, etc.
  2. Statistics: Implementing decorators and adaptors to manipulate distributions, including distribution composition, and adding functionality for numeric calculations based on any arbitrary distribution.
  3. Modelling: Probabilistic modelling using distr6 objects as the modelling targets. Objects as targets is an understood ML paradigm, and introducing distributions as classes is the first step towards implementing probabilistic modelling.
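
To give a flavour of what numeric calculation on an arbitrary distribution can involve, the sketch below approximates an expectation by numerical integration of a density, using `stats::integrate` from base R. `numeric_expectation` is a hypothetical helper written for this illustration and is not how distr6 necessarily implements its own numeric methods.

```r
# Numerical expectation E[g(X)] for a continuous density, using stats::integrate.
# numeric_expectation is a hypothetical helper, not a distr6 function.
numeric_expectation <- function(density, g = identity,
                                lower = -Inf, upper = Inf) {
  integrate(function(x) g(x) * density(x), lower, upper)$value
}

# Example: mean and variance of a Normal(mean = 1, sd = 2)
density <- function(x) dnorm(x, mean = 1, sd = 2)
approx_mean <- numeric_expectation(density)
approx_var  <- numeric_expectation(density, function(x) (x - approx_mean)^2)

print(approx_mean)  # approximately 1
print(approx_var)   # approximately 4
```

Results obtained this way are approximate, which matches the warning printed by `kthmoment` in the decorator example above.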

Installation

For the latest release on CRAN, install with

install.packages("distr6")

Otherwise, for the latest stable build, install from GitHub with

remotes::install_github("alan-turing-institute/distr6")

Future Plans

The v1.0 release focuses on the core features of the SDistribution class as well as analytic methods in wrappers, including but not limited to truncation, huberization, product distributions and mixture distributions. In our next release we plan to include

  • A plot method for Distributions
  • A generalised qqplot for comparing any distributions
  • A finalised FunctionImputation decorator with different imputation strategies
  • Discrete distribution subtraction (negative convolution)
  • A wrapper for scaling distributions to a given mean and variance
  • More probability distributions
  • Any other good suggestions made between now and then!

Package Development and Contributing

distr6 is released under the MIT licence with acknowledgements to the LGPL-3 licence of distr. Therefore any contributions to distr6 will also be accepted under the MIT licence. We welcome all bug reports, issues, questions and suggestions, which can be raised on the project's GitHub issue tracker, but please read through our contributing guidelines for details including our code of conduct.

Acknowledgements

distr6 is the result of a collaboration between many people, universities and institutions across the world, without whom the speed and performance of the package would not be up to the standard it is. Firstly, we acknowledge all the work of Prof. Dr. Peter Ruckdeschel and Prof. Dr. Matthias Kohl in developing the original distr family of packages. Secondly, we acknowledge their significant contributions to the planning and design of distr6, including the distribution and probability family class structures. A team of undergraduates at University College London implemented many of the probability distributions and are designing the plotting interface. The team consists of Shen Chen (@ShenSeanChen), Jordan Deenichin (@jdeenichin), Chengyang Gao (@garoc371), Chloe Zhaoyuan Gu (@gzy823), Yunjie He (@RoyaHe), Xiaowen Huang (@w090613), Shuhan Liu (@shliu99), Runlong Yu (@Edwinyrl), Chijing Zeng (@britneyzeng) and Qian Zhou (@yumizhou47). We also want to thank Prof. Dr. Bernd Bischl for discussions about design choices and useful features, particularly advice on the ParameterSet class. Finally, we thank University College London and The Alan Turing Institute for hosting workshops and meetings and for providing coffee whenever needed.

Version: 1.1.0
License: MIT + file LICENSE
Maintainer: Raphael Sonabend
Last Published: August 27th, 2019
Monthly Downloads: 332

Functions in distr6 (1.1.0)

ArrayDistribution-deprecated: Product Array Distribution
Degenerate: Degenerate Distribution Class
Cauchy: Cauchy Distribution Class
Dirichlet: Dirichlet Distribution Class
ChiSquared: Chi-Squared Distribution Class
Convolution: Distribution Convolution Wrapper
Complex: Set of Complex Numbers
Bernoulli: Bernoulli Distribution Class
Beta: Beta Distribution Class
DistributionDecorator: Abstract DistributionDecorator Class
ExtendedReals: Set of Extended Reals
HuberizedDistribution: Distribution Huberization Wrapper
Gumbel: Gumbel Distribution Class
Exponential: Exponential Distribution Class
DistributionWrapper: Abstract DistributionWrapper Class
Empty: Empty Set
Geometric: Geometric Distribution Class
Empirical: Empirical Distribution Class
Binomial: Binomial Distribution Class
Gompertz: Gompertz Distribution Class
CoreStatistics: Core Statistical Methods for Distributions
Hypergeometric: Hypergeometric Distribution Class
Integers: Set of Integers
NegativeBinomial: Negative Binomial Distribution Class
Kernel: Abstract Kernel Class
Triangular: Triangular Distribution Class
FDistribution: 'F' Distribution Class
Laplace: Laplace Distribution Class
Frechet: Frechet Distribution Class
Cosine: Cosine Kernel
Normal: Normal Distribution Class
Categorical: Categorical Distribution Class
LogisticKernel: Logistic Kernel
TriangularKernel: Triangular Kernel
Distribution: Generalised Distribution Object
DiscreteUniform: Discrete Uniform Distribution Class
Epanechnikov: Epanechnikov Kernel
Logarithmic: Logarithmic Distribution Class
ExoticStatistics: Exotic Statistical Methods for Distributions
cf: Characteristic Function
Multinomial: Multinomial Distribution Class
kurtosisType: Type of Kurtosis Accessor
listKernels: Lists Implemented Kernels
getSymbol.SetInterval: SetInterval Symbol Accessor
hazard: Hazard Function
decorators: Decorators Accessor
listSpecialSets: Lists Implemented R6 Special Sets
class.SetInterval: SetInterval Minimum Accessor
dimension.SetInterval: SetInterval Dimension Accessor
length.Interval: Length of Interval
FunctionImputation: Imputed Pdf/Cdf/Quantile/Rand Functions
Gamma: Gamma Distribution Class
symmetry: Symmetry Accessor
rand: Random Simulation Function
power.SetInterval: Symbolic Exponentiation for SetInterval
sup: Supremum Accessor
summary.Distribution: Distribution Summary
quantile.Distribution: Inverse Cumulative Distribution Function
pgf: Probability Generating Function
survivalPNorm: Survival Function P-Norm
MultivariateNormal: Multivariate Normal Distribution Class
NegIntegers: Set of Negative Integers
VectorDistribution: Vectorise Distributions
Rayleigh: Rayleigh Distribution Class
PosNaturals: Set of Positive Natural Numbers
Rationals: Set of Rationals
PosIntegers: Set of Positive Integers
Naturals: Set of Natural Numbers
cumHazard: Cumulative Hazard Function
UniformKernel: Uniform Kernel
Loglogistic: Log-Logistic Distribution Class
decorate: Decorate Distributions
Silverman: Silverman Kernel
cdfPNorm: Cumulative Distribution Function P-Norm
cdfAntiDeriv: Cumulative Distribution Function Anti-Derivative
Sigmoid: Sigmoid Kernel
genExp: Generalised Expectation of a Distribution
generalPNorm: Generalised P-Norm
testPositiveSkew: assert/check/test/PositiveSkew
Pareto: Pareto Distribution Class
Poisson: Poisson Distribution Class
testSymmetric: assert/check/test/Symmetric
PosReals: Set of Positive Reals
Quartic: Quartic Kernel
NegReals: Set of Negative Reals
InverseGamma: Inverse Gamma Distribution Class
NegRationals: Set of Negative Rationals
PosRationals: Set of Positive Rationals
Interval: R6 Generalised Class for Symbolic Intervals
ProductDistribution: Product Distribution
SpecialSet: Special Mathematical Sets
getParameterSupport: Parameter Support Accessor
Set: R6 Generalised Class for Symbolic Sets
StudentT: Student's T Distribution Class
inf.SetInterval: SetInterval Infimum Accessor
getParameterValue: Parameter Value Accessor
Logistic: Logistic Distribution Class
wrappedModels: Gets Internally Wrapped Models
Lognormal: Log-Normal Distribution Class
Tricube: Tricube Kernel
Triweight: Triweight Kernel
SetInterval: R6 Generalised Class for Symbolic Sets and Intervals
MixtureDistribution: Mixture Distribution Wrapper
length.Set: Length of Set
liesInSetInterval: Test if Data Lies in SetInterval.
Weibull: Weibull Distribution Class
Wald: Wald Distribution Class
max.SetInterval: SetInterval Maximum Accessor
median.Distribution: Median of a Distribution
skewType: Skewness Type
merge.ParameterSet: Combine ParameterSets
testContinuous: assert/check/test/Continuous
mean.Distribution: Distribution Mean
skewness: Distribution Skewness
testDiscrete: assert/check/test/Discrete
ParameterSet: Make an R6 Parameter Set for Distributions
SDistribution: Abstract Special Distribution Class
Uniform: Uniform Distribution Class
NormalKernel: Normal Kernel
Reals: Set of Reals
as.numeric.Interval: Coerces Interval to Numeric
TruncatedDistribution: Distribution Truncation Wrapper
complement.SetInterval: Symbolic Complement for SetInterval
as.ParameterSet: Coerce to a ParameterSet
as.data.table: Coerce ParameterSet to data.table
correlation: Distribution Correlation
cdf: Cumulative Distribution Function
exkurtosisType: Kurtosis Type
entropy: Distribution Entropy
distr6News: Show distr6 NEWS.md File
kthmoment: Kth Moment
dmax: Distribution Maximum Accessor
kurtosis: Distribution Kurtosis
listDecorators: Lists Implemented Distribution Decorators
huberize: Huberize a Distribution
mgf: Moment Generating Function
min.SetInterval: SetInterval Minimum Accessor
listDistributions: Lists Implemented Distributions
iqr: Distribution Interquartile Range
pdf: Probability Density/Mass Function
pdfPNorm: Probability Density Function P-Norm
prec: Precision of a Distribution
setSymbol: Unicode Symbol of Special Sets
ArrayDistribution: Deprecated distr6 Functions and Classes
inf: Infimum Accessor
liesInType: Test if Data Lies in Distribution Type
liesInSupport: Test if Data Lies in Distribution Support
print.ParameterSet: Print a ParameterSet
support: Support Accessor
sup.SetInterval: SetInterval Supremum Accessor
setOperation: Symbolic Operations for SetInterval
setParameterValue: Parameter Value Setter
stdev: Standard Deviation of a Distribution
simulateEmpiricalDistribution: Sample Empirical Distribution Without Replacement
survivalAntiDeriv: Survival Function Anti-Derivative
survival: Survival Function
distr6-package: distr6: Object Oriented Distributions in R
strprint: String Representation of Print
testNoSkew: assert/check/test/NoSkew
testDistributionList: assert/check/test/DistributionList
testDistribution: assert/check/test/Distribution
testMultivariate: assert/check/test/Multivariate
testMesokurtic: assert/check/test/Mesokurtic
testMixture: assert/check/test/Mixture
update.ParameterSet: Updates a ParameterSet
valueSupport: Value Support Accessor
dmin: Distribution Minimum Accessor
testPlatykurtic: assert/check/test/Platykurtic
type.SetInterval: SetInterval Type Accessor
union.SetInterval: Symbolic Unions for SetInterval
variance: Distribution Variance
testNegativeSkew: assert/check/test/NegativeSkew
variateForm: Variate Form Accessor
mode: Mode of a Distribution
listWrappers: Lists Implemented Distribution Wrappers
makeUniqueDistributions: De-Duplicate Distribution Names
elements: Set Elements Accessor
parameters: Parameters Accessor
product.SetInterval: Symbolic Cartesian Product for SetInterval
properties: Properties Accessor
squared2Norm: Squared Probability Density Function 2-Norm
testMatrixvariate: assert/check/test/Matrixvariate
testUnivariate: assert/check/test/Univariate
skewnessType: Type of Skewness Accessor
testLeptokurtic: assert/check/test/Leptokurtic
type: Type Accessor
traits: Traits Accessor
truncate: Truncate a Distribution
Arcsine: Arcsine Distribution Class