EnvStats (version 2.1.1)

Distribution.df: Data Frame Summarizing Available Probability Distributions and Estimation Methods

Description

Data frame summarizing information about available probability distributions in R and the EnvStats package, and which distributions have associated functions for estimating distribution parameters.

Usage

Distribution.df

Format

A data frame with 35 rows corresponding to 35 different available probability distributions, and 25 columns containing information associated with these probability distributions.
Name
a character vector containing the name of the probability distribution (see the column labeled Name in the table below).
Type
a character vector indicating the type of distribution (see the column labeled Type in the table below). Possible values are "Finite Discrete", "Discrete", "Continuous", and "Mixed".
Support.Min
a character vector indicating the minimum value the random variable can assume (see the column labeled Range in the table below). This is a character vector rather than a numeric vector because some distributions have a lower bound that depends on the value of a distribution parameter. For example, the minimum value for a Uniform distribution is given by the value of the parameter min.
Support.Max
a character vector indicating the maximum value the random variable can assume (see the column labeled Range in the table below). This is a character vector rather than a numeric vector because some distributions have an upper bound that depends on the value of a distribution parameter. For example, the maximum value for a Uniform distribution is given by the value of the parameter max.
Estimation.Method(s)
a character vector indicating the names of the methods available to estimate the distribution parameter(s) (see the column labeled Estimation Method(s) in the table below). Possible values include "mle" (maximum likelihood), "mme" (method of moments), "mmue" (method of moments based on the unbiased estimate of variance), "mvue" (minimum variance unbiased), "qmle" (quasi-mle), etc., or some combination of these. When a single estimator qualifies under more than one method, a slash (/) joins all the methods that estimator covers. For example, for the Binomial distribution, the sample proportion is the maximum likelihood, method of moments, and minimum variance unbiased estimator, so this method is denoted as "mle/mme/mvue". See the help files for the specific function listed under Estimating Distribution Parameters for an explanation of each of these estimation methods.
Quantile.Estimation.Method(s)
a character vector indicating the names of the methods available to estimate the distribution quantiles. For many distributions, these are the same as Estimation.Method(s). See the help files for the specific function listed under Estimating Distribution Quantiles for an explanation of each of these estimation methods.
Prediction.Interval.Method(s)
a character vector indicating the names of the methods available to create prediction intervals. See the help files for the specific function listed under Prediction Intervals for an explanation of each of these estimation methods.
Singly.Censored.Estimation.Method(s)
a character vector indicating the names of the methods available to estimate the distribution parameter(s) for Type I singly-censored data. See the help files for the specific function listed under Estimating Distribution Parameters in the help file for Censored Data for an explanation of each of these estimation methods.
Multiply.Censored.Estimation.Method(s)
a character vector indicating the names of the methods available to estimate the distribution parameter(s) for Type I multiply-censored data. See the help files for the specific function listed under Estimating Distribution Parameters in the help file for Censored Data for an explanation of each of these estimation methods.
Number.parameters
a numeric vector indicating the number of parameters associated with the distribution (see the column labeled Parameters in the table below).
Parameter.1
the columns labeled Parameter.1, Parameter.2, ..., Parameter.5 are character vectors containing the names of the distribution parameters (see the column labeled Parameters in the table below). If a distribution has $n$ parameters and $n < 5$, then the columns labeled Parameter.n+1, ..., Parameter.5 are empty. For example, the Normal distribution has only two parameters associated with it (mean and sd), so the fields in Parameter.3, Parameter.4, and Parameter.5 are empty.
Parameter.2
see Parameter.1
Parameter.3
see Parameter.1
Parameter.4
see Parameter.1
Parameter.5
see Parameter.1
Parameter.1.Min
the columns labeled Parameter.1.Min, Parameter.2.Min, ..., Parameter.5.Min are character vectors containing the minimum values that can be assumed by the distribution parameters (see the column labeled Parameter Range(s) in the table below). These are character vectors rather than numeric vectors for two reasons. First, some parameters have a lower bound of 0 but must be strictly greater than 0 (e.g., the parameter sd for the Normal distribution), in which case the lower bound is given as .Machine$double.eps, which may vary from machine to machine. Second, some parameters have a lower bound that depends on the value of another parameter; for example, the parameter max for a Uniform distribution is bounded below by the value of the parameter min. If a distribution has $n$ parameters and $n < 5$, then the columns labeled Parameter.n+1.Min, ..., Parameter.5.Min contain the missing value code (NA). For example, the Normal distribution has only two parameters (mean and sd), so the fields in Parameter.3.Min, Parameter.4.Min, and Parameter.5.Min contain NA.
Parameter.2.Min
see Parameter.1.Min
Parameter.3.Min
see Parameter.1.Min
Parameter.4.Min
see Parameter.1.Min
Parameter.5.Min
see Parameter.1.Min
Parameter.1.Max
the columns labeled Parameter.1.Max, Parameter.2.Max, ..., Parameter.5.Max are character vectors containing the maximum values that can be assumed by the distribution parameters (see the column labeled Parameter Range(s) in the table below). These are character vectors rather than numeric vectors because some parameters have an upper bound that depends on the value of another parameter. For example, the parameter min for a Uniform distribution is bounded above by the value of the parameter max. If a distribution has $n$ parameters and $n < 5$, then the columns labeled Parameter.n+1.Max, ..., Parameter.5.Max contain the missing value code (NA). For example, the Normal distribution has only two parameters (mean and sd), so the fields in Parameter.3.Max, Parameter.4.Max, and Parameter.5.Max contain NA.
Parameter.2.Max
see Parameter.1.Max
Parameter.3.Max
see Parameter.1.Max
Parameter.4.Max
see Parameter.1.Max
Parameter.5.Max
see Parameter.1.Max
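
The conventions described above (slash-joined method names, and parameter bounds stored as character strings) can be illustrated with a minimal sketch. It does not read Distribution.df itself: dist.info is a hypothetical stand-in data frame, its column name Estimation.Methods is simplified, and its cell values are assumptions based on the descriptions above and the table below. Treating a comma as the separator between distinct estimators is likewise an assumption taken from how the table lists them. The helper eval.bound is illustrative and is not part of EnvStats.

```r
# Hypothetical stand-in shaped like a few columns of Distribution.df.
# Cell values are illustrative assumptions, not copied from the package.
dist.info <- data.frame(
  Name               = c("Normal", "Binomial"),
  Estimation.Methods = c("mle/mme, mvue", "mle/mme/mvue"),
  Parameter.2        = c("sd", "prob"),
  Parameter.2.Min    = c(".Machine$double.eps", "0"),
  Parameter.3.Min    = c(NA_character_, NA_character_),
  stringsAsFactors   = FALSE
)

# Assumed layout: commas separate distinct estimators; a slash joins the
# methods that a single estimator satisfies.
estimators <- strsplit(dist.info$Estimation.Methods, ",\\s*")
names(estimators) <- dist.info$Name
methods.per.estimator <- lapply(estimators, strsplit, split = "/", fixed = TRUE)

# Illustrative helper: turn a character bound into a number by evaluating
# it as R code. NA entries (unused parameter slots) stay NA.
eval.bound <- function(txt) {
  if (is.na(txt)) return(NA_real_)
  eval(parse(text = txt))
}

p2.min <- vapply(dist.info$Parameter.2.Min, eval.bound, numeric(1),
                 USE.NAMES = FALSE)
p3.min <- vapply(dist.info$Parameter.3.Min, eval.bound, numeric(1),
                 USE.NAMES = FALSE)
```

Evaluating strings with eval(parse()) is reasonable only for trusted text such as a package data set. A bound that names another parameter (for example, a Uniform max bounded below by min) would additionally need that parameter's value to be available when the string is evaluated.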

Source

The EnvStats package.

Details

The table below summarizes the probability distributions available in R and EnvStats. For each distribution, there are four associated functions for computing density values, cumulative probabilities, quantiles, and random numbers. The names of these functions have the form dabb, pabb, qabb, and rabb, where abb is the abbreviated name of the distribution (see table below). These functions are described in the help file with the name of the distribution (see the first column of the table below). For example, the help file for Beta describes the behavior of dbeta, pbeta, qbeta, and rbeta.
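
As a small illustration of this naming convention, the sketch below calls the four Beta functions from base R (the stats package). The shape values 2 and 5 and the evaluation point 0.3 are arbitrary choices made for the example.

```r
# Base R functions for the Beta distribution (abbreviation "beta").
shape1 <- 2
shape2 <- 5

dens <- dbeta(0.3, shape1, shape2)   # density at x = 0.3
prob <- pbeta(0.3, shape1, shape2)   # cumulative probability P(X <= 0.3)
quan <- qbeta(prob, shape1, shape2)  # quantile; inverts pbeta

set.seed(1)                          # arbitrary seed, for reproducibility
samp <- rbeta(5, shape1, shape2)     # five random draws

# The function names can also be built from the abbreviation.
abb <- "beta"
fun.names <- paste0(c("d", "p", "q", "r"), abb)
dens2 <- do.call(fun.names[1], list(0.3, shape1, shape2))
```

Because qbeta inverts pbeta, quan should agree with 0.3 up to floating-point tolerance, and dens2 should match dens.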

For most distributions, there is also an associated function for estimating the distribution parameters, and the form of the names of these functions is eabb, where abb is the abbreviated name of the distribution (see table below). All of these functions are listed in the help file Estimating Distribution Parameters. For example, the function ebeta estimates the shape parameters of a Beta distribution based on a random sample of observations from this distribution.
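
A hedged sketch of the eabb convention follows, using ebeta on simulated data. It assumes the EnvStats package is installed (the fitting step is skipped otherwise), and it assumes the fitted object stores the estimates in a component named parameters; consult the ebeta help file to confirm both the arguments and the returned structure. The sample size, seed, and shape values are arbitrary.

```r
# Simulate a sample from a Beta distribution with base R.
set.seed(47)                         # arbitrary seed
x <- rbeta(50, shape1 = 2, shape2 = 5)

# Estimate the shape parameters with EnvStats, if it is available.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  fit <- EnvStats::ebeta(x, method = "mle")
  print(fit$parameters)              # assumed component holding the estimates
}
```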

For some distributions, there are functions to estimate distribution parameters based on Type I censored data. The form of the names of these functions is eabbSinglyCensored for singly censored data and eabbMultiplyCensored for multiply censored data. All of these functions are listed under the heading Estimating Distribution Parameters in the help file Censored Data.

Table 1a. Available Distributions: Name, Abbreviation, Type, and Range

| Name | Abbreviation | Type | Range |
|---|---|---|---|
| Beta | beta | Continuous | $[0, 1]$ |
| Binomial | binom | Finite Discrete | $[0, size]$ (integer) |
| Cauchy | cauchy | Continuous | $(-\infty, \infty)$ |
| Chi | chi | Continuous | $[0, \infty)$ |
| Chi-square | chisq | Continuous | $[0, \infty)$ |
| Exponential | exp | Continuous | $[0, \infty)$ |
| Extreme Value | evd | Continuous | $(-\infty, \infty)$ |
| F | f | Continuous | $[0, \infty)$ |
| Gamma | gamma | Continuous | $[0, \infty)$ |
| Gamma (Alternative) | gammaAlt | Continuous | $[0, \infty)$ |
| Generalized Extreme Value | gevd | Continuous | $(-\infty, \infty)$ for $shape = 0$; $(-\infty, location + \frac{scale}{shape}]$ for $shape > 0$; $[location + \frac{scale}{shape}, \infty)$ for $shape < 0$ |
| Geometric | geom | Discrete | $[0, \infty)$ (integer) |
| Hypergeometric | hyper | Finite Discrete | $[0, min(k,m)]$ (integer) |
| Logistic | logis | Continuous | $(-\infty, \infty)$ |
| Lognormal | lnorm | Continuous | $[0, \infty)$ |
| Lognormal (Alternative) | lnormAlt | Continuous | $[0, \infty)$ |
| Lognormal Mixture | lnormMix | Continuous | $[0, \infty)$ |
| Lognormal Mixture (Alternative) | lnormMixAlt | Continuous | $[0, \infty)$ |
| Three-Parameter Lognormal | lnorm3 | Continuous | $[threshold, \infty)$ |
| Truncated Lognormal | lnormTrunc | Continuous | $[min, max]$ |
| Truncated Lognormal (Alternative) | lnormTruncAlt | Continuous | $[min, max]$ |
| Negative Binomial | nbinom | Discrete | $[0, \infty)$ (integer) |
| Normal | norm | Continuous | $(-\infty, \infty)$ |
| Normal Mixture | normMix | Continuous | $(-\infty, \infty)$ |
| Truncated Normal | normTrunc | Continuous | $[min, max]$ |
| Pareto | pareto | Continuous | $[location, \infty)$ |
| Poisson | pois | Discrete | $[0, \infty)$ (integer) |
| Student's t | t | Continuous | $(-\infty, \infty)$ |
| Triangular | tri | Continuous | $[min, max]$ |
| Uniform | unif | Continuous | $[min, max]$ |
| Weibull | weibull | Continuous | $[0, \infty)$ |
| Wilcoxon Rank Sum | wilcox | Finite Discrete | $[0, m n]$ (integer) |
| Zero-Modified Lognormal (Delta) | zmlnorm | Mixed | $[0, \infty)$ |
| Zero-Modified Lognormal (Delta) (Alternative) | zmlnormAlt | Mixed | $[0, \infty)$ |
| Zero-Modified Normal | zmnorm | Mixed | $(-\infty, \infty)$ |

Table 1b. Available Distributions: Name, Parameters, Parameter Default Values, Parameter Ranges, Estimation Method(s)

| Name | Parameter(s) | Default Value(s) | Parameter Range(s) | Estimation Method(s) |
|---|---|---|---|---|
| Beta | shape1 | | $(0, \infty)$ | mle, mme, mmue |
| | shape2 | | $(0, \infty)$ | |
| | ncp | 0 | $(0, \infty)$ | |
| Binomial | size | | $[0, \infty)$ | mle/mme/mvue |
| | prob | | $[0, 1]$ | |
| Cauchy | location | 0 | $(-\infty, \infty)$ | |
| | scale | 1 | $(0, \infty)$ | |
| Chi | df | | $(0, \infty)$ | |
| Chi-square | df | | $(0, \infty)$ | |
| | ncp | 0 | $(-\infty, \infty)$ | |
| Exponential | rate | 1 | $(0, \infty)$ | mle/mme |
| Extreme Value | location | 0 | $(-\infty, \infty)$ | mle, mme, mmue, pwme |
| | scale | 1 | $(0, \infty)$ | |
| F | df1 | | $(0, \infty)$ | |
| | df2 | | $(0, \infty)$ | |
| | ncp | 0 | $(0, \infty)$ | |
| Gamma | shape | | $(0, \infty)$ | mle, bcmle, mme, mmue |
| | scale | 1 | $(0, \infty)$ | |
| Gamma (Alternative) | mean | | $(0, \infty)$ | mle, bcmle, mme, mmue |
| | cv | 1 | $(0, \infty)$ | |
| Generalized Extreme Value | location | 0 | $(-\infty, \infty)$ | mle, pwme, tsoe |
| | scale | 1 | $(0, \infty)$ | |
| | shape | 0 | $(-\infty, \infty)$ | |
| Geometric | prob | | $(0, 1)$ | mle/mme, mvue |
| Hypergeometric | m | | $[0, \infty)$ | mle, mvue |
| | n | | $[0, \infty)$ | |
| | k | | $[1, m+n]$ | |
| Logistic | location | 0 | $(-\infty, \infty)$ | mle, mme, mmue |
| | scale | 1 | $(0, \infty)$ | |
| Lognormal | meanlog | 0 | $(-\infty, \infty)$ | mle/mme, mvue |
| | sdlog | 1 | $(0, \infty)$ | |
| Lognormal (Alternative) | mean | exp(1/2) | $(0, \infty)$ | mle, mme, mmue, mvue, qmle |
| | cv | sqrt(exp(1)-1) | $(0, \infty)$ | |
| Lognormal Mixture | meanlog1 | 0 | $(-\infty, \infty)$ | |
| | sdlog1 | 1 | $(0, \infty)$ | |
| | meanlog2 | 0 | $(-\infty, \infty)$ | |
| | sdlog2 | 1 | $(0, \infty)$ | |
| | p.mix | 0.5 | $[0, 1]$ | |
| Lognormal Mixture (Alternative) | mean1 | exp(1/2) | $(0, \infty)$ | |
| | cv1 | sqrt(exp(1)-1) | $(0, \infty)$ | |
| | mean2 | exp(1/2) | $(0, \infty)$ | |
| | cv2 | sqrt(exp(1)-1) | $(0, \infty)$ | |
| | p.mix | 0.5 | $[0, 1]$ | |
| Three-Parameter Lognormal | meanlog | 0 | $(-\infty, \infty)$ | lmle, mme, mmue, mmme, royston.skew, zero.skew |
| | sdlog | 1 | $(0, \infty)$ | |
| | threshold | 0 | $(-\infty, \infty)$ | |
| Truncated Lognormal | meanlog | 0 | $(-\infty, \infty)$ | |
| | sdlog | 1 | $(0, \infty)$ | |
| | min | 0 | $[0, max)$ | |
| | max | Inf | $(min, \infty)$ | |
| Truncated Lognormal (Alternative) | mean | exp(1/2) | $(0, \infty)$ | |
| | cv | sqrt(exp(1)-1) | $(0, \infty)$ | |
| | min | 0 | $[0, max)$ | |
| | max | Inf | $(min, \infty)$ | |
| Negative Binomial | size | | $[1, \infty)$ | mle/mme, mvue |
| | prob | | $(0, 1]$ | |
| | mu | | $(0, \infty)$ | |
| Normal | mean | 0 | $(-\infty, \infty)$ | mle/mme, mvue |
| | sd | 1 | $(0, \infty)$ | |
| Normal Mixture | mean1 | 0 | $(-\infty, \infty)$ | |
| | sd1 | 1 | $(0, \infty)$ | |
| | mean2 | 0 | $(-\infty, \infty)$ | |
| | sd2 | 1 | $(0, \infty)$ | |
| | p.mix | 0.5 | $[0, 1]$ | |
| Truncated Normal | mean | 0 | $(-\infty, \infty)$ | |
| | sd | 1 | $(0, \infty)$ | |
| | min | -Inf | $(-\infty, max)$ | |
| | max | Inf | $(min, \infty)$ | |
| Pareto | location | | $(0, \infty)$ | lse, mle |
| | shape | 1 | $(0, \infty)$ | |
| Poisson | lambda | | $(0, \infty)$ | mle/mme/mvue |
| Student's t | df | | $(0, \infty)$ | |
| | ncp | 0 | $(-\infty, \infty)$ | |
| Triangular | min | 0 | $(-\infty, max)$ | |
| | max | 1 | $(min, \infty)$ | |
| | mode | 0.5 | $(min, max)$ | |
| Uniform | min | 0 | $(-\infty, max)$ | mle, mme, mmue |
| | max | 1 | $(min, \infty)$ | |
| Weibull | shape | | $(0, \infty)$ | mle, mme, mmue |
| | scale | 1 | $(0, \infty)$ | |
| Wilcoxon Rank Sum | m | | $[1, \infty)$ | |
| | n | | $[1, \infty)$ | |
| Zero-Modified Lognormal (Delta) | meanlog | 0 | $(-\infty, \infty)$ | mvue |
| | sdlog | 1 | $(0, \infty)$ | |
| | p.zero | 0.5 | $[0, 1]$ | |
| Zero-Modified Lognormal (Delta) (Alternative) | mean | exp(1/2) | $(0, \infty)$ | mvue |
| | cv | sqrt(exp(1)-1) | $(0, \infty)$ | |
| | p.zero | 0.5 | $[0, 1]$ | |
| Zero-Modified Normal | mean | 0 | $(-\infty, \infty)$ | mvue |
| | sd | 1 | $(0, \infty)$ | |
| | p.zero | 0.5 | $[0, 1]$ | |

References

Millard, S.P. (2013). EnvStats: An R Package for Environmental Statistics. Springer, New York.