distr_est: MLE for Distribution Fitting

Description

Given a vector of values assumed to be realizations of independent and identically distributed (iid) random variables, fit one of the following distributions to the data using maximum-likelihood estimation (MLE): the normal distribution, the $t$-distribution, the generalized error distribution (GED), the average Laplace distribution (ALD), or one of their skewed variants.

Usage

distr_est(
  x,
  dist = c("norm", "std", "ged", "ald", "snorm", "sstd", "sged", "sald"),
  fix_mean = NULL,
  fix_sdev = NULL,
  Prange = c(1, 5)
)
norm_est(x, fix_mean = NULL, fix_sdev = NULL)
std_est(x, fix_mean = NULL, fix_sdev = NULL)
ged_est(x, fix_mean = NULL, fix_sdev = NULL)
ald_est(x, fix_mean = NULL, fix_sdev = NULL, Prange = c(1, 5))
snorm_est(x, fix_mean = NULL, fix_sdev = NULL)
sstd_est(x, fix_mean = NULL, fix_sdev = NULL)
sged_est(x, fix_mean = NULL, fix_sdev = NULL)
sald_est(x, fix_mean = NULL, fix_sdev = NULL, Prange = c(1, 5))

Value

Returns a list with the following elements.

Arguments

x: a numeric vector with the data.
dist: a character value that specifies the distribution to consider; available are a normal distribution ("norm"), a $t$-distribution ("std"), a GED ("ged"), an ALD ("ald"), and their skewed variants ("snorm", "sstd", "sged", "sald").
fix_mean: optional; with the default NULL, a location parameter representing the (unconditional) mean of the distribution is estimated along with the other parameters; if a numeric value is supplied, the mean is fixed to that value and excluded from the estimation.
fix_sdev: optional; with the default NULL, a scale parameter representing the (unconditional) standard deviation of the distribution is estimated along with the other parameters; if a numeric value is supplied, the standard deviation is fixed to that value and excluded from the estimation.
Prange: a two-element numeric vector, giving the boundaries of the search space for the shape parameter $P$ in an ALD or its skewed variant.
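
To illustrate what fixing a parameter means for the estimation, the following is a minimal base-R sketch for the normal case only. The helper norm_fit_sketch is hypothetical and not part of the package; it only assumes that a fixed parameter is dropped from the vector handed to the optimizer, as described for fix_mean and fix_sdev. The optimizer choice and the log-scale treatment of the standard deviation are likewise assumptions made for this sketch.

```r
# Hypothetical helper (not the package's code): normal MLE in which the
# mean and/or the standard deviation can be held fixed.
norm_fit_sketch <- function(x, fix_mean = NULL, fix_sdev = NULL) {
  est_mean <- is.null(fix_mean)
  est_sdev <- is.null(fix_sdev)

  # Nothing left to estimate: return the fixed values directly.
  if (!est_mean && !est_sdev) {
    return(c(mean = fix_mean, sdev = fix_sdev))
  }

  # Starting values for the free parameters only; the standard deviation
  # is optimized on the log scale so that it stays positive.
  start <- c(if (est_mean) mean(x), if (est_sdev) log(sd(x)))

  # Map the free-parameter vector back to (mean, sdev).
  unpack <- function(par) {
    mu <- if (est_mean) par[1] else fix_mean
    sigma <- if (est_sdev) exp(par[length(par)]) else fix_sdev
    c(mean = mu, sdev = sigma)
  }

  # Negative log-likelihood: sum over the observations.
  neg_loglik <- function(par) {
    p <- unpack(par)
    -sum(dnorm(x, mean = p[["mean"]], sd = p[["sdev"]], log = TRUE))
  }

  fit <- optim(start, neg_loglik, method = "BFGS")
  unpack(fit$par)
}

# Usage: estimate both parameters, then only the standard deviation.
set.seed(1)
x <- rnorm(2000, mean = 3.1, sd = 1.2)
norm_fit_sketch(x)
norm_fit_sketch(x, fix_mean = 3.1)
```

With fix_mean supplied, the optimizer only searches over the scale parameter, so the returned mean equals the supplied value exactly.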

Details

Let $x$ be an individual observation. Let $\mu$ be a real-valued location parameter, representing the unconditional mean of the distribution, and $\sigma$ a real-valued scale parameter, representing the unconditional standard deviation. Generally, let $\theta$ be a vector with all parameters of the underlying distribution. The likelihood of $x$ is given by $$L_x^{\text{norm}}(\theta)=\frac{\sigma^{-1}}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$$ for a normal distribution, $$L_x^{\text{std}}(\theta)=\frac{\sigma^{-1}\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi(\nu-2)}}\left[1+\frac{1}{\nu - 2}\left(\frac{x-\mu}{\sigma}\right)^2\right]^{-\frac{\nu + 1}{2}}$$ for a $t$-distribution with $\nu$ as the degrees of freedom and $\Gamma$ as the gamma function, $$L_x^{\text{ged}}(\theta)=\frac{\sigma^{-1}\beta}{2}\sqrt{\frac{C_{\Gamma,3}}{C_{\Gamma,1}^3}} \exp\left\{-\left|\frac{x-\mu}{\sigma}\right|^{\beta}\left(\frac{C_{\Gamma,3}}{C_{\Gamma,1}}\right)^{\frac{\beta}{2}}\right\}$$ for a GED with $\beta$ as its real-valued shape and with $C_{\Gamma,i}=\Gamma\left(\frac{i}{\beta}\right)$, $i\in\left\{1,3\right\}$, and $$L_x^{\text{ald}}(\theta)=\frac{\sigma^{-1}sB}{2}\exp\left(-s\left|\frac{x-\mu}{\sigma}\right|\right)\sum_{j=0}^{P}c_j\left(s\left|\frac{x-\mu}{\sigma}\right|\right)^j$$ for an ALD with $P$ as its discrete shape, where $s = \sqrt{2(P+1)}$, $$B=2^{-2P} {{2P}\choose{P}}, \hspace{4mm} P \geq 0,$$ and $$c_{j}=\frac{2(P-j + 1)}{j(2P-j+1)}c_{j-1}, \hspace{4mm} j =2,3,\dots,P,$$ with $c_0 = c_1 = 1$. The individual-observation likelihoods for the skewed variants are derived analogously, following the idea by Fernandez and Steel (1998). The log-likelihood to be maximized is then the sum of the log-transformed likelihoods of the individual observations.
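
The ALD likelihood and its coefficient recursion can be sketched in base R as follows. The helpers dald_sketch and ald_loglik_sketch are hypothetical names introduced only for illustration; they transcribe the formula above and are not the package's internal implementation.

```r
# Hypothetical helper: density of a standardized ALD (mean 0, variance 1)
# with discrete shape P, evaluated at z = (x - mu) / sigma.
dald_sketch <- function(z, P) {
  s <- sqrt(2 * (P + 1))
  B <- choose(2 * P, P) / 2^(2 * P)

  # Coefficients c_0, ..., c_P: c_0 = c_1 = 1, then the recursion for j >= 2.
  # R vectors are 1-based, so c_j is stored in cj[j + 1].
  cj <- rep(1, P + 1)
  if (P >= 2) {
    for (j in 2:P) {
      cj[j + 1] <- 2 * (P - j + 1) / (j * (2 * P - j + 1)) * cj[j]
    }
  }

  u <- s * abs(z)
  # Polynomial sum_{j = 0}^{P} c_j * u^j, evaluated for each element of u.
  poly <- vapply(u, function(ui) sum(cj * ui^(0:P)), numeric(1))
  s * B / 2 * exp(-u) * poly
}

# Hypothetical helper: log-likelihood as the sum of the log-transformed
# individual-observation likelihoods; the factor 1 / sigma is the
# sigma^{-1} term of the formula.
ald_loglik_sketch <- function(x, mu, sigma, P) {
  sum(log(dald_sketch((x - mu) / sigma, P) / sigma))
}

# Numerical check that the density integrates to one and has unit variance
# (integrating over the positive half-line and doubling, by symmetry).
P <- 3
2 * integrate(function(z) dald_sketch(z, P), 0, Inf)$value
2 * integrate(function(z) z^2 * dald_sketch(z, P), 0, Inf)$value
```

For $P = 0$ the sum reduces to $c_0 = 1$ and the density is that of a Laplace distribution with unit variance.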

distr_est is a general-purpose distribution fitting function, where the distribution is selected through the argument dist. norm_est, std_est, ged_est, ald_est, snorm_est, sstd_est, sged_est, and sald_est are wrappers around distr_est that provide dedicated fitting functions for each of the distributions available in this package.

References

Fernandez, C., & Steel, M. F. J. (1998). Bayesian Modeling of Fat Tails and Skewness. Journal of the American Statistical Association, 93(441), 359–371. DOI: 10.1080/01621459.1998.10474117.

Examples

# Draw observations from a GED, then rescale to standard deviation 1.2
# and shift to mean 3.1
x <- rged_s(4000, shape = 1.5) * 1.2 + 3.1
# Fit GED
ged_est(x)
# Fit GED differently using distr_est()
distr_est(x, dist = "ged")
# Fit GED while fixing mean and standard deviation
ged_est(x, fix_mean = 3.1, fix_sdev = 1.2)
# Fit another distribution
sstd_est(x)
