fitmixture: Estimating parameters of the well-known mixture models

Description

Estimates the parameters of a mixture model using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is $$F(x,{\Theta}) = \sum_{j=1}^{K}\omega_j F_j(x,\theta_j),$$ where $\Theta=(\theta_1,\dots,\theta_K)^T$ is the whole parameter vector, $\theta_j$ for $j=1,\dots,K$ is the parameter vector of the $j$-th component, i.e. $\theta_j=(\alpha_j,\beta_j)^{T}$, $F_j(.,\theta_j)$ is the cdf of the $j$-th component, and the known constant $K$ is the number of components. The parameters $\alpha$ and $\beta$ are either the shape and scale parameters, or both shape parameters; in the latter case they are called the first and second shape parameters, respectively. The weights $\omega_j$ sum to one, i.e. $\sum_{j=1}^{K}\omega_j=1$. The families considered for the cdf $F$ include Birnbaum-Saunders, Burr type XII, Chen, F, Frechet, Gamma, Gompertz, Log-normal, Log-logistic, Lomax, skew-normal, and Weibull.
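As an illustration of the mixture cdf defined above, the following minimal sketch evaluates it for Weibull components using only base R. The helper name `mixture_cdf` is hypothetical and is not part of the package; it assumes $\alpha_j$ is the Weibull shape and $\beta_j$ the Weibull scale.

```r
# Hypothetical helper (not part of the package): cdf of a K-component
# Weibull mixture, F(x) = sum_j weight[j] * F_j(x; shape[j], scale[j]).
mixture_cdf <- function(x, weight, shape, scale) {
  # The mixing weights must sum to one.
  stopifnot(abs(sum(weight) - 1) < 1e-8)
  total <- numeric(length(x))
  for (j in seq_along(weight)) {
    total <- total + weight[j] * pweibull(x, shape = shape[j], scale = scale[j])
  }
  total
}

# Two components with weights 0.3 and 0.7.
p <- mixture_cdf(c(0.5, 1, 2), weight = c(0.3, 0.7),
                 shape = c(1, 2), scale = c(2, 1))
print(p)
```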

Usage

fitmixture(x, family, K, initial="FALSE", starts)

Arguments

x

Vector of observations.

family

Name of the family including: "birnbaum-saunders", "burrxii", "chen", "f", "Frechet", "gamma", "gompetrz", "log-normal", "log-logistic", "lomax", "skew-normal", and "weibull".

K

Number of components.

initial

Indicates whether user-supplied initial values are used. The sequence of initial values consists of $\omega_1,\dots,\omega_K,\alpha_1,\dots,\alpha_K,\beta_1,\dots,\beta_K$. For the skew-normal case, the vector of initial values of the skewness parameters is appended. By default, the initial values are determined automatically by k-means clustering.

starts

If "initial=TRUE", the sequence of initial values must be given here.
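The ordering of the initial values described above can be sketched as follows for a two-component model with two parameters per component. The variable names are illustrative, and the final call is left as a comment because it requires the package that provides fitmixture.

```r
# Illustrative layout of user-supplied initial values for K = 2:
# weights first, then the alpha parameters, then the beta parameters.
K <- 2
weight <- c(0.3, 0.7)  # omega_1, omega_2 (must sum to one)
alpha  <- c(1, 2)      # alpha_1, alpha_2
beta   <- c(2, 1)      # beta_1, beta_2
starts <- c(weight, alpha, beta)

# Basic sanity checks before passing the vector on.
stopifnot(length(starts) == 3 * K)
stopifnot(abs(sum(starts[1:K]) - 1) < 1e-8)

# Assumed usage, following the Usage section:
# fitmixture(x, "weibull", K, initial = TRUE, starts = starts)
```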

Value

The output has three parts. The first part is the vector of estimated weight, shape, and scale parameters.
The second part is a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood statistics.
The last part of the output contains the clustering vector.
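For orientation, the textbook definitions of three of the information criteria can be computed from a log-likelihood value as in the sketch below. This is an assumption about the standard formulas, not a transcription of the package's code; the helper name `info_criteria` is hypothetical, and CAIC is omitted because its definition varies between authors. For a K-component mixture with two parameters per component, the number of free parameters is commonly taken as 3K - 1 (K - 1 free weights plus 2K component parameters).

```r
# Hypothetical helper using the standard textbook definitions:
#   AIC  = -2 logL + 2 p
#   BIC  = -2 logL + p log(n)
#   HQIC = -2 logL + 2 p log(log(n))
info_criteria <- function(loglik, n, p) {
  c(AIC  = -2 * loglik + 2 * p,
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}

K <- 2
p <- 3 * K - 1  # assumed count of free parameters
ic <- info_criteria(loglik = -100, n = 50, p = p)
print(ic)
```

Smaller values indicate a better trade-off between fit and complexity, so these measures are typically compared across candidate families or values of K.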

Details

Identifiability of the mixture model is assumed to hold. For the skew-normal case we have $\theta_j=(\alpha_j,\beta_j,\lambda_j)^{T}$, in which $-\infty<\alpha_j<\infty$, $\beta_j>0$, and $-\infty<\lambda_j<\infty$ are the location, scale, and skewness parameters of the $j$-th component, respectively; see Azzalini (1985).
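The skew-normal component density in Azzalini's parameterization, $f(x)=\frac{2}{\beta}\,\phi\!\left(\frac{x-\alpha}{\beta}\right)\Phi\!\left(\lambda\,\frac{x-\alpha}{\beta}\right)$, can be sketched in base R as follows. The function name `dskewnormal` is hypothetical and is not taken from the package.

```r
# Hypothetical helper: skew-normal density with location alpha,
# scale beta > 0, and skewness lambda (Azzalini, 1985).
dskewnormal <- function(x, alpha, beta, lambda) {
  stopifnot(beta > 0)
  z <- (x - alpha) / beta
  2 / beta * dnorm(z) * pnorm(lambda * z)
}

# With lambda = 0 the density reduces to the normal density.
d0 <- dskewnormal(0.3, alpha = 0, beta = 1, lambda = 0)

# The density integrates to one for any valid parameter values.
area <- integrate(dskewnormal, -Inf, Inf,
                  alpha = 1, beta = 2, lambda = 3)$value
```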

References

A. Azzalini, 1985. A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 171-178.

A. P. Dempster, N. M. Laird, and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.

M. Teimouri, S. Rezakhah, and A. Mohammadpour, 2018. EM algorithm for symmetric stable mixture model, Communications in Statistics-Simulation and Computation, 47(2), 582-604.

Examples


# NOT RUN {
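n <- 50                                       # sample size
K <- 2                                        # number of components
weight <- c(0.3, 0.7)                         # mixing weights (sum to one)
alpha <- c(1, 2)                              # shape parameters
beta <- c(2, 1)                               # scale parameters
param <- c(weight, alpha, beta)               # weights, then alphas, then betas
x <- rmixture(n, "weibull", K, param)         # simulate from the Weibull mixture
fitmixture(x, "weibull", K, initial="FALSE")  # fit using k-means initial values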
# }
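If rmixture is not at hand, data resembling those in the example can be generated with base R alone. The sketch below is a stand-in under the assumption that each observation is drawn by first sampling a component label and then sampling from that component; it is not the package's implementation.

```r
# Base-R stand-in for simulating a two-component Weibull mixture.
set.seed(1)            # for reproducibility
n <- 50
weight <- c(0.3, 0.7)  # mixing weights
alpha  <- c(1, 2)      # Weibull shape parameters
beta   <- c(2, 1)      # Weibull scale parameters

# Draw a component label for each observation, then draw from that component.
label <- sample(seq_along(weight), size = n, replace = TRUE, prob = weight)
x <- rweibull(n, shape = alpha[label], scale = beta[label])

table(label)  # component counts
summary(x)
```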
