fitmixturegrouped: Estimating parameters of the well-known mixture models fitted to the grouped data

Description

Estimates the parameters of gamma, log-normal, and Weibull mixture models fitted to grouped data using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is $$F(x,{\Theta}) = \sum_{k=1}^{K}\omega_k F_k(x,\theta_k),$$ where $\Theta=(\theta_1,\dots,\theta_K)^T$ is the whole parameter vector, $\theta_k=(\alpha_k,\beta_k)^{T}$ for $k=1,\dots,K$ is the parameter vector of the $k$-th component, $F_k(\cdot,\theta_k)$ is the cdf of the $k$-th component, and the known constant $K$ is the number of components. The parameters $\alpha_k$ and $\beta_k$ are the shape and scale parameters, respectively. The mixing weights $\omega_k$ sum to one, i.e. $\sum_{k=1}^{K}\omega_k=1$. The families considered for the component cdf $F_k$ are gamma, log-normal, and Weibull.

Suppose a sample of $n$ independent observations, each following a distribution with cdf $F$, has been divided into $m$ separate groups of the form $(r_{i-1},r_i]$, for $i=1,\dots,m$, and let $f_i$ denote the frequency of the $i$-th group. Then the likelihood function of the observed data is $$ L(\Theta|f_1,\dots,f_m)=\frac{n!}{f_{1}!f_{2}!\dots f_{m}!}\prod_{i=1}^{m}\Bigl[\frac{F_i(\Theta)}{F(\Theta)}\Bigr]^{f_i},$$ where $$F_i(\Theta)=\sum_{k=1}^{K}\omega_k\int_{r_{i-1}}^{r_i}f(x|\theta_k)dx,$$ $$F(\Theta)=\sum_{k=1}^{K}\omega_k\int_{r_{0}}^{r_m}f(x|\theta_k)dx,$$ in which $f(x|\theta_k)$ denotes the pdf of the $k$-th component. Using the EM algorithm proposed by Dempster et al. (1977), we can solve $ \partial L(\Theta|f_1,\dots,f_m)/{\partial \Theta}=0$ by introducing two new missing variables.
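
To make the grouped-data likelihood concrete, the following is a minimal sketch in base R that evaluates its logarithm for a Weibull mixture. The helper name `grouped_loglik` and the boundaries and frequencies below are hypothetical and chosen only for illustration; this is not the package's internal code, and it evaluates the likelihood only, without performing the EM iterations.

```r
# Hypothetical helper (not part of the package): log of the grouped-data
# likelihood for a K-component Weibull mixture.
grouped_loglik <- function(r, f, omega, alpha, beta) {
  # Mixture cdf F(x) = sum_k omega_k * F_k(x), evaluated at each boundary
  mix_cdf <- function(x) {
    sapply(x, function(xi) sum(omega * pweibull(xi, shape = alpha, scale = beta)))
  }
  Fr   <- mix_cdf(r)
  Fi   <- diff(Fr)                    # F_i(Theta): mass in (r_{i-1}, r_i]
  Ftot <- Fr[length(r)] - Fr[1]       # F(Theta): mass in (r_0, r_m]
  n    <- sum(f)
  # log multinomial coefficient + sum_i f_i * log(F_i / F)
  lgamma(n + 1) - sum(lgamma(f + 1)) + sum(f * log(Fi / Ftot))
}

# Illustrative (made-up) group boundaries and frequencies
r <- c(0, 0.5, 1, 2, 4)
f <- c(12, 10, 15, 13)
ll <- grouped_loglik(r, f, omega = c(0.3, 0.7), alpha = c(1, 2), beta = c(2, 1))
ll
```

Working on the log scale with `lgamma` avoids overflow in the factorials when $n$ is large.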

Usage

fitmixturegrouped(family, r, f, K, initial="FALSE", starts)

Arguments

family

Name of the family: "gamma", "log-normal", "skew-normal", or "weibull".

r

A numeric vector of length $m+1$. The first element of $r$ is the lower bound of the first group and the other $m$ elements are the upper bounds of the $m$ groups. Note that the upper bound of the $(i-1)$-th group is the lower bound of the $i$-th group, for $i=2,\dots,m$. The lower bound of the first group and the upper bound of the $m$-th group are chosen arbitrarily. If raw data are available, the smallest and largest observations are chosen as the lower bound of the first group and the upper bound of the $m$-th group, respectively.

f

A numeric vector of length $m$ containing the group frequencies.

K

Number of components.

initial

Indicates whether the user supplies the initial values. If TRUE, the sequence of initial values $\omega_1,\dots,\omega_K,\alpha_1,\dots,\alpha_K,\beta_1,\dots,\beta_K$ must be given through starts; for the skew-normal case, the vector of initial values of the skewness parameters is appended. By default, the initial values are determined automatically by the k-means method of clustering.

starts

The sequence of initial values; it must be given if "initial=TRUE".
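
As a small illustration of the ordering described for the initial values, the sketch below assembles a vector of starting values for a two-component model. The numeric values are arbitrary, and the commented-out call assumes that `r` and `f` have already been constructed and that the package providing `fitmixturegrouped()` is attached.

```r
K     <- 2
omega <- c(0.3, 0.7)   # initial mixing weights (must sum to one)
alpha <- c(1, 2)       # initial shape parameters
beta  <- c(2, 1)       # initial scale parameters

# Order: omega_1..omega_K, alpha_1..alpha_K, beta_1..beta_K
starts <- c(omega, alpha, beta)

# With r and f available, user-supplied starting values would be passed as:
# fitmixturegrouped("weibull", r, f, K, initial = TRUE, starts = starts)
```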

Value

The output has two parts. The first part is the vector of estimated weight, shape, and scale parameters.
The second part is a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood statistics.
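
For orientation, the sketch below shows how the likelihood-based criteria are commonly computed from a maximized log-likelihood. The helper `info_criteria`, the parameter count $3K-1$ (two parameters per component plus $K-1$ free weights), and the log-likelihood value are assumptions made for illustration; the formulas are the usual textbook forms, and the exact definitions used by the function (in particular for CAIC) may differ.

```r
# Hypothetical helper (not part of the package): textbook forms of the
# likelihood-based information criteria.
info_criteria <- function(loglik, p, n) {
  c(AIC  = -2 * loglik + 2 * p,
    CAIC = -2 * loglik + p * (log(n) + 1),
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}

K  <- 2
# loglik = -68.4 is an arbitrary illustrative value, not a fitted result
ic <- info_criteria(loglik = -68.4, p = 3 * K - 1, n = 50)
ic
```

Smaller values indicate a better trade-off between fit and model complexity when comparing candidate families or numbers of components on the same grouped data.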

Details

Identifiability of the mixture models is assumed to hold. For the skew-normal mixture model, the parameter vector of the $k$-th component takes the form $\theta_k=(\alpha_k,\beta_k,\lambda_k)^{T}$, where $\alpha_k$, $\beta_k$, and $\lambda_k$ denote the location, scale, and skewness parameters, respectively.

References

G. J. McLachlan and P. N. Jones, 1988. Fitting mixture models to grouped and truncated data via the EM algorithm, Biometrics, 44, 571-578

Examples

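# NOT RUN {
# Assumes the package providing rmixture() and fitmixturegrouped() is attached.
n <- 50                  # sample size
K <- 2                   # number of components
m <- 10                  # number of groups
weight <- c(0.3, 0.7)    # mixing weights
alpha <- c(1, 2)         # shape parameters
beta <- c(2, 1)          # scale parameters
param <- c(weight, alpha, beta)
# Simulate from a two-component Weibull mixture
x <- rmixture(n, "weibull", K, param)
# Group boundaries: m equal-width groups spanning the observed range
r <- seq(min(x), max(x), length.out = m + 1)
# Group frequencies
D <- data.frame(table(cut(x, r, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 4)))
f <- D$Freq
fitmixturegrouped("weibull", r, f, K, initial = "FALSE")
# }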
