mixmod_em: Mixture Model Estimation using EM-Algorithm

Description

This method uses the EM-Algorithm to estimate the parameters of a univariate mixture model. Until now, the mixture model can consist of k two-parametric Weibull distributions. If no mixture of k components can be estimated, the function is forced to stop and a message with instructions is given.

Usage

mixmod_em(x, event, post = NULL, distribution = "weibull",
  conf_level = 0.95, k = 2, method = "EM", n_iter = 100L,
  conv_limit = 1e-06, diff_loglik = 0.5)

Arguments

a numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles.

event

a vector of binary data (0 or 1) indicating whether unit i is a right censored observation (= 0) or a failure (= 1).

post

a numeric matrix specifiying initial a-posteriori probabilities. If post is NULL (default) a-posteriori probabilities are assigned randomly using the Dirichlet distribution (rdirichlet from LearnBayes Package), which is the conjugate prior of a Multinomial distribution. This idea was taken from the blog post of Mr. Gelissen (linked under references).

distribution

supposed mixture model. Only "weibull" can be used. Other distributions have not been implemented yet.

conf_level

confidence level for the confidence intervals of the parameters of every component k. The default value is conf_level = 0.95.

integer of mixture components, default is 2.

method

default method is "EM". Other methods have not been implemented yet.

n_iter

integer defining the maximum number of iterations.

conv_limit

numeric value defining the convergence limit.

diff_loglik

numeric value defining the maximum difference between log-likelihood values, which seems permissible. The default value is 0.5. See Details for the usage of this argument.

Value

Returns a list where the length of the list depends on the number of k subgroups. The first k lists have the same information as provided by ml_estimation, but the values logL, aic and bic are the results of a log-likelihood function, which is weighted by a-posteriori probabilities. The last list summarizes further results of the EM-Algorithm and is therefore called em_results. It contains the following elements:

a_priori : A vector with estimated a-priori probabilities.
a_posteriori : A matrix with estimated a-posteriori probabilities.
groups : Numeric vector specifying the group membership of every observation.
logL : The value of the complete log-likelihood.
aic : Akaike Information Criterion.
bic : Bayesian Information Criterion.

Details

In mixmod_em the function mixture_em_cpp is called. The computed posterior probabilities are then used as weights inside function ml_estimation to model a weighted log-likelihood. This strategy enables the computation of confidence intervals for the parameters of the separated sub-distributions, since ml_estimation provides a variance-covariance matrix. Using this strategy, a potential problem that can occur is, that the value of the complete log-likelihood, computed by mixture_em_cpp, differs considerably from the complete log-likelihood after re-estimating parameters with ml_estimation. If so, the estimated quantities like prior and posterior probabilities, as well as the model parameters are not reliable anymore and the function is forced to stop with the message: "Parameter estimation was not successful!" But if the log-likelihood values are close to each other, the presence of the mixture is strengthened and a reasonable fit is provided. Thus, a check of the absolute differences in the log-likelihood values is made and the critical difference has to be specified in argument diff_loglik.

References

Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
Blog posts by Stefan Gelissen: http://blogs2.datall-analyse.nl/2016/02/18/rcode_mixture_distribution_censored; last access on 19th January 2019

Examples

Run this code

# NOT RUN {
# Data is taken from given reference of Doganaksoy, Hahn and Meeker:
hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387, 3, 31, 69, 135,
          191, 241, 284, 318, 348, 392, 5, 31, 76, 144, 203, 257, 286,
          320, 350, 412, 8, 52, 78, 157, 211, 261, 298, 327, 360, 446,
          13, 53, 104, 160, 221, 264, 303, 328, 369, 21, 64, 113, 168,
          226, 278, 314, 328, 377)
state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
         1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0,
         1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         0, 1, 1, 1, 1, 1, 1)

mix_mod_em <- mixmod_em(x = hours,
                        event = state,
                        distribution = "weibull",
                        conf_level = 0.95,
                        k = 2,
                        method = "EM",
                        n_iter = 150)


# }

Run the code above in your browser using DataLab