This method uses the EM-Algorithm to estimate the parameters of a univariate mixture model. Until now, the mixture model can consist of k two-parametric Weibull distributions. If no mixture of k components can be estimated, the function is forced to stop and a message with instructions is given.
mixmod_em(x, event, post = NULL, distribution = "weibull",
conf_level = 0.95, k = 2, method = "EM", n_iter = 100L,
conv_limit = 1e-06, diff_loglik = 0.5)
a numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles.
a vector of binary data (0 or 1) indicating whether unit i is a right censored observation (= 0) or a failure (= 1).
a numeric matrix specifiying initial a-posteriori probabilities.
If post is NULL
(default) a-posteriori probabilities are assigned
randomly using the Dirichlet distribution (rdirichlet
from
LearnBayes Package), which is the conjugate prior of a Multinomial
distribution. This idea was taken from the blog post of Mr. Gelissen
(linked under references).
supposed mixture model. Only "weibull"
can be used.
Other distributions have not been implemented yet.
confidence level for the confidence intervals of the parameters
of every component k
. The default value is conf_level = 0.95
.
integer of mixture components, default is 2.
default method is "EM"
. Other methods have not been implemented
yet.
integer defining the maximum number of iterations.
numeric value defining the convergence limit.
numeric value defining the maximum difference between
log-likelihood values, which seems permissible. The default value is 0.5
.
See Details for the usage of this argument.
Returns a list where the length of the list depends on the number of
k subgroups. The first k
lists have the same information as provided by
ml_estimation, but the values logL
, aic
and bic
are
the results of a log-likelihood function, which is weighted by a-posteriori
probabilities. The last list summarizes further results of the EM-Algorithm and
is therefore called em_results
. It contains the following elements:
a_priori
: A vector with estimated a-priori probabilities.
a_posteriori
: A matrix with estimated a-posteriori probabilities.
groups
: Numeric vector specifying the group membership of every
observation.
logL
: The value of the complete log-likelihood.
aic
: Akaike Information Criterion.
bic
: Bayesian Information Criterion.
In mixmod_em
the function mixture_em_cpp
is called. The
computed posterior probabilities are then used as weights inside function
ml_estimation
to model a weighted log-likelihood. This strategy
enables the computation of confidence intervals for the parameters of the
separated sub-distributions, since ml_estimation
provides a variance-covariance
matrix. Using this strategy, a potential problem that can occur is,
that the value of the complete log-likelihood, computed by mixture_em_cpp
,
differs considerably from the complete log-likelihood after re-estimating
parameters with ml_estimation
. If so, the estimated quantities like
prior and posterior probabilities, as well as the model parameters are not
reliable anymore and the function is forced to stop with the message:
"Parameter estimation was not successful!"
But if the log-likelihood values are close to each other, the presence of the
mixture is strengthened and a reasonable fit is provided.
Thus, a check of the absolute differences in the log-likelihood values is made
and the critical difference has to be specified in argument diff_loglik
.
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
Blog posts by Stefan Gelissen: http://blogs2.datall-analyse.nl/2016/02/18/rcode_mixture_distribution_censored; last access on 19th January 2019
# NOT RUN {
# Data is taken from given reference of Doganaksoy, Hahn and Meeker:
hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387, 3, 31, 69, 135,
191, 241, 284, 318, 348, 392, 5, 31, 76, 144, 203, 257, 286,
320, 350, 412, 8, 52, 78, 157, 211, 261, 298, 327, 360, 446,
13, 53, 104, 160, 221, 264, 303, 328, 369, 21, 64, 113, 168,
226, 278, 314, 328, 377)
state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0,
1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1)
mix_mod_em <- mixmod_em(x = hours,
event = state,
distribution = "weibull",
conf_level = 0.95,
k = 2,
method = "EM",
n_iter = 150)
# }
Run the code above in your browser using DataLab