
This function performs maximum likelihood estimation for a variety of finite mixture models, for both raw and binned data, using the EM algorithm together with the Newton-Raphson algorithm or the bisection method when necessary.
mixfit(
x,
ncomp = NULL,
family = c("normal", "weibull", "gamma", "lnorm"),
pi = NULL,
mu = NULL,
sd = NULL,
ev = FALSE,
mstep.method = c("bisection", "newton"),
init.method = c("kmeans", "hclust"),
tol = 1e-06,
max_iter = 500
)
The function mixfit returns an object of class mixfitEM, which is a list whose items depend on the mixture model fitted. The common items include
a numeric vector representing the estimated proportion of each component
a numeric vector representing the estimated mean of each component
a numeric vector representing the estimated standard deviation of each component
a positive integer recording the number of EM iterations performed
the log-likelihood of the estimated mixture model for the data x
the value of AIC of the estimated model for the data x
the value of BIC of the estimated model for the data x
the data x
the probability that each observation in x belongs to each component
the family the mixture model belongs to
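As a rough illustration of how the log-likelihood, AIC, BIC and component probabilities listed above relate to one another, here is a minimal base-R sketch for a two-component normal mixture. It does not call mixfit. The parameter values are plugged in by hand rather than estimated, the variable names are hypothetical, and the parameter count 3g - 1 assumes unequal variances; whether mixfit counts parameters in exactly this way is an assumption.

```r
# Illustrative only: parameters are fixed by hand, not fitted.
set.seed(103)
x <- c(rnorm(60, 2, 1), rnorm(140, 5, 1))

pi_hat <- c(0.3, 0.7)  # proportions (assumed values)
mu_hat <- c(2, 5)      # means (assumed values)
sd_hat <- c(1, 1)      # standard deviations (assumed values)

# n x g matrix of weighted component densities pi_j * f_j(x_i)
dens <- sapply(seq_along(pi_hat),
               function(j) pi_hat[j] * dnorm(x, mu_hat[j], sd_hat[j]))

# Log-likelihood of the mixture for the data x
loglik <- sum(log(rowSums(dens)))

# Free parameters: (g - 1) proportions + g means + g standard deviations
k <- 3 * length(pi_hat) - 1
aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(length(x))

# Posterior probability that each observation belongs to each component
comp_prob <- dens / rowSums(dens)
```

Each row of comp_prob sums to one, and with 200 observations the BIC penalty per parameter (log(200)) is larger than the AIC penalty (2).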
For the Weibull mixture model, the following extra items are returned.
a numeric vector representing the estimated shape parameter of each component
a numeric vector representing the estimated scale parameter of each component
For the Gamma mixture model, the following extra items are returned.
a numeric vector representing the estimated shape parameter of each component
a numeric vector representing the estimated rate parameter of each component
For the lognormal mixture model, the following extra items are returned.
a numeric vector representing the estimated logarithm mean of each component
a numeric vector representing the estimated logarithm standard deviation of each component
x: a numeric vector for the raw data or a three-column matrix for the binned data
ncomp: a positive integer specifying the number of components of the mixture model
family: a character string specifying the family of the mixture model. It must be one of "normal", "weibull", "gamma" or "lnorm".
pi: a vector of initial values for the proportions
mu: a vector of initial values for the means
sd: a vector of initial values for the standard deviations
ev: a logical value controlling whether all components share the same variance when fitting normal mixture models. It is ignored when fitting other mixture models. The default is FALSE.
mstep.method: a character string specifying the method used in the M-step of the EM algorithm when fitting weibull or gamma mixture models. It can be either "bisection" or "newton". The default is "bisection".
init.method: a character string specifying the method used to provide initial values for the parameters of the EM algorithm. It can be either "kmeans" or "hclust". The default is "kmeans".
tol: the tolerance for the stopping rule of the EM algorithm. The algorithm stops when two consecutive iterations produce log-likelihoods that differ by less than tol. The default is 1e-06.
max_iter: the maximum number of iterations for the EM algorithm. The default is 500.
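To make the roles of the initial values, tol and max_iter concrete, the following is a minimal base-R sketch of an EM loop for a normal mixture with unequal variances. The function em_normal is a hypothetical helper written for illustration; it is not the implementation used by mixfit, and it handles raw data only.

```r
# Hypothetical helper: EM for a g-component normal mixture (raw data only).
em_normal <- function(x, pi, mu, sd, tol = 1e-6, max_iter = 500) {
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: n x g matrix of weighted component densities
    dens <- sapply(seq_along(pi), function(j) pi[j] * dnorm(x, mu[j], sd[j]))
    loglik <- sum(log(rowSums(dens)))

    # Stopping rule: change in log-likelihood smaller than tol
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik

    # Posterior membership probabilities
    post <- dens / rowSums(dens)

    # M-step: closed-form updates for the normal family
    nj <- colSums(post)
    pi <- nj / length(x)
    mu <- colSums(post * x) / nj
    sd <- sqrt(colSums(post * outer(x, mu, "-")^2) / nj)
  }
  list(pi = pi, mu = mu, sd = sd, iter = iter, loglik = loglik)
}

set.seed(103)
x <- c(rnorm(60, 2, 1), rnorm(140, 5, 1))
fit <- em_normal(x, pi = c(0.5, 0.5), mu = c(1, 4), sd = c(1, 1))
```

A smaller tol or a larger max_iter lets the loop run longer before stopping; if the loop ends without the tolerance being met, iter equals max_iter.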
The function mixfit is the core function in this package. It performs maximum likelihood estimation for finite mixture models from the normal, Weibull, gamma or lognormal families using the EM algorithm. When the family is weibull or gamma, the M-step of the EM algorithm has no closed-form solution, so it is solved numerically, either by the Newton algorithm (mstep.method = "newton") or by the bisection method (mstep.method = "bisection").
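To show why the M-step needs a numerical solver for these families, here is a minimal base-R sketch of a weighted gamma M-step for a single component, solved by bisection. The helper gamma_shape_bisect and its bracketing interval are hypothetical and chosen for illustration; the package's internal routine may differ. Given posterior weights w, the rate has a closed form once the shape is known, while the shape a must solve log(a) - digamma(a) = log(weighted mean of x) - weighted mean of log(x).

```r
# Hypothetical helper: weighted M-step for one gamma component via bisection.
gamma_shape_bisect <- function(x, w, lower = 1e-3, upper = 1e3, tol = 1e-8) {
  xbar <- sum(w * x) / sum(w)       # weighted mean of x
  lbar <- sum(w * log(x)) / sum(w)  # weighted mean of log(x)
  cst <- log(xbar) - lbar           # positive by Jensen's inequality

  # f is strictly decreasing in a, so it has a single root
  f <- function(a) log(a) - digamma(a) - cst
  stopifnot(f(lower) > 0, f(upper) < 0)  # the root must be bracketed

  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (f(mid) > 0) lower <- mid else upper <- mid
  }
  shape <- (lower + upper) / 2
  c(shape = shape, rate = shape / xbar)
}

set.seed(1)
x <- rgamma(500, shape = 3, rate = 2)
est <- gamma_shape_bisect(x, w = rep(1, length(x)))
```

Bisection only needs the sign of f and a bracketing interval, which makes it slower but more robust than a Newton step that relies on a derivative and a good starting point.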
The initial values of the EM algorithm can be provided by specifying the proportion of each component pi, the mean of each component mu and the standard deviation of each component sd. If one or more of these initial values are not provided, they are estimated by K-means clustering or hierarchical clustering, according to init.method. If none of pi, mu and sd is provided, then ncomp must be provided so that initial values can be generated automatically. For normal mixture models, the argument ev controls whether all components share the same variance.
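One plausible way to derive initial values from K-means clustering is sketched below in base R. This is an assumption about the general approach, not a description of the package's initialisation code, and the names pi0, mu0 and sd0 are hypothetical.

```r
# Sketch: initial values for a 2-component mixture from K-means clusters.
set.seed(103)
x <- c(rnorm(60, 2, 1), rnorm(140, 5, 1))

km <- kmeans(x, centers = 2, nstart = 10)

# Cluster proportions, means and standard deviations as starting values
pi0 <- as.numeric(table(km$cluster)) / length(x)
mu0 <- as.numeric(tapply(x, km$cluster, mean))
sd0 <- as.numeric(tapply(x, km$cluster, sd))

# Order the components by mean so the labelling is reproducible
ord <- order(mu0)
pi0 <- pi0[ord]; mu0 <- mu0[ord]; sd0 <- sd0[ord]
```

Values obtained this way could then be passed on explicitly, in the same form as the pi, mu and sd arguments of mixfit.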
See also: plot.mixfitEM, density.mixfitEM, select, bs.test
## fitting the normal mixture models
set.seed(103)
x <- rmixnormal(200, c(0.3, 0.7), c(2, 5), c(1, 1))
data <- bin(x, seq(-1, 8, 0.25))
fit1 <- mixfit(x, ncomp = 2) # raw data
fit2 <- mixfit(data, ncomp = 2) # binned data
fit3 <- mixfit(x, pi = c(0.5, 0.5), mu = c(1, 4), sd = c(1, 1)) # providing the initial values
fit4 <- mixfit(x, ncomp = 2, ev = TRUE) # setting the same variance
## (not run) fitting the weibull mixture models
## x <- rmixweibull(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit5 <- mixfit(x, ncomp = 2, family = "weibull") # raw data
## fit6 <- mixfit(data, ncomp = 2, family = "weibull") # binned data
## (not run) fitting the Gamma mixture models
## x <- rmixgamma(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit7 <- mixfit(x, ncomp = 2, family = "gamma") # raw data
## fit8 <- mixfit(data, ncomp = 2, family = "gamma") # binned data
## (not run) fitting the lognormal mixture models
## x <- rmixlnorm(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit9 <- mixfit(x, ncomp = 2, family = "lnorm") # raw data
## fit10 <- mixfit(data, ncomp = 2, family = "lnorm") # binned data
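The examples above pass binned data created with bin(). To give a feel for what a three-column binned data matrix could look like, here is a base-R sketch that builds one by hand. The column layout (lower bound, upper bound, count) and the column names are assumptions made for illustration; check the output of bin() for the exact format the package expects.

```r
# Sketch: build a three-column binned-data matrix from raw data.
set.seed(103)
x <- c(rnorm(60, 2, 1), rnorm(140, 5, 1))

# Breaks of width 0.25 covering the whole range of x
brks <- seq(floor(min(x)), ceiling(max(x)), by = 0.25)

# Count the observations falling into each bin
counts <- as.numeric(table(cut(x, brks, include.lowest = TRUE)))

# Assumed layout: lower bound, upper bound, count
binned <- cbind(lower = brks[-length(brks)],
                upper = brks[-1],
                count = counts)

# Drop empty bins
binned <- binned[binned[, "count"] > 0, , drop = FALSE]
```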