mixfit: Finite Mixture Modeling for Raw Data and Binned Data

Description

This function is used to perform the maximum likelihood estimation for a variety of finite mixture models for both raw and binned data by using the EM algorithm, together with Newton-Raphson algorithm or bisection method when necessary.

Usage

mixfit(
  x,
  ncomp = NULL,
  family = c("normal", "weibull", "gamma", "lnorm"),
  pi = NULL,
  mu = NULL,
  sd = NULL,
  ev = FALSE,
  mstep.method = c("bisection", "newton"),
  init.method = c("kmeans", "hclust"),
  tol = 1e-06,
  max_iter = 500
)

Value

the function mixfit return an object of class mixfitEM, which contains a list of different number of items when fitting different mixture models. The common items include

pi: a numeric vector representing the estimated proportion of each component
mu: a numeric vector representing the estimated mean of each component
sd: a numeric vector representing the estimated standard deviation of each component
iter: a positive integer recording the number of EM iteration performed
loglik: the loglikelihood of the estimated mixture model for the data x
aic: the value of AIC of the estimated model for the data x
bic: the value of BIC of the estimated model for the data x
data: the data x
comp.prob: the probability that x belongs to each component
family: the family the mixture model belongs to

For the Weibull mixture model, the following extra items are returned.

k: a numeric vector representing the estimated shape parameter of each component
lambda: a numeric vector representing the estimated scale parameter of each component

For the Gamma mixture model, the following extra items are returned.

alpha: a numeric vector representing the estimated shape parameter of each component
lambda: a numeric vector representing the estimated rate parameter of each component

For the lognormal mixture model, the following extra items are returned.

mulog: a numeric vector representing the estimated logarithm mean of each component
sdlog: a numeric vector representing the estimated logarithm standard deviation of each component

Arguments

x: a numeric vector for the raw data or a three-column matrix for the binned data
ncomp: a positive integer specifying the number of components of the mixture model
family: a character string specifying the family of the mixture model. It can only be one element from normal, weibull, gamma or lnorm.
pi: a vector of the initial value for the proportion
mu: a vector of the initial value for the mean
sd: a vector of the initial value for the standard deviation
ev: a logical value controlling whether each component has the same variance when fitting normal mixture models. It is ignored when fitting other mixture models. The default is FALSE.
mstep.method: a character string specifying the method used in M-step of the EM algorithm when fitting weibull or gamma mixture models. It can be either bisection or newton. The default is bisection.
init.method: a character string specifying the method used for providing initial values for the parameters for EM algorithm. It can be one of kmeans or hclust. The default is kmeans
tol: the tolerance for the stopping rule of EM algorithm. It is the value to stop EM algorithm when the two consecutive iterations produces loglikelihood with difference less than tol. The default value is 1e-6.
max_iter: the maximum number of iterations for the EM algorithm (default 500).

Details

The function mixfit is the core function in this package. It is used to perform the maximum likelihood estimation for finite mixture models from the families of normal, weibull, gamma or lognormal by using the EM algorithm. When the family is weibull or gamma, the M-step of the EM algorithm has no closed-form solution and we can use Newton algorithm by specifying method = "newton" or use bisection method by specifying method = "bisection".

The initial values of the EM algorithm can be provided by specifying the proportion of each component pi, the mean of each component mu and the standard deviation of each component sd. If one or more of these initial values are not provided, then their values are estimated by using K-means clustering method or hierarchical clustering method. If all of pi, mu, and sd are not provided, then ncomp should be provided so initial values are automatically generated. For the normal mixture models, we can control whether each component has the same variance or not.

Examples

Run this code

## fitting the normal mixture models
set.seed(103)
x <- rmixnormal(200, c(0.3, 0.7), c(2, 5), c(1, 1))
data <- bin(x, seq(-1, 8, 0.25))
fit1 <- mixfit(x, ncomp = 2)  # raw data
fit2 <- mixfit(data, ncomp = 2)  # binned data
fit3 <- mixfit(x, pi = c(0.5, 0.5), mu = c(1, 4), sd = c(1, 1))  # providing the initial values
fit4 <- mixfit(x, ncomp = 2, ev = TRUE)  # setting the same variance

## (not run) fitting the weibull mixture models
## x <- rmixweibull(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit5 <- mixfit(x, ncomp = 2, family = "weibull")  # raw data
## fit6 <- mixfit(data, ncomp = 2, family = "weibull")  # binned data

## (not run) fitting the Gamma mixture models
## x <- rmixgamma(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit7 <- mixfit(x, ncomp = 2, family = "gamma")  # raw data
## fit8 <- mixfit(data, ncomp = 2, family = "gamma")  # binned data

## (not run) fitting the lognormal mixture models
## x <- rmixlnorm(200, c(0.3, 0.7), c(2, 5), c(1, 1))
## data <- bin(x, seq(0, 8, 0.25))
## fit9 <- mixfit(x, ncomp = 2, family = "lnorm")  # raw data
## fit10 <- mixfit(data, ncomp = 2, family = "lnorm")  # binned data

Run the code above in your browser using DataLab

State of Data and AI Literacy Report 2025