
BayesMultiMode (version 0.7.4)

mix_mode: Mode estimation

Description

Mode estimation in univariate mixture distributions. The fixed-point algorithm of Carreira-Perpiñán (2000) is used for Gaussian mixtures. The Modal EM algorithm of Li et al. (2007) is used for other continuous mixtures. A basic algorithm is used for discrete mixtures, see Cross et al. (2024).

Usage

mix_mode(
  mixture,
  tol_mixp = 0,
  tol_x = 1e-06,
  tol_conv = 1e-08,
  type = "all",
  inside_range = TRUE
)

Value

A list of class mix_mode containing:

mode_estimates

estimates of the mixture modes.

algo

algorithm used for mode estimation.

dist

distribution name, inherited from the mixture object.

dist_type

type of mixture distribution, i.e. continuous or discrete.

pars

mixture parameters, inherited from the mixture object.

pdf_func

pdf or pmf of the mixture components, inherited from the mixture object.

K

number of mixture components, inherited from the mixture object.

nb_var

number of variables in the mixture, inherited from the mixture object.

Arguments

mixture

An object of class mixture generated with mixture().

tol_mixp

Components with a mixture proportion below tol_mixp are discarded when estimating modes. This does not apply to the largest component, so it is not possible to discard all components. Must be between 0 and 1; default is 0.

tol_x

(for continuous mixtures) Tolerance parameter for the distance between modes; default is 1e-6. If two estimated modes are closer than tol_x, only the first is kept.

tol_conv

(for continuous mixtures) Tolerance parameter for convergence of the algorithm; default is 1e-8.

type

(for discrete mixtures) Type of modes, either "unique" or "all" (the latter includes flat modes); default is "all".

inside_range

Should modes outside of mixture$range be discarded? Default is TRUE. Such modes sometimes occur with very small components when K is large.

Details

This function finds the modes of a univariate mixture defined as: $$p(.) = \sum_{k=1}^{K}\pi_k p_k(.),$$ where \(p_k\) is a component density or probability mass function and \(\pi_k\) its mixture proportion.
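For concreteness, here is a minimal sketch of such a mixture density in plain R, for the two-component Gaussian mixture used in the examples below (illustrative values only, independent of the package's internal representation):

# p(x) = sum_k pi_k * p_k(x) for a two-component Gaussian mixture
pi_k  <- c(0.5, 0.5) # mixture proportions
mu    <- c(0, 5)     # component means
sigma <- c(1, 2)     # component standard deviations

p_mix <- function(x) {
  out <- 0
  for (k in seq_along(pi_k)) out <- out + pi_k[k] * dnorm(x, mu[k], sigma[k])
  out
}

p_mix(c(0, 2.5, 5)) # mixture density at a few points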

Fixed-point algorithm Following Carreira-Perpiñán (2000), a mode \(x\) is found by iterating the two steps: $$(i) \quad p(k|x^{(n)}) = \frac{\pi_k p_k(x^{(n)})}{p(x^{(n)})},$$ $$(ii) \quad x^{(n+1)} = f(x^{(n)}),$$ with $$f(x) = \left(\sum_k p(k|x) \sigma_k^{-2}\right)^{-1}\sum_k p(k|x) \sigma_k^{-2} \mu_k,$$ until convergence, that is, until \(|x^{(n+1)}-x^{(n)}| < \text{tol}_\text{conv}\), where \(\text{tol}_\text{conv}\) is an argument with default value \(10^{-8}\). The algorithm is started at each component location. Separately, estimated modes that differ by less than a small tolerance are treated as identical; this tolerance can be controlled with the argument tol_x.
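A minimal R sketch of this fixed-point iteration for a univariate Gaussian mixture (illustrative only; the function and argument names are hypothetical, not the package's internals):

# Fixed-point mode search started at x0; pi_k, mu, sigma are the
# mixture proportions, means and standard deviations.
fixed_point_mode <- function(x0, pi_k, mu, sigma,
                             tol_conv = 1e-8, max_iter = 1000) {
  x <- x0
  for (i in seq_len(max_iter)) {
    # step (i): posterior probabilities p(k | x)
    w <- pi_k * dnorm(x, mu, sigma)
    w <- w / sum(w)
    # step (ii): precision-weighted mean f(x), with weights sigma^-2
    x_new <- sum(w * mu / sigma^2) / sum(w / sigma^2)
    if (abs(x_new - x) < tol_conv) break
    x <- x_new
  }
  x_new
}

# started at each component location, as described above
sapply(c(0, 5), fixed_point_mode,
       pi_k = c(0.5, 0.5), mu = c(0, 5), sigma = c(1, 2))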

MEM algorithm Following Li et al. (2007), a mode \(x\) is found by iterating the two steps: $$(i) \quad p(k|x^{(n)}) = \frac{\pi_k p_k(x^{(n)})}{p(x^{(n)})},$$ $$(ii) \quad x^{(n+1)} = \text{argmax}_x \sum_k p(k|x^{(n)}) \log p_k(x),$$ until convergence, that is, until \(|x^{(n+1)}-x^{(n)}| < \text{tol}_\text{conv}\), where \(\text{tol}_\text{conv}\) is an argument with default value \(10^{-8}\). The algorithm is started at each component location. Modes that are closer than tol_x are merged.
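A corresponding sketch of the MEM iteration for a generic continuous mixture, using optimise() for the maximisation in step (ii) (illustrative; pdf_k is a hypothetical helper returning component k's density at a point):

# Modal EM started at x0; pi_k are the proportions, pdf_k(x, k) the
# component densities, range the interval searched in step (ii).
mem_mode <- function(x0, pi_k, pdf_k, range,
                     tol_conv = 1e-8, max_iter = 1000) {
  K <- length(pi_k)
  x <- x0
  for (i in seq_len(max_iter)) {
    # step (i): p(k | x^(n))
    w <- pi_k * vapply(seq_len(K), function(k) pdf_k(x, k), numeric(1))
    w <- w / sum(w)
    # step (ii): argmax_x sum_k p(k | x^(n)) log p_k(x)
    obj <- function(z) {
      sum(w * log(vapply(seq_len(K), function(k) pdf_k(z, k), numeric(1))))
    }
    x_new <- optimise(obj, interval = range, maximum = TRUE)$maximum
    if (abs(x_new - x) < tol_conv) break
    x <- x_new
  }
  x_new
}

# example: two Student-t components located at 0 and 6
pdf_k <- function(x, k) dt(x - c(0, 6)[k], df = 5)
mem_mode(0, c(0.8, 0.2), pdf_k, range = c(-5, 15))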

Discrete method By definition, a mode \(y_m\) must satisfy either $$p(y_{m}-1) < p(y_{m}) > p(y_{m}+1)$$ or, in the case of a flat mode of length \(l\), $$p(y_{m}-1) < p(y_{m}) = p(y_{m}+1) = \ldots = p(y_{m}+l-1) > p(y_{m}+l).$$

The algorithm evaluates each point in the mixture's range against these two conditions.
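A minimal sketch of the "unique" mode check on a grid of support points, for the Poisson mixture used in the examples below (the flat-mode condition for type = "all" additionally keeps runs of equal values that rise on the left and fall on the right):

# pmf of the mixture evaluated on its range
y  <- 0:50
px <- 0.5 * dpois(y, 0.1) + 0.5 * dpois(y, 10)

# p(y - 1) < p(y) > p(y + 1); endpoints are compared against -Inf
left  <- c(-Inf, px[-length(px)]) # p(y - 1)
right <- c(px[-1], -Inf)          # p(y + 1)
y[px > left & px > right]         # estimated mode locations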

References

Carreira-Perpiñán, M. Á. (2000). Mode-finding for mixtures of Gaussian distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1318-1323.

Li, J., Ray, S., and Lindsay, B. G. (2007). A nonparametric statistical approach to clustering via mode identification. Journal of Machine Learning Research, 8, 1687-1723.

Cross, J. L., Hoogerheide, L., Labonne, P., and van Dijk, H. K. (2024). Bayesian mode inference for discrete distributions in economics and finance. Economics Letters, 235, 111579.

Examples


# Example with a normal distribution ====================================
mu <- c(0, 5)
sigma <- c(1, 2)
p <- c(0.5, 0.5)

params <- c(eta = p, mu = mu, sigma = sigma)
mix <- mixture(params, dist = "normal", range = c(-5, 15))
modes <- mix_mode(mix)

# summary(modes)
# plot(modes)

# Example with a skew normal =============================================
xi <- c(0, 6)
omega <- c(1, 2)
alpha <- c(0, 0)
p <- c(0.8, 0.2)
params <- c(eta = p, xi = xi, omega = omega, alpha = alpha)
dist <- "skew_normal"

mix <- mixture(params, dist = dist, range = c(-5, 15))
modes <- mix_mode(mix)
# summary(modes)
# plot(modes)

# Example with an arbitrary continuous distribution ======================
xi <- c(0, 6)
omega <- c(1, 2)
alpha <- c(0, 0)
nu <- c(3, 100)
p <- c(0.8, 0.2)
params <- c(eta = p, mu = xi, sigma = omega, xi = alpha, nu = nu)

pdf_func <- function(x, pars) {
  sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"])
}

mix <- mixture(params,
  pdf_func = pdf_func,
  dist_type = "continuous", loc = "mu", range = c(-5, 15)
)
modes <- mix_mode(mix)

# summary(modes)
# plot(modes, from = -4, to = 4)

# Example with a Poisson distribution ====================================
lambda <- c(0.1, 10)
p <- c(0.5, 0.5)
params <- c(eta = p, lambda = lambda)
dist <- "poisson"


mix <- mixture(params, range = c(0, 50), dist = dist)

modes <- mix_mode(mix)

# summary(modes)
# plot(modes)

# Example with an arbitrary discrete distribution =======================
mu <- c(20, 5)
size <- c(20, 0.5)
p <- c(0.5, 0.5)
params <- c(eta = p, mu = mu, size = size)


pmf_func <- function(x, pars) {
  dnbinom(x, mu = pars["mu"], size = pars["size"])
}

mix <- mixture(params,
  range = c(0, 50),
  pdf_func = pmf_func, dist_type = "discrete"
)
modes <- mix_mode(mix)

# summary(modes)
# plot(modes)
