REBMIX: REBMIX Algorithm for Univariate or Multivariate Finite Mixture Estimation

Description

Returns the REBMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, binomial, Poisson or Dirac component densities.
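
For orientation, the kind of model being estimated can be sketched in base R for the simplest case, a univariate mixture of normal components. The function name `dnormmix` and the parameter values are hypothetical and are not part of the package; the sketch only evaluates the mixture density $f(y) = \sum_{l} w_{l} f_{l}(y)$ for given weights and parameters.

```r
# Density of a univariate normal mixture:
# f(y) = sum over components l of w_l * dnorm(y, mu_l, sigma_l).
# `dnormmix` is a hypothetical helper, not a function of the package.
dnormmix <- function(y, w, mu, sigma) {
  stopifnot(length(w) == length(mu), length(mu) == length(sigma),
            abs(sum(w) - 1) < 1e-8)
  # Weighted component densities, one column per component.
  dens <- mapply(function(wl, ml, sl) wl * dnorm(y, mean = ml, sd = sl),
                 w, mu, sigma)
  # Force a length(y) x c matrix so a single y value works too.
  rowSums(matrix(dens, nrow = length(y)))
}

# Illustrative (made-up) weights and parameters.
w <- c(0.3, 0.7)
mu <- c(0, 5)
sigma <- c(1, 2)

# Numerical check that the density integrates to about 1.
step <- 0.01
grid <- seq(-20, 40, by = step)
area <- sum(dnormmix(grid, w, mu, sigma)) * step
```

The algorithm documented here estimates the number of components together with `w`, `mu` and `sigma`; the sketch covers only the evaluation of the density once those values are known.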

Usage

REBMIX(Dataset = NULL, Preprocessing = NULL, D = 0.025, cmax = 15, 
       Criterion = "AIC", Variables = NULL, pdf = NULL,
       Theta1 = NULL, Theta2 = NULL, K = NULL, ymin = NULL, 
       ymax = NULL, ar = 0.1, Restraints = "loose", ...)

Arguments

Dataset

a list of data frames of size $n \times d$ containing d-dimensional datasets. Each of the $d$ columns represents one random variable. Number of observations $n$ equals the number of rows in the datasets.

Preprocessing

a character vector, giving the preprocessing types. One of "histogram", "Parzen window" or "k-nearest neighbour".

D

a total of positive relative deviations standing for the maximum acceptable measure of distance between predictive and empirical densities. It satisfies the relation $0 \leq D \leq 1$. The default value is 0.025. However, if components with

cmax

maximum number of components $c_{\mathrm{max}} > 0$. The default value is 15.

Criterion

a character vector giving the information criterion types. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn

Variables

a character vector of length $d$ containing types of variables. One of "continuous" or "discrete".

pdf

a character vector of length $d$ containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson" or "Dir

Theta1

a vector of length $d$ containing initial component parameters. One of $n_{il} = \textrm{Number of categories} - 1$ for "binomial" distribution or "NA" otherwise.

Theta2

a vector of length $d$ containing initial component parameters. The value is NULL.
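
As a sketch of how the per-variable arguments line up, the following base R snippet builds a hypothetical two-dimensional dataset with one continuous and one discrete column and the matching length-$d$ vectors. The data, the object names and the list element name `toy` are assumptions made for illustration; no package function is called.

```r
set.seed(1)

# Hypothetical dataset: n = 200 observations, d = 2 variables.
toy <- data.frame(
  y1 = rnorm(200, mean = 10, sd = 2),       # continuous variable
  y2 = rbinom(200, size = 10, prob = 0.3)   # discrete variable with values 0..10
)

# Dataset is documented as a list of data frames of size n x d.
Dataset <- list(toy = toy)

# One entry per column, in column order.
Variables <- c("continuous", "discrete")
pdf_types <- c("normal", "binomial")        # would be passed as pdf = pdf_types

# Theta1: number of categories minus 1 for "binomial" (11 - 1 = 10), NA otherwise.
Theta1 <- c(NA, 10)

d <- ncol(toy)
stopifnot(length(Variables) == d, length(pdf_types) == d, length(Theta1) == d)
```

Keeping the three vectors in the same order as the data frame columns is the point of the sketch; each of them must have length $d$.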

K

a vector or a list of vectors containing numbers of bins $v$ for the histogram and the Parzen window or numbers of nearest neighbours $k$ for the k-nearest neighbour. There is no genuine rule to identify $v$ or $k$. Consequently, the REBMIX alg

ymin

a vector of length $d$ containing minimum observations. The default value is NULL.

ymax

a vector of length $d$ containing maximum observations. The default value is NULL.

ar

acceleration rate $0 < a_{\mathrm{r}} \leq 1$. The default value is 0.1 and in most cases does not have to be altered.

Restraints

a character string giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable for well separated components only.

...

potential further arguments of the method.

Value

Dataset

a list of data frames of size $n \times d$ containing d-dimensional datasets. Each of the $d$ columns represents one random variable. Number of observations $n$ equals the number of rows in the datasets.

w

a list of data frames each containing $c$ component weights $w_{l}$ summing to 1.

Theta

a list of data frames each containing $c$ parametric family types pdfi. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson" or "Dirac". Component parameters theta1.i follow the parametric family types. One of $\mu_{il}$ for normal and lognormal distributions and $\theta_{il}$ for Weibull, gamma, binomial, Poisson and Dirac distributions. Component parameters theta2.i follow theta1.i. One of $\sigma_{il}$ for normal and lognormal distributions, $\beta_{il}$ for Weibull and gamma distributions and $p_{il}$ for binomial distribution.

Variables

a character vector containing types of variables. One of "continuous" or "discrete".

pdf

a character vector containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson" or "Dirac".

Theta1

a vector containing initial component parameters. One of $n_{il} = \textrm{Number of categories} - 1$ for "binomial" distribution or "NA" otherwise.

Theta2

a vector containing initial component parameters. The value is NULL.

summary

a data frame with additional information about dataset, preprocessing, $D$, $c_{\mathrm{max}}$, information criterion type, $a_{\mathrm{r}}$, restraints type, optimal $c$, optimal $v$ or $k$, $y_{i0}$, optimal $h_{i}$, information criterion $\mathrm{IC}$, log likelihood $\mathrm{log}\, L$ and degrees of freedom $M$.

pos

position in the summary data frame at which log likelihood $\mathrm{log}\, L$ attains its maximum.

all.Imax

a list of all numbers of iterations.

all.c

a list of all numbers of components.

all.IC

a list of all information criteria.

all.logL

a list of all log likelihoods.

all.D

a list of all totals of positive relative deviations.
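
The summary relates the information criterion $\mathrm{IC}$ to the log likelihood $\mathrm{log}\, L$ and the degrees of freedom $M$. A minimal sketch of that relation, assuming the textbook definitions $\mathrm{AIC} = -2\,\mathrm{log}\, L + 2M$ and $\mathrm{BIC} = -2\,\mathrm{log}\, L + M\,\mathrm{log}\, n$ (the package's exact formulas are not shown on this page and may differ in detail), could look as follows. The helper `info_criteria` and all numeric values are hypothetical.

```r
# Textbook AIC and BIC from a log likelihood logL, degrees of freedom M and
# sample size n. These definitions are an assumption, not package code.
info_criteria <- function(logL, M, n) {
  c(AIC = -2 * logL + 2 * M,
    BIC = -2 * logL + M * log(n))
}

# For a univariate normal mixture with c components, the usual count of free
# parameters is (c - 1) weights + c means + c standard deviations = 3c - 1.
c_comp <- 3
M <- 3 * c_comp - 1

# Made-up log likelihood and sample size, for illustration only.
ic <- info_criteria(logL = -1234.5, M = M, n = 500)
ic
```

Lower values indicate a better trade-off between fit and complexity under both criteria; BIC penalises each extra parameter more heavily than AIC once $n$ exceeds about 7.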

References

H. A. Sturges. The choice of a class interval. Journal of the American Statistical Association, 21(153):65-66, 1926. http://www.jstor.org/stable/2965501.

M. Nagode and M. Fajdiga. A general multi-modal probability density function suitable for the rainflow ranges of stationary random processes. International Journal of Fatigue, 20(3):211-223, 1998. http://dx.doi.org/10.1016/S0142-1123(97)00106-0.

M. Nagode and M. Fajdiga. An improved algorithm for parameter estimation suitable for mixed Weibull distributions. International Journal of Fatigue, 22(1):75-80, 2000. http://dx.doi.org/10.1016/S0142-1123(99)00112-7.

M. Nagode, J. Klemenc, and M. Fajdiga. Parametric modelling and scatter prediction of rainflow matrices. International Journal of Fatigue, 23(6):525-532, 2001. http://dx.doi.org/10.1016/S0142-1123(01)00007-X.

M. Nagode and M. Fajdiga. An alternative perspective on the mixture estimation problem. Reliability Engineering & System Safety, 91(4):388-397, 2006. http://dx.doi.org/10.1016/j.ress.2005.02.005.

M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. http://dx.doi.org/10.1080/03610920903480890.

M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. http://dx.doi.org/10.1080/03610921003725788.

Examples

## Generate the complex 1 dataset.

n <- c(998, 263, 1086, 487, 213, 1076, 232, 
  784, 840, 461, 773, 24, 811, 1091, 861)

Theta <- rbind(pdf = "normal",
  theta1 = c(688.4, 265.1, 30.8, 934, 561.6, 854.9, 883.7, 
  758.3, 189.3, 919.3, 98, 143, 202.5, 628, 977),
  theta2 = c(12.4, 14.6, 14.8, 8.4, 11.7, 9.2, 6.3, 10.2,
  9.5, 8.1, 14.7, 11.7, 7.4, 10.1, 14.6))

complex1 <- RNGMIX(Dataset = "complex1",
  rseed = -1,
  n = n,
  Theta = Theta)
  
complex1

complex1$Dataset[[1]][1:20, ]  

## Estimate number of components, component weights and component parameters. 

v <- c(as.integer(1 + log2(sum(n))), ## Minimum v follows the Sturges rule.
  as.integer(2 * sum(n)^0.5)) ## Maximum v follows the RootN rule.

## Number of classes or nearest neighbours to be processed.

N <- as.integer(log(v[2] / (v[1] + 1)) / log(1 + 1 / v[1]))

K <- c(v[1], as.integer((v[1] + 1) * (1 + 1 / v[1])^(0:N)))

complex1est <- REBMIX(Dataset = complex1$Dataset, 
  Preprocessing = "histogram", 
  D = 0.0025, 
  cmax = 30, 
  Criterion = "BIC", 
  Variables = "continuous",
  pdf = "normal", 
  K = K)
                 
complex1est

## Plot the finite mixture.

plot(complex1est, npts = 1000)
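
The grid of bin counts `K` built in the example can be examined without the package. The following base R sketch repeats the same arithmetic for the total sample size of the example (the component sizes `n` sum to 10000); the names `n_total`, `v_min` and `v_max` are introduced here for readability and do not appear in the package.

```r
# Total number of observations in the example, sum(n).
n_total <- 10000L

v_min <- as.integer(1 + log2(n_total))   # Sturges rule (lower bound)
v_max <- as.integer(2 * sqrt(n_total))   # RootN rule (upper bound)

# Number of geometric steps that fit between v_min + 1 and v_max.
N <- as.integer(log(v_max / (v_min + 1)) / log(1 + 1 / v_min))

# Candidate bin counts: v_min, then a geometric sequence with ratio
# (1 + 1 / v_min), truncated to integers.
K <- c(v_min, as.integer((v_min + 1) * (1 + 1 / v_min)^(0:N)))

range(K)
length(K)
```

The geometric spacing keeps the candidates dense near the Sturges value and sparser towards the RootN value, so the search over `K` stays short while still covering the whole interval.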
