REBMIX: REBMIX Algorithm for Univariate or Multivariate Finite Mixture Estimation

Description

Returns the REBMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, binomial, Poisson or Dirac component densities.

Usage

REBMIX(Dataset = NULL, Preprocessing = NULL, cmax = 15, 
       Criterion = "AIC", Variables = NULL, pdf = NULL,
       Theta1 = NULL, Theta2 = NULL, K = NULL, y0 = NULL, 
       ymin = NULL, ymax = NULL, ar = 0.1, Restraints = "loose", ...)

Arguments

Dataset

a list of data frames of size $n \times d$ containing d-dimensional datasets. Each of the $d$ columns represents one random variable. Number of observations $n$ equals the number of rows in the datasets.
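
As a minimal sketch of the expected input shape, a Dataset argument could be assembled with base R as shown below. The names sample1 and sample2 and the simulated values are illustrative assumptions and are not part of the package.

```r
## Illustrative only: two simulated 2-dimensional datasets (n = 100, d = 2).
set.seed(1)

sample1 <- data.frame(y1 = rnorm(100, mean = 10, sd = 2),
  y2 = rnorm(100, mean = 50, sd = 5))

sample2 <- data.frame(y1 = rnorm(100, mean = 20, sd = 3),
  y2 = rnorm(100, mean = 40, sd = 4))

## A named list of n x d data frames; each column is one random variable.
Dataset <- list(sample1 = sample1, sample2 = sample2)

## 2 x 2 matrix: first row holds n, second row holds d, one column per dataset.
sapply(Dataset, dim)
```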

Preprocessing

a character vector, giving the preprocessing types. One of "histogram", "Parzen window" or "k-nearest neighbour".

cmax

maximum number of components $c_{\mathrm{max}} > 0$. The default value is 15.

Criterion

a character vector giving the information criterion types. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn
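
For orientation, two of these criteria can be evaluated by hand from a log likelihood, the degrees of freedom $M$ and the number of observations $n$. The sketch below assumes the textbook definitions $\mathrm{AIC} = -2 \mathrm{log}\, L + 2 M$ and $\mathrm{BIC} = -2 \mathrm{log}\, L + M \mathrm{log}\, n$ and uses made-up inputs. It is not taken from the package source, so the values reported by REBMIX may be defined or scaled differently.

```r
## Hypothetical inputs, for illustration only.
logL <- -1234.5 ## Log likelihood.
M <- 8 ## Degrees of freedom (number of free parameters).
n <- 500 ## Number of observations.

## Textbook definitions (assumed, not read from the rebmix source).
aic <- -2 * logL + 2 * M
bic <- -2 * logL + M * log(n)

c(AIC = aic, BIC = bic)
```

For n larger than about 7 the BIC penalty per parameter, log(n), exceeds the AIC penalty of 2, so BIC tends to favour fewer components.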

Variables

a character vector of length $d$ containing types of variables. One of "continuous" or "discrete".

pdf

a character vector of length $d$ containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson" or "Dir

Theta1

a vector of length $d$ containing initial component parameters. One of $n_{il} = \textrm{Number of categories} - 1$ for "binomial" distribution or "NA" otherwise.
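
A minimal sketch of how Theta1 lines up with Variables and pdf, assuming a hypothetical two-variable dataset whose second variable is discrete with 5 categories. The variable categories is introduced here for illustration only.

```r
## Hypothetical setup: d = 2, first variable continuous, second discrete.
Variables <- c("continuous", "discrete")
pdf <- c("normal", "binomial")

categories <- 5 ## Assumed number of categories of the discrete variable.

## NA for the normal variable, number of categories - 1 for the binomial one.
Theta1 <- c(NA, categories - 1)

Theta1
```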

Theta2

a vector of length $d$ containing initial component parameters. The value is NULL.

K

a vector or a list of vectors containing numbers of bins $v$ for the histogram and the Parzen window or numbers of nearest neighbours $k$ for the k-nearest neighbour. There is no genuine rule to identify $v$ or $k$. Consequently, the REBMIX alg

y0

a vector of length $d$ containing origins. The default value is NULL.

ymin

a vector of length $d$ containing minimum observations. The default value is NULL.

ymax

a vector of length $d$ containing maximum observations. The default value is NULL.

ar

acceleration rate $0 < a_{\mathrm{r}} \leq 1$. The default value is 0.1 and in most cases does not have to be altered.

Restraints

a character string giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable for well separated components only.

...

potential further arguments of the method.

Value

Dataset

a list of data frames of size $n \times d$ containing d-dimensional datasets. Each of the $d$ columns represents one random variable. Number of observations $n$ equals the number of rows in the datasets.

w

a list of data frames each containing $c$ component weights $w_{l}$ summing to 1.

Theta

a list of data frames each containing $c$ parametric family types pdfi. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson" or "Dirac". Component parameters theta1.i follow the parametric family types. One of $\mu_{il}$ for normal and lognormal distributions and $\theta_{il}$ for Weibull, gamma, binomial, Poisson and Dirac distributions. Component parameters theta2.i follow theta1.i. One of $\sigma_{il}$ for normal and lognormal distributions, $\beta_{il}$ for Weibull and gamma distributions and $p_{il}$ for binomial distribution.

summary

a data frame with additional information about dataset, preprocessing, $c_{\mathrm{max}}$, information criterion type, $a_{\mathrm{r}}$, restraints type, optimal $c$, optimal $v$ or $k$, $K$, $y_{i0}$, optimal $h_{i}$, information criterion $\mathrm{IC}$, log likelihood $\mathrm{log}\, L$ and degrees of freedom $M$.

pos

position in the summary data frame at which log likelihood $\mathrm{log}\, L$ attains its maximum.

opt.c

a list of vectors containing numbers of components for optimal $v$ for the histogram and the Parzen window or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

opt.IC

a list of vectors containing information criteria for optimal $v$ for the histogram and the Parzen window or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

opt.logL

a list of vectors containing log likelihoods for optimal $v$ for the histogram and the Parzen window or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

opt.D

a list of vectors containing totals of positive relative deviations for optimal $v$ for the histogram and the Parzen window or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

all.K

a list of vectors containing all processed numbers of bins $v$ for the histogram and the Parzen window or all processed numbers of nearest neighbours $k$ for the k-nearest neighbour.

all.IC

a list of vectors containing information criteria for all processed numbers of bins $v$ for the histogram and the Parzen window or for all processed numbers of nearest neighbours $k$ for the k-nearest neighbour.

call

a function call used to create the object.
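
To illustrate how the component weights and component parameters combine, the following sketch evaluates a univariate normal mixture density $f(y) = \sum_{l} w_{l} f_{l}(y)$ in base R. The weights, the parameters and the helper dmix are illustrative assumptions. They are not REBMIX output, and dmix is not part of the package.

```r
## Hypothetical two-component normal mixture, for illustration only.
w <- c(0.3, 0.7) ## Component weights summing to 1.
theta1 <- c(10, 20) ## Means mu of the normal components.
theta2 <- c(2, 3) ## Standard deviations sigma of the normal components.

## Hypothetical helper: weighted sum of normal component densities at y.
dmix <- function(y, w, mu, sigma) {
  dens <- sapply(seq_along(w), function(l) w[l] * dnorm(y, mu[l], sigma[l]))

  ## One row per value of y, one column per component.
  rowSums(matrix(dens, nrow = length(y)))
}

y <- seq(0, 35, length.out = 8)

dmix(y, w, theta1, theta2)
```

Because the weights sum to 1 and each component density integrates to 1, the mixture density integrates to 1 as well.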

References

H. A. Sturges. The choice of a class interval. Journal of American Statistical Association, 21(153):65-66, 1926. http://www.jstor.org/stable/2965501.

M. Nagode and M. Fajdiga. A general multi-modal probability density function suitable for the rainflow ranges of stationary random processes. International Journal of Fatigue, 20(3):211-223, 1998. http://dx.doi.org/10.1016/S0142-1123(97)00106-0.

M. Nagode and M. Fajdiga. An improved algorithm for parameter estimation suitable for mixed Weibull distributions. International Journal of Fatigue, 22(1):75-80, 2000. http://dx.doi.org/10.1016/S0142-1123(99)00112-7.

M. Nagode, J. Klemenc, and M. Fajdiga. Parametric modelling and scatter prediction of rainflow matrices. International Journal of Fatigue, 23(6):525-532, 2001. http://dx.doi.org/10.1016/S0142-1123(01)00007-X.

M. Nagode and M. Fajdiga. An alternative perspective on the mixture estimation problem. Reliability Engineering & System Safety, 91(4):388-397, 2006. http://dx.doi.org/10.1016/j.ress.2005.02.005.

M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. http://dx.doi.org/10.1080/03610920903480890.

M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. http://dx.doi.org/10.1080/03610921003725788.

Examples

## Load the rebmix package, which provides RNGMIX and REBMIX.

library(rebmix)

## Generate the complex 1 dataset.

n <- c(998, 263, 1086, 487, 213, 1076, 232, 
  784, 840, 461, 773, 24, 811, 1091, 861)

Theta <- rbind(pdf = "normal",
  theta1 = c(688.4, 265.1, 30.8, 934, 561.6, 854.9, 883.7, 
  758.3, 189.3, 919.3, 98, 143, 202.5, 628, 977),
  theta2 = c(12.4, 14.6, 14.8, 8.4, 11.7, 9.2, 6.3, 10.2,
  9.5, 8.1, 14.7, 11.7, 7.4, 10.1, 14.6))

complex1 <- RNGMIX(Dataset = "complex1",
  rseed = -1,
  n = n,
  Theta = Theta)
  
complex1

complex1$Dataset[[1]][1:20, ]  

## Estimate number of components, component weights and component parameters. 

v <- c(as.integer(1 + log2(sum(n))), ## Minimum v follows the Sturges rule.
  as.integer(2 * sum(n)^0.5)) ## Maximum v follows the RootN rule.

## Number of classes or nearest neighbours to be processed.

N <- as.integer(log(v[2] / (v[1] + 1)) / log(1 + 1 / v[1]))

K <- c(v[1], as.integer((v[1] + 1) * (1 + 1 / v[1])^(0:N)))

complex1est <- REBMIX(Dataset = complex1$Dataset, 
  Preprocessing = "histogram", 
  cmax = 30, 
  Criterion = "BIC", 
  Variables = "continuous",
  pdf = "normal", 
  K = K)
                 
complex1est

AIC(complex1est)

## Plot the finite mixture.

plot(complex1est, npts = 1000)