"REBMIX"
Object of class REBMIX.
Objects can be created by calls of the form new("REBMIX", ...).
Dataset
:a list of data frames of size \(n \times d\) containing \(d\)-dimensional datasets. Each of the \(d\) columns represents one random variable. The number of observations \(n\) equals the number of rows in a dataset.
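As an illustration of the expected layout, the following base R sketch builds a list of two data frames and checks their dimensions. The dataset names, column names and distributions are assumptions made up for this example; only the list-of-data-frames structure comes from the description above.

```r
# Two hypothetical datasets, each with n = 100 observations of d = 2 variables.
set.seed(1)
Dataset <- list(
  first  = data.frame(y1 = rnorm(100),           y2 = rpois(100, lambda = 3)),
  second = data.frame(y1 = rnorm(100, mean = 5), y2 = rpois(100, lambda = 7))
)

# Rows are observations (n), columns are random variables (d).
dims <- t(sapply(Dataset, dim))
colnames(dims) <- c("n", "d")
print(dims)
```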
Preprocessing
:a character vector giving the preprocessing types. One of "histogram", "Parzen window" or "k-nearest neighbour".
cmax
:maximum number of components \(c_{\mathrm{max}} > 0\). The default value is 15.
Criterion
:a character vector giving the information criterion types. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
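To make the role of the log likelihood \(\mathrm{log}\, L\), the degrees of freedom \(M\) and the number of observations \(n\) concrete, here is a minimal sketch of some of the listed criteria in their textbook forms. The function name ic and the input values are hypothetical, and the formulas are the standard definitions, which may differ in detail from what the package computes internally.

```r
# Textbook forms of a few information criteria (smaller is better).
# logL: log likelihood, M: degrees of freedom, n: number of observations.
ic <- function(logL, M, n) {
  c(
    AIC  = -2 * logL + 2 * M,
    AIC3 = -2 * logL + 3 * M,
    AIC4 = -2 * logL + 4 * M,
    AICc = -2 * logL + 2 * M * (1 + (M + 1) / (n - M - 1)),
    BIC  = -2 * logL + M * log(n),
    CAIC = -2 * logL + M * (log(n) + 1),
    HQC  = -2 * logL + 2 * M * log(log(n))
  )
}

# Hypothetical fit used only for illustration.
values <- ic(logL = -250, M = 5, n = 100)
print(round(values, 3))
```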
Variables
:a character vector of length \(d\) containing types of variables. One of "continuous" or "discrete".
pdf
:a character vector of length \(d\) containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson", "Dirac" or "vonMises".
theta1
:a vector of length \(d\) containing initial component parameters. One of \(n_{il} = \textrm{number of categories} - 1\) for "binomial" distribution or "NA" otherwise.
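A small sketch of how such a vector could be assembled for mixed parametric families follows. The variable categories and its values are assumptions introduced for this example; the rule (number of categories minus one for "binomial", NA otherwise) is the one stated above.

```r
# Parametric family for each of d = 3 variables.
pdf <- c("normal", "binomial", "Poisson")

# Hypothetical category counts; only meaningful for the binomial variable.
categories <- c(NA, 6, NA)

# n_il = number of categories - 1 for "binomial", NA otherwise.
theta1 <- ifelse(pdf == "binomial", categories - 1, NA)
print(theta1)
```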
theta2
:a vector of length \(d\) containing initial component parameters. Currently not used.
K
:a vector or a list of vectors containing numbers of bins \(v\) for the histogram and the Parzen window or numbers of nearest neighbours \(k\) for the k-nearest neighbour. There is no universally valid rule to identify \(v\) or \(k\). Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule \(v = 1 + \log_{2}(n)\), the \(\mathrm{Log}_{10}\) rule \(v = 10 \log_{10}(n)\) or the RootN rule \(v = 2 \sqrt{n}\) can be applied to estimate the limiting numbers of bins, and the rule of thumb \(k = \sqrt{n}\) to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for generating a sequence of numbers of bins or nearest neighbours.
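The rules and the bracketing step described for K can be sketched in base R as follows. The sample size n and the IC values are made up for illustration, and the golden section refinement inside the bracket is not shown; this is not the package's own implementation.

```r
n <- 1000  # hypothetical number of observations

# Rules for the number of bins v.
v_sturges <- 1 + log2(n)      # Sturges rule
v_log10   <- 10 * log10(n)    # Log10 rule
v_rootn   <- 2 * sqrt(n)      # RootN rule

# Rule of thumb for the number of nearest neighbours k.
k_thumb <- sqrt(n)

# Bracketing: locate the candidate with minimum IC and take its neighbours.
K  <- c(10, 20, 40, 60)
IC <- c(530, 512, 498, 505)   # made-up information criterion values
i  <- which.min(IC)
bracket <- K[c(max(i - 1, 1), min(i + 1, length(K)))]
print(bracket)
```

With these made-up values the minimum falls at 40, so the bracket spans its two neighbouring candidates in K.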
y0
:a vector of length \(d\) containing origins. The default value is numeric().
ymin
:a vector of length \(d\) containing minimum observations. The default value is numeric().
ymax
:a vector of length \(d\) containing maximum observations. The default value is numeric().
ar
:acceleration rate \(0 < a_{\mathrm{r}} \leq 1\). The default value is 0.1 and in most cases does not have to be altered.
Restraints
:a character vector giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable to well-separated components only.
w
:a list of vectors of length \(c\) containing component weights \(w_{l}\) summing to 1.
Theta
:a list of lists each containing \(c\) parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson", "Dirac" or circular "vonMises" defined for \(0 \leq y_{i} \leq 2 \pi\).
Component parameters theta1.l follow the parametric family types. One of \(\mu_{il}\) for normal, lognormal and von Mises distributions and \(\theta_{il}\) for Weibull, gamma, binomial, Poisson and Dirac distributions.
Component parameters theta2.l follow theta1.l. One of \(\sigma_{il}\) for normal and lognormal distributions, \(\beta_{il}\) for Weibull and gamma distributions, \(p_{il}\) for binomial distribution and \(\kappa_{il}\) for von Mises distribution.
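To show how the component weights \(w_{l}\) combine with the component parameters, here is a minimal base R sketch of a univariate two-component normal mixture density. The weights, means and standard deviations are hypothetical values, and mixture_pdf is a helper written for this example, not a function of the package.

```r
# Hypothetical two-component univariate normal mixture (c = 2, d = 1).
w     <- c(0.3, 0.7)   # component weights, summing to 1
mu    <- c(0, 4)       # first parameters (means) of the normal components
sigma <- c(1, 1.5)     # second parameters (standard deviations)

# Mixture density f(y) = sum_l w_l * f_l(y; mu_l, sigma_l).
mixture_pdf <- function(y) {
  dens <- sapply(seq_along(w), function(l) {
    w[l] * dnorm(y, mean = mu[l], sd = sigma[l])
  })
  # Force a matrix so that a single y value is handled like a vector of values.
  rowSums(matrix(dens, nrow = length(y)))
}

# A valid density integrates to 1 (up to numerical error).
total <- integrate(mixture_pdf, lower = -Inf, upper = Inf)$value
print(total)
```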
summary
:a data frame with additional information about dataset, preprocessing, \(c_{\mathrm{max}}\), information criterion type, \(a_{\mathrm{r}}\), restraints type, optimal \(c\), optimal \(v\) or \(k\), \(K\), \(y_{i0}\), \(y_{i\mathrm{min}}\), \(y_{i\mathrm{max}}\), optimal \(h_{i}\), information criterion \(\mathrm{IC}\), log likelihood \(\mathrm{log}\, L\) and degrees of freedom \(M\).
pos
:position in the summary data frame at which log likelihood \(\mathrm{log}\, L\) attains its maximum.
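The idea behind this slot can be sketched with a plain data frame. The data frame summary_df, its column names and its values are assumptions invented for this example and need not match the actual layout of the summary slot.

```r
# Hypothetical summary table with one row per fitted model.
summary_df <- data.frame(
  Preprocessing = c("histogram", "Parzen window", "k-nearest neighbour"),
  c    = c(3, 2, 3),
  IC   = c(512.4, 530.1, 508.9),
  logL = c(-245.2, -258.0, -243.4)
)

# Row at which the log likelihood attains its maximum.
pos  <- which.max(summary_df$logL)
best <- summary_df[pos, ]
print(best)
```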
opt.c
:a list of vectors containing numbers of components for optimal \(v\) for the histogram and the Parzen window or for optimal number of nearest neighbours \(k\) for the k-nearest neighbour.
opt.IC
:a list of vectors containing information criteria for optimal \(v\) for the histogram and the Parzen window or for optimal number of nearest neighbours \(k\) for the k-nearest neighbour.
opt.logL
:a list of vectors containing log likelihoods for optimal \(v\) for the histogram and the Parzen window or for optimal number of nearest neighbours \(k\) for the k-nearest neighbour.
opt.D
:a list of vectors containing totals of positive relative deviations for optimal \(v\) for the histogram and the Parzen window or for optimal number of nearest neighbours \(k\) for the k-nearest neighbour.
all.K
:a list of vectors containing all processed numbers of bins \(v\) for the histogram and the Parzen window or all processed numbers of nearest neighbours \(k\) for the k-nearest neighbour.
all.IC
:a list of vectors containing information criteria for all processed numbers of bins \(v\) for the histogram and the Parzen window or for all processed numbers of nearest neighbours \(k\) for the k-nearest neighbour.