mscn: Mixtures of Multiple Scaled Contaminated Normal Distributions.

Description

Fits a mixture of multiple scaled contaminated normal distributions to the given data.

Usage

mscn(X,k,ini="km",sz=NULL,al=c(0.5,0.99),eta.min=1.01,m="BFGS",stop=c(10^-5,200),VB=FALSE)

Value

X: Data used for clustering.
n: The number of observations in the data.
d: The number of features in the data.
k: Value corresponding to the number of components.
cluster: Vector of group membership as determined by the model.
detect: Flags, for each principal component, whether each observation is bad, given its cluster membership.
npar: The number of parameters.
mu: Either a vector of length d, representing the mean value, or (except for rmscn) a matrix whose rows represent different mean vectors; if it is a matrix, its dimensions must match those of x.
Lambda: Orthogonal matrix whose columns are the normalized eigenvectors of Sigma.
Gamma: Diagonal matrix of the eigenvalues of Sigma.
Sigma: A symmetric positive-definite matrix representing the scale matrix of the distribution.
alpha: Proportion of good observations.
eta: Degree of contamination.
z: The component membership of each observation.
v: Indicator of whether an observation is good or bad with respect to each dimension; 1 means good and 0 means bad.
weight: Matrix of the expected values of the characteristic weights; corresponds to v+(1-v)/eta.
iter.stop: The number of iterations until convergence for the model.
loglik: The log-likelihood corresponding to the model.
AIC: Akaike's Information Criterion of the model.
BIC: The Bayesian Information Criterion of the model.
ICL: The Integrated Completed Likelihood of the model.
KIC: The Kullback Information Criterion of the model.
KICc: The bias-corrected Kullback Information Criterion of the model.
AWE: The Approximate Weight of Evidence of the model.
AIC3: Another version of Akaike's Information Criterion of the model.
CAIC: The Consistent Akaike's Information Criterion of the model.
AICc: The AIC version which is used when sample size n is small relative to d.
CLC: The Classification Likelihood Criterion of the model.
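
The relationship given for weight, v+(1-v)/eta, can be illustrated in base R. This is only a sketch for a single component: the values of v and eta are invented for illustration, and the assumption that eta holds one value per dimension is ours, not something the returned object is guaranteed to match.

```r
# Hypothetical indicator matrix v (3 observations x 2 dimensions); 1 = good, 0 = bad.
v <- matrix(c(1, 0,
              1, 1,
              0, 1), nrow = 3, byrow = TRUE)

# Hypothetical degrees of contamination, one per dimension (assumed layout).
eta <- c(4, 2)

# weight = v + (1 - v) / eta, dividing column j of (1 - v) by eta[j].
weight <- v + sweep(1 - v, 2, eta, "/")
weight
```

Good entries keep a weight of 1, while bad entries are down-weighted to 1/eta for their dimension.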

Arguments

X: A matrix or data frame such that rows correspond to observations and columns correspond to variables.
k: The number of clusters.
ini: Initialization method: "km" (k-means, the default), "pam" (partitioning around medoids), "mclust" (Gaussian mixture models), "random.soft" or "random.hard" (random initialization), or "manual"; if "manual", a partition (sz) must be provided.
sz: If initialization is "manual", this matrix contains the starting values for z.
al: 2-dimensional vector containing minimum and maximum proportion of good points in each group for the contaminated normal distribution.
eta.min: Minimum value for inflation parameter for the covariance matrix for the bad points.
m: Method for the optimization of the eigenvector matrix, see optim for other options.
stop: 2-dimensional vector with the Aitken criterion stopping rule and maximum number of iterations.
VB: If TRUE, tracing information on the progress of the optimization is produced (see optim for details) and the log-likelihood is plotted against the iterations.
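
For ini="manual", a starting partition has to be supplied through sz. The sketch below builds one in base R under the assumption that sz is an n x k matrix with one row per observation and a 1 in the column of the assigned cluster; the labels vector is hypothetical, and the exact layout mscn expects should be checked against the package.

```r
# Hypothetical starting labels for n = 6 observations and k = 2 clusters.
labels <- c(1, 1, 2, 2, 1, 2)
k <- 2

# Assumed layout: one row per observation, one column per cluster,
# with 1 marking the assigned cluster and 0 elsewhere.
sz <- outer(labels, seq_len(k), "==") * 1
sz

# The matrix would then be passed along with the data (not run here):
# result <- mscn(X = X, k = k, ini = "manual", sz = sz)
```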

Author

Cristina Tortora and Antonio Punzo

References

Punzo, A. & Tortora, C. (2021). Multiple scaled contaminated normal distribution and its application in clustering. Statistical Modelling, 21(4): 332--358.

Examples

## Not run:
if (FALSE) {
data(sim)
result <- mscn(X = sim, k = 2)
plot(result)
summary(result)
}
## End(Not run)
