learnDiagGaussian: Create an instance of a learned mixture model

Description

These functions learn the optimal mixture model when the class labels are known. The best model is selected, according to the chosen criterion, from the list of models given in models.

Usage

learnDiagGaussian(data, labels, prop = NULL, models = clusterDiagGaussianNames(prop = "equal"), algo = "simul", nbIter = 100, epsilon = 1e-08, criterion = "ICL", nbCore = 1)
learnPoisson(data, labels, prop = NULL, models = clusterPoissonNames(prop = "equal"), algo = "simul", nbIter = 100, epsilon = 1e-08, criterion = "ICL", nbCore = 1)
learnGamma(data, labels, prop = NULL, models = clusterGammaNames(prop = "equal"), algo = "simul", nbIter = 100, epsilon = 1e-08, criterion = "ICL", nbCore = 1)
learnCategorical(data, labels, prop = NULL, models = clusterCategoricalNames(prop = "equal"), algo = "simul", nbIter = 100, epsilon = 1e-08, criterion = "ICL", nbCore = 1)
learnKernel(data, labels, prop = NULL, models = clusterKernelNames(prop = "equal"), algo = "impute", nbIter = 100, epsilon = 1e-08, dim = 10, kernelName = "gaussian", kernelParameters = 1, criterion = "ICL", nbCore = 1)

Arguments

data

data frame or matrix containing the data. Rows correspond to observations and columns correspond to variables. If the data set contains NA values, they are estimated during the estimation process.

labels

vector or factor giving the class label of each observation.

prop

[vector] with the proportions of each class. If NULL the proportions will be estimated using the labels.

models

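[vector] of model names to run. By default, all models returned by the matching names function with prop = "equal" are estimated (for example, clusterDiagGaussianNames(prop = "equal") for learnDiagGaussian).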

algo

character string defining the algorithm used to learn the model. Possible values: "simul" (default, except for learnKernel, whose default is "impute"), "impute" (faster but can produce biased results).

nbIter

integer giving the number of iterations to perform. If algo is "impute", this is the maximum number of iterations allowed. Default is 100.

epsilon

real giving the variation of the log-likelihood used as the stopping threshold for the iterations. Not used if algo is "simul". Default value is 1e-08.

criterion

character defining the criterion to select the best model. The best model is the one with the lowest criterion value. Possible values: "BIC", "AIC", "ICL" (default).

nbCore

integer defining the number of processors to use (default is 1, 0 for all).

dim

integer giving the dimension of the Gaussian density (learnKernel only). Default is 10.

kernelName

string with a kernel name. Possible values: "gaussian", "polynomial", "exponential", "linear", "hamming". Default is "gaussian".

kernelParameters

[vector] with the parameters of the chosen kernel. Default is 1.
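
When prop is NULL, the proportions are estimated from labels. One plausible reading, sketched below in base R only, is that each class proportion is the relative frequency of that class among the labels. This is an assumption made for illustration, not the package's internal code, and the helper name propFromLabels is hypothetical.

```r
## Hypothetical helper (not part of the package): relative frequency of each
## class among the labels, an assumed reading of "estimated using the labels".
propFromLabels <- function(labels) {
  counts <- table(labels)            # number of observations per class
  as.vector(counts) / sum(counts)    # proportions, summing to 1
}

data(iris)
z <- as.vector(iris[, 5])
prop <- propFromLabels(z)
print(prop)
```

For the balanced iris labels this gives three equal proportions, which is the same vector that the example in the Examples section passes explicitly as prop = c(1/3,1/3,1/3).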

Value

An instance of a learned mixture model class.

Examples

## A quantitative example with the famous iris data set
data(iris)

## get data and target
x <- as.matrix(iris[,1:4]);
z <- as.vector(iris[,5]);
n <- nrow(x); p <- ncol(x);

## add missing values at random
indexes <- matrix(c(round(runif(5,1,n)), round(runif(5,1,p))), ncol=2);
x[indexes] <- NA;

## learn model
model <- learnDiagGaussian( data=x, labels= z, prop = c(1/3,1/3,1/3)
                          , models = clusterDiagGaussianNames(prop = "equal")
                          )

## get summary
summary(model)

## use graphics functions
## Not run: 
# plot(model)
## End(Not run)

## print model
## Not run: 
# print(model)
## End(Not run)

## get estimated missing values
missingValues(model)
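
The "add missing values at random" step in the example works because indexing a matrix by a two-column matrix addresses one cell per index row, read as a (row, column) pair. The sketch below shows the same idea in base R only, without calling the package. The seed and the use of sample.int (instead of round(runif(...))) are choices made here for repeatable, uniform draws; they are not taken from the package.

```r
## Sketch in base R only: blank out cells addressed by (row, column) pairs.
data(iris)
x <- as.matrix(iris[, 1:4])
n <- nrow(x); p <- ncol(x)

set.seed(42)  # assumed seed, only so that the draw is repeatable
indexes <- cbind(sample.int(n, 5, replace = TRUE),   # row of each cell
                 sample.int(p, 5, replace = TRUE))   # column of each cell
x[indexes] <- NA

## recover the positions of the missing cells as (row, col) pairs
missingCells <- which(is.na(x), arr.ind = TRUE)
print(missingCells)
```

Because the pairs are drawn with replacement, the same cell can be picked twice, so the number of NA cells is at most 5 and may be lower.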
