VarSelLCM-package: Variable selection in model-based clustering with the latent class model, for the analysis of mixed-type data with missing values.

Description

The package uses a finite mixture model to analyze mixed-type data (continuous, count and/or categorical variables) with missing values (assumed missing at random), under the assumption that variables are independent within classes. The one-dimensional marginals of the components follow standard distributions, which makes both model interpretation and model selection easier. Variable selection is performed by an alternating optimization procedure that maximizes the MICL criterion. Maximum likelihood inference for the selected model is then carried out by an EM algorithm. The package also imputes missing values.
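
As a rough sketch of the model described above, in notation that is not taken from this page (g components, d variables, mixing proportions \pi_k, one-dimensional densities f_{kj} with parameters \alpha_{kj}, model m, partition z, all of them assumed names), within-class independence gives the usual latent class density. MICL, as we understand it from the reference cited under References, is the integrated complete-data likelihood evaluated at its best partition:

```latex
f(x_i \mid m, \theta) = \sum_{k=1}^{g} \pi_k \prod_{j=1}^{d} f_{kj}(x_{ij} \mid \alpha_{kj}),
\qquad
\mathrm{MICL}(m) = \ln p(x, z^{\star}_m \mid m),
\quad
z^{\star}_m = \arg\max_{z} \ln p(x, z \mid m).
```

Under this reading, a variable j that the model m declares irrelevant for clustering has the same marginal in every component, that is \alpha_{1j} = \dots = \alpha_{gj}.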

Details

Package:	VarSelLCM
Type:	Package
Version:	2.0.0
Date:	2016-04-18
License:	GPL-2
LazyLoad:	yes
URL:	http://varsellcm.r-forge.r-project.org/

The main function to use is VarSelCluster.

VarSelCluster first selects the model by maximizing the MICL criterion, then computes the maximum likelihood estimates for the selected model with an EM algorithm.

The summary, print and plot methods are available to help interpret the results.

References

M. Marbac and M. Sedki (2015). Variable selection for model-based clustering using the integrated completed-data likelihood. Preprint.

Examples

# NOT RUN {
# Package loading
library(VarSelLCM)

# Data loading:
# x contains the observed variables
# z contains the known status (1: absence, 2: presence of heart disease)
data(heart)
z <- heart[,"Class"]
x <- heart[,-13]

# Cluster analysis without variable selection
res_without <- VarSelCluster(x, 2, vbleSelec = FALSE)

# Cluster analysis with variable selection (with parallelisation)
res_with <- VarSelCluster(x, 2, nbcores = 2, initModel=40)

# Confusion matrices: variable selection decreases the misclassification error rate
print(table(z, res_without@partitions@zMAP))
print(table(z, res_with@partitions@zMAP))

# Summary of the best model
summary(res_with)

# Parameters of the best model
print(res_with)

# Plot of the best model
plot(res_with)

# }