
kml (version 0.9.0)

kml-package: ~ Overview: K-means for Longitudinal data ~

Description

KmL is a non-parametric algorithm for clustering longitudinal data. This page gives an overview of the package; for a description of the algorithm, see kml.


Overview

To cluster data, KmL goes through three steps, each associated with a few functions:
  1. Data preparation
  2. Building "optimal" clusterization
  3. Exporting results

1. Data preparation

KmL works on objects of class ClusterizLongData (abbreviated cld). Data preparation therefore simply consists of transforming the data into a ClusterizLongData object. This is done with the function as.cld, which handles both matrix (see as.clusterizLongData.matrix) and data.frame (see as.clusterizLongData.data.frame) inputs. Instead of working on real data, one can also work on artificial data. Such data are of class ArtificialLongData and can be converted to ClusterizLongData with the as.cld method for ArtificialLongData (see as.cld.artificialLongData).
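To make the two input forms concrete, here is a minimal base R sketch of longitudinal data held as a matrix and as a data.frame. It does not call any kml function. The wide layout (one row per individual, one column per time point), the identifier column, and all object names are assumptions made for illustration; the layout as.cld actually expects is described in its own help pages.

```r
# Sketch only: base R, no kml functions are called here.
set.seed(1)
n_ind <- 30     # number of individuals (hypothetical)
times <- 0:10   # measurement times (hypothetical)

# Two artificial groups: flat trajectories and rising trajectories.
group <- rep(c("flat", "rising"), each = n_ind / 2)
slope <- ifelse(group == "rising", 1, 0)
traj  <- outer(slope, times) +
  matrix(rnorm(n_ind * length(times)), nrow = n_ind)

# Matrix form: one row per individual, one column per time point.
colnames(traj) <- paste0("t", times)
rownames(traj) <- paste0("id", seq_len(n_ind))

# data.frame form: an identifier column followed by the measurements.
traj_df <- data.frame(id = rownames(traj), traj, row.names = NULL)

dim(traj)
str(traj_df[, 1:4])
```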

2. Building "optimal" clusterization

Once the ClusterizLongData object has been created, the KmL algorithm can be run. Starting from a ClusterizLongData, kml builds a Clusterization. An object of class Clusterization is a partition of the trajectories into subgroups. The object also contains information such as the percentage of trajectories in each group and the Calinski criterion.

kml is a "hill-climbing" algorithm. Algorithms of this kind always converge towards a maximum, but one cannot know whether that maximum is local or global, so they offer no guarantee of optimality. To improve the chances of obtaining a good Clusterization, it is better to run the hill-climbing algorithm several times and then keep the best solution. By default, kml runs the hill-climbing algorithm 20 times and chooses the Clusterization that maximises the determinant of the between-cluster matrix.

Likewise, the optimal number of clusters cannot be known beforehand. It is possible, however, to compute afterwards a criterion that helps to choose; kml uses the Calinski criterion. In the end, kml tests 2, 3, 4, 5 and 6 clusters by default, 20 times each.
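The restart-and-select strategy described above can be sketched in base R. This is not the kml implementation: stats::kmeans stands in for the hill-climbing step, the best run is picked by the Calinski criterion alone (not by the determinant of the between-cluster matrix), and the data and all object names are hypothetical.

```r
# Sketch only: stats::kmeans stands in for the kml algorithm.
set.seed(42)
times <- 0:10
# Hypothetical data: 20 flat and 20 rising trajectories, one row each.
traj <- rbind(
  matrix(rnorm(20 * 11), nrow = 20),
  matrix(rnorm(20 * 11), nrow = 20) + matrix(times, 20, 11, byrow = TRUE)
)

# Calinski criterion: (between SS / (k - 1)) / (within SS / (n - k)).
calinski <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

n_redraw <- 20   # restarts per number of clusters, as in the text
results  <- list()
for (k in 2:6) {
  for (r in seq_len(n_redraw)) {
    # One hill-climbing run from a random starting configuration.
    fit <- kmeans(traj, centers = k, nstart = 1, iter.max = 50)
    results[[length(results) + 1]] <- list(
      k = k, run = r,
      criterion = calinski(fit, nrow(traj)),
      cluster = fit$cluster
    )
  }
}

# Keep the run with the highest criterion over all k and all restarts.
crit <- vapply(results, function(x) x$criterion, numeric(1))
best <- results[[which.max(crit)]]
best$k
table(best$cluster)
```

Running every value of k several times matters because a single run may stop at a poor local maximum; comparing criteria across runs only makes sense once each k has had a fair number of attempts.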

3. Exporting results

Once kml has constructed a number of Clusterization objects, the user can examine them one by one and choose the one that suits them best. This is done with the function choice, which opens two graphic windows. The one on the left shows the Calinski criterion calculated for each Clusterization; the one on the right shows the selected Clusterization. The arrow keys move from one Clusterization to another. Once a Clusterization has been chosen, it can be displayed on screen, saved in memory, or exported to a file. The clusters are exported to the file nom-cluster.csv and the criteria to nom-criteres.csv. The distances and posterior probabilities go to nom-distance.csv (in preparation, not implemented for the time being).

Last but not least, it is possible to export a graphical representation of the clusters. The keyboard can be used to change the appearance of the graphic (black and white or colour, presence or absence of trajectories, subgroups, font size...). The final graphic can be exported in the usual manner: right-click on the figure...
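For readers who want to see what such an export amounts to, here is a minimal base R sketch that writes a cluster file, a criteria file, and a figure. It does not use choice or any other kml function. The prefix myRun, the column names, the PDF output, and the use of stats::kmeans are all assumptions made for illustration; only the -cluster.csv and -criteres.csv suffixes come from the text above.

```r
# Sketch only: base R file export; the prefix and column names are hypothetical.
set.seed(7)
traj <- rbind(matrix(rnorm(50), 10, 5),
              matrix(rnorm(50, mean = 3), 10, 5))
fit  <- kmeans(traj, centers = 2, nstart = 20)

out_dir <- tempdir()
prefix  <- "myRun"   # hypothetical prefix

# One row per trajectory with its assigned cluster.
clusters <- data.frame(id = seq_len(nrow(traj)), cluster = fit$cluster)

# Calinski criterion for this partition into 2 clusters.
criteria <- data.frame(
  nbClusters = 2,
  calinski   = (fit$betweenss / (2 - 1)) /
               (fit$tot.withinss / (nrow(traj) - 2))
)

cluster_file  <- file.path(out_dir, paste0(prefix, "-cluster.csv"))
criteria_file <- file.path(out_dir, paste0(prefix, "-criteres.csv"))
write.csv(clusters, cluster_file, row.names = FALSE)
write.csv(criteria, criteria_file, row.names = FALSE)

# Export the trajectories, coloured by cluster, to a PDF file.
plot_file <- file.path(out_dir, paste0(prefix, "-trajectories.pdf"))
pdf(plot_file)
matplot(t(traj), type = "l", lty = 1, col = fit$cluster,
        xlab = "Time", ylab = "Value")
dev.off()
```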

Author(s)

Christophe Genolini
PSIGIAM: Paris Sud Innovation Group in Adolescent Mental Health
INSERM U669 / Maison de Solenn / Paris

English translation

Raphaël Ricaud
Laboratoire "Sport & Culture" / "Sports & Culture" Laboratory
University of Paris 10 / Nanterre

Details

Package:   kml
Type:      Package
Version:   0.9.0
Date:      2008-05-01
License:   GPL (>= 2)
Lazyload:  yes
Depends:   methods, codetools, clv
URL:       http://www.r-project.org
URL:       http://christophe.genolini.free.fr/kml

References

Article submitted. Web site: http://christophe.genolini.free.fr/kml

Examples

### 1. Data Preparation
myCld <- as.clusterizLongData(generateArtificialLongData())

### 2. Building "optimal" clusterization (with only 5 redrawings)
#kml(myCld,,5,print="all")

### 3. Exporting results
#choice(myCld)
