kml3d-package: ~ Overview: KmL3D, K-means for joint Longitudinal data ~

Description

KmL3D is a new implementation of k-means for longitudinal data (or trajectories). Here is an overview of the package.

Overview

To cluster data, KmL3D goes through four steps, each of which is associated with some functions:

Data preparation
Building "optimal" clustering
Exporting results
Visualizing and exporting 3D objects

1. Data preparation

KmL3D works on objects of class ClusterLongData. Data preparation therefore simply consists of transforming data into a ClusterLongData object. This can be done via the function clusterLongData (cld for short) or as.clusterLongData (as.cld for short). The former lets the user build data from scratch; the latter converts a data.frame or an array into a ClusterLongData. Working on several variables measured on different scales can give too much weight to one of the dimensions, so the function scale normalizes the data. Instead of working on real data, one can also work on artificial data. Such data can be created with generateArtificialLongData (gald for short).
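Why normalization matters can be sketched in base R, independently of the package. The data and the variable names (weightKg, score) below are invented for illustration, and the per-variable standardization shown is only an assumption about what a reasonable normalization looks like, not a description of how the package's scale function is implemented.

```r
set.seed(1)
n <- 6
times <- 5

# Hypothetical joint trajectories stored as an array [individual, time, variable]
traj <- array(NA_real_, dim = c(n, times, 2),
              dimnames = list(NULL, NULL, c("weightKg", "score")))
traj[, , "weightKg"] <- matrix(rnorm(n * times, mean = 70, sd = 15), n, times)
traj[, , "score"]    <- matrix(rnorm(n * times, mean = 0.5, sd = 0.1), n, times)

# Share of the squared Euclidean distance between individuals 1 and 2
# that is contributed by each variable
shareOfDistance <- function(x) {
  d2 <- apply(x, 3, function(v) sum((v[1, ] - v[2, ])^2))
  d2 / sum(d2)
}

# Standardize each variable over all individuals and all times
scaled <- traj
for (v in dimnames(traj)[[3]]) {
  scaled[, , v] <- (traj[, , v] - mean(traj[, , v])) / sd(traj[, , v])
}

rawShare    <- shareOfDistance(traj)
scaledShare <- shareOfDistance(scaled)
print(round(rawShare, 3))
print(round(scaledShare, 3))
```

On the raw data, almost all of the distance comes from the variable with the largest scale; after standardization both variables contribute on comparable terms.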

2. Building "optimal" clustering

Once an object of class ClusterLongData has been created, the algorithm kml3d can be run. Starting with a ClusterLongData, kml3d builds several Clustering objects. An object of class Clustering is a partition of trajectories into subgroups. It also contains some information such as the percentage of trajectories contained in each group or some quality criterion (like the Calinski & Harabasz criterion). kml3d is a "hill-climbing" algorithm. The specificity of this kind of algorithm is that it always converges towards a maximum, but one cannot know whether it is a local or a global maximum. It offers no guarantee of optimality. To maximize one's chances of getting a good Clustering, it is better to execute the hill-climbing algorithm several times, then choose the best solution. By default, kml3d executes the hill-climbing algorithm 20 times and chooses the Clustering that maximizes the Calinski & Harabasz criterion. Likewise, it is not possible to know the optimal number of clusters beforehand. Afterwards, however, it is possible to compute indicators that help in choosing it. In the end, kml3d tests by default 2, 3, 4, 5 and 6 clusters, 20 times each.
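The restart-and-select strategy described above can be sketched with the base R function kmeans. This is an illustration of the idea only: it clusters a single simulated variable with stats::kmeans rather than with kml3d's own algorithm, the data are invented, and calinskiHarabasz is a hypothetical helper written for this example.

```r
set.seed(42)

# Hypothetical data: 3 groups of 15 trajectories, 10 time points each
times <- 1:10
traj <- rbind(
  t(replicate(15, 0.5 * times + rnorm(10))),
  t(replicate(15, 5 + rnorm(10))),
  t(replicate(15, 10 - 0.5 * times + rnorm(10)))
)

# Calinski & Harabasz criterion: between-cluster variance over
# within-cluster variance, each divided by its degrees of freedom
calinskiHarabasz <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

nbClusters  <- 2:6
nbRedrawing <- 20
results <- expand.grid(k = nbClusters, run = seq_len(nbRedrawing))
results$criterion <- NA_real_

for (i in seq_len(nrow(results))) {
  # nstart = 1: one random start per call, so each run is a single hill climb
  fit <- kmeans(traj, centers = results$k[i], nstart = 1)
  results$criterion[i] <- calinskiHarabasz(fit, nrow(traj))
}

# Best run for each number of clusters, then the best number of clusters
bestPerK <- aggregate(criterion ~ k, data = results, FUN = max)
print(bestPerK)
bestK <- bestPerK$k[which.max(bestPerK$criterion)]
print(bestK)
```

Comparing the best criterion value obtained for each number of clusters is one way to choose that number after the fact.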

3. Exporting results

When kml3d has constructed some Clustering objects, the user can examine them one by one and choose to export some of them. This can be done via the function choice. choice opens a graphics window showing various information, including the trajectories as clustered by a specific Clustering. When some Clustering objects have been selected (the user can select more than one), it is possible to save them. The clusters are then exported to the file nom-cluster.csv. The criteria are exported to nom-criteres.csv. The graphs are exported according to their extension.
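As a rough picture of what such an export contains, the sketch below writes a partition and a criterion value to two CSV files using base R only. It does not call choice; the base name myStudy, the column names and all values are hypothetical, and the actual layout of the files produced by the package may differ.

```r
# Hypothetical partition of 6 trajectories into clusters A and B
clusters <- data.frame(id = paste0("i", 1:6),
                       cluster = c("A", "A", "B", "A", "B", "B"))

# Hypothetical quality criterion for that partition (made-up value)
criteria <- data.frame(nbClusters = 2, calinskiHarabasz = 12.3)

# Assumed base name; files are written to a temporary directory
baseName     <- file.path(tempdir(), "myStudy")
clusterFile  <- paste0(baseName, "-cluster.csv")
criteriaFile <- paste0(baseName, "-criteres.csv")
write.csv(clusters, clusterFile, row.names = FALSE)
write.csv(criteria, criteriaFile, row.names = FALSE)

# Reload the partition and compute the percentage of trajectories per cluster
reloaded <- read.csv(clusterFile)
percentEachCluster <- 100 * prop.table(table(reloaded$cluster))
print(percentEachCluster)
```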

4. Visualizing and exporting 3D objects

kml3d also provides tools to visualize the trajectories in 3D. plot3d uses the rgl library to plot two variables against time (either the whole set of trajectories or just the mean trajectories). The user can then rotate the graphical representation with the mouse. plot3dPdf builds a Triangles object. This kind of object can be included in a PDF file using saveTrianglesAsASY and the Asymptote software. Once again, the image in the PDF file can be rotated with the mouse, so the reader gets real 3D.

How to get help?

For those who are not familiar with S4 programming: in S4, a function can be adapted to specific arguments, that is, a separate version of the function can be defined for each class of argument.

To get help on a function (for example plot), use: ?(plot).

To get help on a function adapted to its argument (for example plot on argument LongData), use: ?"plot,LongData".
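What "a function adapted to its argument" means can be sketched with the methods package. The class ToyLongData and the generic describe are hypothetical names invented for this example; they are not part of the package.

```r
library(methods)

# Hypothetical class standing in for the package's own classes
setClass("ToyLongData", representation(traj = "matrix"))

# A generic function, and a version of it adapted to ToyLongData arguments
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "ToyLongData", function(object) {
  sprintf("%d trajectories, %d time points",
          nrow(object@traj), ncol(object@traj))
})

x <- new("ToyLongData", traj = matrix(0, nrow = 4, ncol = 3))
msg <- describe(x)   # dispatches to the ToyLongData version
print(msg)

# The adapted version is registered for the pair (describe, ToyLongData),
# the same kind of (function, argument class) pair named in the help request
print(existsMethod("describe", "ToyLongData"))
```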

Author(s)

Christophe Genolini
INSERM U669 / PSIGIAM: Paris Sud Innovation Group in Adolescent Mental Health
Modal'X / Universite Paris Ouest-Nanterre-La Defense
Contact author: genolini@u-paris10.fr

Details

Package: KmL3D
Type: Package
Version: 0.7
Date: 2011-12-12
License: GPL (>= 2)
Lazyload: yes
Depends: methods, graphics, rgl, misc3d
URL: http://www.r-project.org
URL: http://christophe.genolini.free.fr/kml

References

Article "KmL: K-means for Longitudinal Data", in Computational Statistics, Volume 25, Issue 2 (2010), Page 317. Web site: http://christophe.genolini.free.fr/kml

Examples

### 1. Data Preparation
myCld <- generateArtificialLongData(c(15,15,15))

### 2. Building "optimal" clustering (with only 3 redrawings)
kml3d(myCld, nbRedrawing = 3)

### 3. Exporting results
try(choice(myCld))

### 4. Visualizing in 3D
plot3d(myCld)