kml: ~ Algorithm kml: K-means for Longitudinal data ~

Description

kml is an implementation of k-means for longitudinal data (or trajectories). The algorithm can deal with missing values and provides an easy way to re-run the algorithm several times, varying the starting conditions and/or the number of clusters looked for. This page describes the algorithm. For an overview of the package, see kml-package.

Usage

kml(object,nbClusters=2:6,nbRedrawing=20,toPlot="none",parAlgo=parALGO())

Arguments

object

[ClusterLongData]: contains the trajectories to cluster as well as any previously computed Partition.

nbClusters

[vector(numeric)]: Vector containing the number of clusters with which kml must work. By default, nbClusters is 2:6 which indicates that kml must search partitions with respectively 2, the

nbRedrawing

[numeric]: Sets the number of times that k-means must be re-run (with different starting conditions) for each number of clusters.

toPlot

[character]: either 'traj' to plot the trajectories alone, 'criterion' to plot the criterion alone, 'both' to plot both, or 'none' to display nothing (faster).

parAlgo

[ParKml]: parameters used to run the algorithm. They can be changed using the function parKml. Options are mainly 'saveFreq', 'maxIt', 'imputationMethod',

Value

A ClusterLongData object, to which the newly found Partition have been added.

Optimisation

Behind kml, there are two different procedures:

Fast: when the parameter distance is set to "euclidean" and toPlot is set to 'none' or 'criterion', kml calls a compiled (optimized) C procedure.
Slow: when the user defines their own distance, or wants to watch the construction of the clusters by setting toPlot to 'traj' or 'both', kml uses a non-compiled R program.

The C procedure is 25 times faster than the R one. We therefore advise using the R procedure 1/ to try a new method (such as a new distance) or 2/ to "see" the construction of the very first clusters, in order to check that everything is going right. After that, it is better to switch to the C procedure (as we do in the Examples section). If you need a different distance for a specific use, feel free to contact the author.

Details

kml works on an object of class ClusterLongData. For each number included in nbClusters, kml computes a Partition and stores it in the field cX of the ClusterLongData object, where 'X' is the number of clusters. The algorithm is re-run as many times as specified by nbRedrawing. By default, it is executed 20 times each for 2, 3, 4, 5 and 6 clusters, that is, 100 times in total. When a Partition has been found, it is added to the corresponding slot c1, c2, c3, ... or c26. The sublist cX stores all the Partition with X clusters. Inside a sublist, the Partition can either be sorted from the highest quality criterion to the lowest (the best are stored first, using ordered,ListPartition) or left unsorted. Note that Partition are saved throughout the algorithm: if the user interrupts the execution of kml, the results found so far are not lost. If the user runs kml on an object, running kml again on the same object adds new Partition to the ones already found. The possible starting conditions are defined in initializePartition.
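
The bookkeeping described above (one sublist per number of clusters, several re-runs in each, best criterion first) can be sketched in base R. This is only an illustration, not kml's own code: stats::kmeans stands in for kml's k-means, a plain list stands in for the ClusterLongData object, and the data set `traj`, the list `partitions` and the quality criterion (between-cluster share of the total sum of squares) are assumptions made for the sketch.

```r
# Illustrative sketch only: stats::kmeans replaces kml's k-means and a plain
# list replaces the ClusterLongData object.
set.seed(1)

# Hypothetical data: 60 trajectories measured at 5 time points (one row each),
# drawn from three groups with different mean levels.
traj <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 5),
  matrix(rnorm(100, mean = 4), ncol = 5),
  matrix(rnorm(100, mean = 8), ncol = 5)
)

nbClusters  <- 2:6   # numbers of clusters to try
nbRedrawing <- 20    # re-runs per number of clusters

# One sublist per number of clusters, named c2, c3, ..., c6.
partitions <- list()
for (k in nbClusters) {
  runs <- lapply(seq_len(nbRedrawing), function(i) {
    # nstart = 1: each call starts from a single random set of centers
    fit <- kmeans(traj, centers = k, nstart = 1, iter.max = 50)
    # Assumed criterion for this sketch (higher is better)
    list(clusters = fit$cluster, criterion = fit$betweenss / fit$totss)
  })
  # Store the best partition first
  crit <- vapply(runs, function(r) r$criterion, numeric(1))
  partitions[[paste0("c", k)]] <- runs[order(crit, decreasing = TRUE)]
}

# 5 numbers of clusters x 20 re-runs each
totalRuns <- sum(lengths(partitions))
```

Here `partitions$c4[[1]]` would be the best of the 20 partitions with 4 clusters under the assumed criterion. Appending further runs to `partitions$c4` mirrors the way a second call to kml adds Partition to those already found.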

Examples


### Generation of some data
cld1 <- generateArtificialLongData(25)

### We suspect 3, 4 or 6 clusters, we want 3 redrawings.
###   We want to "see" what happens (so toPlot is 'both')
kml(cld1,c(3,4,6),3,toPlot='both')

### 4 seems to be the best. We want 10 more redrawings.
###   We don't want to see the plots again, we want the result as fast as possible.
kml(cld1,4,10)
