kml3d: ~ Algorithm kml3d: K-means for Joint Longitudinal data ~

Description

kml3d is a new implementation of k-means for joint longitudinal data (or joint trajectories). This algorithm is able to deal with missing values and provides an easy way to rerun the algorithm several times, varying the starting conditions and/or the number of clusters looked for. Here is the description of the algorithm. For an overview of the package, see kml3d-package.

Usage

kml3d(object,nbClusters=2:6,nbRedrawing=20,toPlot="none",paramKml=parKml(),
  criterionNames=c("calinski","ray","davies","random"))

Arguments

object

[ClusterLongData]: contains the trajectories to cluster as well as any previous Clustering.

nbClusters

[vector(numeric)]: Vector containing the numbers of clusters with which kml3d must work. By default, nbClusters is 2:6, which indicates that kml3d must search for partitions with 2, 3, 4, 5 and 6 clusters respectively.

nbRedrawing

[numeric]: Sets the number of times that k-means must be re-run (with different starting conditions) for each number of clusters.

toPlot

[character]: during computation, kml3d can display some graphs. If toPlot="traj", then the trajectories are plotted (like with function plot,ClusterLong

paramKml

[ParKml]: sets the options used by kml3d (such as the starting condition, the imputation methods, the save frequency, the maximum number of iterations, the distance used...) See

criterionNames

[character]: Criteria that shall be computed for each Clustering.

Value

None. This function internally modifies a ClusterLongData object by adding some Clustering to it.

Optimisation

Behind kml3d, there are two different procedures:

Fast: when the parameter distanceName is set to a classical distance (one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski") and toPlot is set to "criterion" or "none" (the default), kml3d calls a C compiled (optimized) procedure.
Slow: when the user defines their own (non-classical) distance, or wants to see the construction of the clusters by setting toPlot to "both" or "traj", kml3d uses a non-compiled R procedure.

The C procedure is around 25 times faster than the R one. We therefore advise using the R procedure (1) to try a new method (such as a new distance) or (2) to "see" the very first cluster constructions, in order to check that everything is going right. Then it is time to switch to the C procedure (as we do in the Examples section). If you need a different distance for a specific use, feel free to contact the author.
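The selection rule described above can be summarised in a few lines of base R. This is only an illustrative sketch: the helper usesCompiledProcedure and the vector classicalDistances are hypothetical names introduced here, not functions or objects exported by kml3d, and the package's internal test may be written differently.

```r
# Hypothetical helper (not part of kml3d): restates the rule given in the
# Optimisation section for choosing between the C and the R procedure.
classicalDistances <- c("euclidean", "maximum", "manhattan",
                        "canberra", "binary", "minkowski")

usesCompiledProcedure <- function(distanceName, toPlot = "none") {
  # The fast path needs a classical distance ...
  isClassical <- distanceName %in% classicalDistances
  # ... and no plotting of the trajectories during computation.
  noTrajPlot <- toPlot %in% c("none", "criterion")
  isClassical && noTrajPlot
}

usesCompiledProcedure("euclidean", "none")        # TRUE  (fast, C)
usesCompiledProcedure("euclidean", "both")        # FALSE (slow, R)
usesCompiledProcedure("myDistance", "criterion")  # FALSE (slow, R)
```

A check of this kind can be handy before launching a long run, to confirm that the chosen options do not silently fall back to the slow procedure.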

Author(s)

Christophe Genolini INSERM U669 / PSIGIAM: Paris Sud Innovation Group in Adolescent Mental Health Modal'X / Universite Paris Ouest-Nanterre- La Defense Contact author : genolini@u-paris10.fr

Details

kml3d works on objects of class ClusterLongData. For each number i included in nbClusters, kml3d computes a Clustering with i clusters, then stores it in the field ci of the ClusterLongData object according to its number of clusters i. The algorithm starts over as many times as specified by nbRedrawing. By default, it is executed for 2, 3, 4, 5 and 6 clusters, 20 times each, namely 100 times in total.

When a Clustering has been found, it is added to the slot ci. ci stores all the Clustering with i clusters. Inside a sublist, the Clustering are sorted in creation order. They can also be sorted from the biggest quality criterion to the smallest (the best are stored first) using ordered,ListClustering.

Note that the Clustering are saved throughout the algorithm. If the user interrupts the execution of kml3d, the result is not lost. If the user runs kml3d on an object and then runs kml3d again on the same object, the Clustering computed the second time are added to the ones already present in the object (unless you "clear" some list, see object["clusters","clear"]<-value in ClusterLongData).

The possible starting conditions are "randomAll", "randomK" and "maxDist", as defined in partitionInitialise. In addition, the method "allMethods" is a shortcut that runs one "maxDist", one "randomAll", and then "randomK" for all the remaining reruns.
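The rerun-and-store scheme described above can be illustrated with base R alone. The sketch below is not the kml3d implementation: it uses stats::kmeans on joint trajectories flattened to one row per individual, a plain list named results in place of the ClusterLongData object, and the ratio of between-cluster to total sum of squares as a stand-in for the package's quality criteria. All of these choices are assumptions made for illustration.

```r
set.seed(1)

# Toy joint trajectories: 30 individuals, 2 variables, 5 time points each.
# Each row holds one individual's two trajectories side by side.
nInd <- 30
nTime <- 5
group <- rep(1:3, each = 10)
traj <- cbind(
  matrix(rnorm(nInd * nTime, mean = group * 5), nrow = nInd),
  matrix(rnorm(nInd * nTime, mean = -group * 5), nrow = nInd)
)

nbClusters <- 2:4   # numbers of clusters to try
nbRedrawing <- 3    # reruns for each number of clusters

results <- list()
for (k in nbClusters) {
  slotName <- paste0("c", k)  # c2, c3, c4: one sublist per cluster count
  # Each rerun starts from a different random set of initial centers.
  fits <- lapply(seq_len(nbRedrawing), function(r) {
    kmeans(traj, centers = k, nstart = 1)
  })
  # Stand-in quality criterion: share of variance explained by the partition.
  quality <- sapply(fits, function(fit) fit$betweenss / fit$totss)
  # Store the partitions from the best criterion value to the worst.
  results[[slotName]] <- fits[order(quality, decreasing = TRUE)]
}

names(results)                    # "c2" "c3" "c4"
length(nbClusters) * nbRedrawing  # 9 runs in total
```

With the default settings of kml3d (nbClusters=2:6 and nbRedrawing=20), the same count gives 5 * 20 = 100 runs.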

References

Article "KmL: K-means for Longitudinal Data", in Computational Statistics, Volume 25, Issue 2 (2010), Page 317. Web site: http://christophe.genolini.free.fr/kml

Examples

### Generation of some data
cld1 <- generateArtificialLongData(c(15,15,15))

### We suspect 2, 3, 4 or 5 clusters, we want 3 redrawings.
#     We want to "see" what happens (so toPlot="both")
kml3d(cld1,2:5,3,toPlot="both")

### 3 seems to be the best.
#     We don't want to see again, we want to get the result as fast as possible.
#     Just, to check the overall process, we plot the criterion evolution
kml3d(cld1,3,10,toPlot="criterion")
