kml is a new implementation of k-means for longitudinal data (or trajectories). The algorithm can deal with missing values and
provides an easy way to re-run the algorithm several times, varying the starting conditions and/or the number of clusters looked for.
Here is the description of the algorithm. For an overview of the package, see kml3d-package.
kml(object,nbClusters=2:6,nbRedrawing=3,toPlot="none",paramKml=parKml(),
criterionNames=c("calinski","test"))
object [ClusterLongData]: contains the trajectories to cluster and some Clustering.
nbClusters: the numbers of clusters with which kml must work. By default, nbClusters is 2:6, which indicates that kml must search partitions with respectively 2, 3, 4, 5 and 6 clusters.
nbRedrawing: the number of times the algorithm is re-run, each time with different starting conditions, for each number of clusters.
toPlot [character]: during computation, kml can display some graphs. If toPlot="traj", the trajectories are plotted (as with the plot method for ClusterLongData). The other values are "criterion", "both" and "none" (the default).
paramKml [ParKml]: sets the options used by kml (such as the starting condition, the imputation method, the save frequency, the maximum number of iterations, the distance used...). The default is parKml().
criterionNames [character]: criteria that shall be computed for each Clustering.
kml modifies the ClusterLongData object by adding some Clustering to it.
When distanceName is set to a classical distance (one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski") and toPlot is set to "criterion" or "none" (the default), kml calls a C compiled (optimized) procedure.
When toPlot is set to "both" or "traj", kml uses an R non-compiled programme.
The C procedure is around 25 times faster than the R one.
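The choice between the two procedures can be pictured with a small sketch. The helper whichProcedure below is hypothetical (it is not part of the package) and only encodes the rule stated above; sending every remaining combination to the R programme is an assumption.

```r
# Hypothetical helper, not part of the package: which procedure the rule above selects.
classicalDistances <- c("euclidean", "maximum", "manhattan",
                        "canberra", "binary", "minkowski")

whichProcedure <- function(distanceName, toPlot = "none") {
  toPlot <- match.arg(toPlot, c("none", "criterion", "traj", "both"))
  if (distanceName %in% classicalDistances && toPlot %in% c("none", "criterion")) {
    "C"   # compiled, optimized procedure
  } else {
    "R"   # non-compiled programme (assumed fallback for every other case)
  }
}

whichProcedure("euclidean", "none")   # classical distance, no trajectory plot
whichProcedure("euclidean", "traj")   # trajectories are plotted
```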
So we advise using the R procedure 1/ to try a new method
(such as a new distance) or 2/ to "see" the very first cluster
constructions, in order to check that everything goes right. Then it is
time to switch to the C procedure (as we do in the Example section).
If, for a specific use, you need a different distance, feel free to
contact the author.
kml works on objects of class ClusterLongData.
For each number i included in nbClusters, kml computes a
Clustering with i clusters, then stores it in the field
ci of the ClusterLongData object, according to its number
of clusters i.
The algorithm starts over as many times as specified by nbRedrawing. With the defaults shown in the usage above (nbClusters=2:6, nbRedrawing=3), it is executed for 2,
3, 4, 5 and 6 clusters 3 times each, namely 15 times.
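The loop structure described here can be sketched in base R. This is not the package's code: stats::kmeans stands in for the clustering step, the toy data and the helper runAll are made up for the illustration, and a plain list with fields c2, c3, ... plays the role of the ClusterLongData object.

```r
set.seed(1)
# Toy trajectories: 30 individuals, 5 time points, two shifted groups.
traj <- rbind(matrix(rnorm(75, mean = 0), ncol = 5),
              matrix(rnorm(75, mean = 4), ncol = 5))

# Hypothetical helper: for each i in nbClusters, re-run k-means nbRedrawing
# times and keep every partition in the field "ci" of a plain list.
runAll <- function(traj, nbClusters = 2:4, nbRedrawing = 3) {
  results <- list()
  for (i in nbClusters) {
    field <- paste0("c", i)
    results[[field]] <- lapply(seq_len(nbRedrawing), function(r) {
      stats::kmeans(traj, centers = i)$cluster   # random start on each call
    })
  }
  results
}

res <- runAll(traj)
names(res)          # one field per number of clusters
sum(lengths(res))   # total number of runs
```

The total number of runs is simply length(nbClusters) times nbRedrawing.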
When a Clustering has been found, it is added to the slot ci.
ci stores all the Clustering with i clusters. Inside a sublist, the
Clustering are kept in creation order. They can
also be sorted from the biggest quality criterion to
the smallest (the best are stored first) using ordered,ListClustering.
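Sorting a sublist from the best to the worst criterion can be illustrated as follows. This is a base-R sketch under stated assumptions: stats::kmeans replaces the package's clustering step, the Calinski-Harabasz index is taken as the quality criterion, and the helper calinski is written for this example only.

```r
set.seed(2)
traj <- rbind(matrix(rnorm(75, mean = 0), ncol = 5),
              matrix(rnorm(75, mean = 4), ncol = 5))

# Calinski-Harabasz index of one stats::kmeans fit on n individuals:
# (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)).
calinski <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Five runs with 3 clusters, kept in creation order.
c3 <- lapply(1:5, function(r) kmeans(traj, centers = 3))
crit <- sapply(c3, calinski, n = nrow(traj))

# The same runs, best criterion first.
c3Sorted <- c3[order(crit, decreasing = TRUE)]
critSorted <- sapply(c3Sorted, calinski, n = nrow(traj))
```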
Note that the Clustering are saved throughout the algorithm. If the user
interrupts the execution of kml, the result is not lost. If the
user runs kml on an object and then runs kml again on the same object, the
Clustering computed the second time are added to
the ones already present in the object (unless you "clear" some
list, see object["clusters","clear"]<-value in ClusterLongData).
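The accumulation behaviour can be mimicked with a plain list. The sketch below does not use the package: addClusterings is a hypothetical helper, stats::kmeans replaces the clustering step, and emptying a list element only imitates what "clearing" a list means here.

```r
set.seed(3)
traj <- matrix(rnorm(100), ncol = 5)   # 20 toy trajectories, 5 time points

# Hypothetical helper: append n new partitions with i clusters to the field ci.
addClusterings <- function(store, traj, i, n) {
  field <- paste0("c", i)
  new <- lapply(seq_len(n), function(r) kmeans(traj, centers = i)$cluster)
  store[[field]] <- c(store[[field]], new)   # existing partitions are kept
  store
}

store <- list()
store <- addClusterings(store, traj, 3, 2)   # first call: 2 partitions in c3
store <- addClusterings(store, traj, 3, 4)   # second call: 4 more are appended
nAfterTwoCalls <- length(store$c3)

store["c3"] <- list(list())                  # imitation of "clearing" the list
nAfterClear <- length(store$c3)
```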
The possible starting conditions are "randomAll", "randomK" and
"maxDist", as defined in partitionInitialise. In
addition, the method "allMethods" is a shortcut that runs a "maxDist", a "randomAll"
and then "randomK" for all the other re-runs.
Overview : kml3d-package
Classes : ClusterLongData, Clustering
Methods : clusterLongData, choice
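The three starting conditions named above can be illustrated with a base-R sketch. It rests on an assumed reading of the names, not on the code of partitionInitialise: "randomAll" assigns every individual at random (with at least one per cluster), "randomK" assigns only k randomly chosen individuals, and "maxDist" picks k individuals far from each other (here: the most distant pair, then repeatedly the individual whose smallest distance to the chosen ones is largest). The helper initPartition is hypothetical.

```r
# Hypothetical helper: initial partition of the rows of traj into k clusters.
# NA marks an individual that is not assigned yet.
initPartition <- function(traj, k, method = c("randomAll", "randomK", "maxDist")) {
  method <- match.arg(method)
  n <- nrow(traj)
  stopifnot(k >= 2, k <= n)
  part <- rep(NA_integer_, n)
  if (method == "randomAll") {
    # one guaranteed member per cluster, random labels for the others
    part <- sample(c(seq_len(k), sample.int(k, n - k, replace = TRUE)))
  } else if (method == "randomK") {
    part[sample.int(n, k)] <- seq_len(k)
  } else {
    d <- as.matrix(dist(traj))   # Euclidean distances between trajectories
    seeds <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ])
    while (length(seeds) < k) {
      rest <- setdiff(seq_len(n), seeds)
      minD <- apply(d[rest, seeds, drop = FALSE], 1, min)
      seeds <- c(seeds, rest[which.max(minD)])
    }
    part[seeds] <- seq_len(k)
  }
  part
}

set.seed(4)
traj <- matrix(rnorm(100), ncol = 5)
pAll  <- initPartition(traj, 3, "randomAll")
pK    <- initPartition(traj, 3, "randomK")
pDist <- initPartition(traj, 3, "maxDist")
```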
### Generation of some data
cld1 <- generateArtificialLongData(c(15,15,15))
### We suspect 2, 3, 4 or 5 clusters, and we want 3 redrawings.
# We want to "see" what happens (so toPlot="both")
kml(cld1,2:5,3,toPlot="both")
### 3 seems to be the best.
# We don't need to watch the trajectories again; we want the result as fast as possible.
# Just to check the overall process, we plot the criterion evolution
kml(cld1,3,10,toPlot="criterion")