kml
kml is a new implementation of k-means for longitudinal data (or trajectories). This algorithm is able to deal with missing values and
provides an easy way to re-run the algorithm several times, varying the starting conditions and/or the number of clusters looked for.
Here is the description of the algorithm. For an overview of the package, see kml-package.

kml(Object, nbClusters = 2:6, nbRedrawing = 20, saveFreq = 100,
    maxIt = 200, trajMinSize = 2, print.cal = FALSE,
    print.traj = FALSE, imputationMethod = "copyMean",
    distance, power = 2, centerMethod = meanNA, startingCond = "allMethods",
    distanceStartingCond = "euclidean", ...)
[...] Clusterization [...].

[...] kml must work. By default, nbClusters is 2:6, which indicates that kml must search for partitions with respectively 2, 3, 4, 5 and 6 clusters.

[...] the ClusterizLongData once in a while. saveFreq defines the frequency of the saving process. The ClusterizLongData is [...]

trajMinSize sets the minimum number of values that a trajectory must contain not to be excluded. For example, if the trajectories have [...]

imputationMethod defines the method used to impute the missing values. It should be one of "LOCF", "LOCB", "linearInterpolation", "line[...]

distanceStartingCond defines the distance that will be used to calculate this matrix. It should be one of "euclidean", "maximum", [...]

[...] ClusterizLongData object, after having added some Clusterization to it.

If distance is set to a classical distance (one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski") and print.traj is set to FALSE (the default), kml calls a compiled (optimized) C procedure.
If print.traj=TRUE, kml uses non-compiled R code. [...] (see the Example section).
If, for a specific use, you need a different distance, feel free to
contact the author.

kml works on objects of class ClusterizLongData.
For each number included in nbClusters, kml computes a
Clusterization, then stores it in the field clusters of the
ClusterizLongData object according to its number of clusters.
The algorithm starts over as many times as specified in nbRedrawing. By default, it is executed for 2,
3, 4, 5 and 6 clusters 20 times each, namely 100 times.
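
The count of 100 runs can be checked with a few lines of base R. This is only a sketch of the arithmetic: it enumerates the (number of clusters, redrawing) pairs implied by the defaults in the usage line, and the name `runs` is illustrative, not an object that kml creates.

```r
# Default values taken from the usage line of kml.
nbClusters  <- 2:6
nbRedrawing <- 20

# One row per k-means run: every cluster count is paired with every redrawing.
# 'runs' is an illustrative name, not part of the kml package.
runs <- expand.grid(redrawing = seq_len(nbRedrawing), nbClusters = nbClusters)

nrow(runs)               # total number of k-means runs
table(runs$nbClusters)   # number of runs for each cluster count
```

Changing either argument scales the total in the same way, since every value in nbClusters is run nbRedrawing times.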
When a Clusterization has been found, it is added to the slot
clusters. clusters is a list of 52 sublists called c1,
c2, c3 up to c52. The sublist cX stores all the Clusterization
with X clusters. Inside a sublist, the
Clusterization are sorted from the biggest quality criterion to
the smallest (the best are stored first).
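
The layout of clusters can be pictured with a plain R list. The sketch below is not the package's internal code: `addResult` is a hypothetical helper, each result is reduced to a made-up criterion value, and the only point is to show sublists named c1 to c52 that stay sorted with the best criterion first.

```r
# Hypothetical stand-in for the 'clusters' slot: 52 sublists named c1 ... c52.
clusters <- setNames(vector("list", 52), paste0("c", 1:52))

# Hypothetical helper (not a kml function): add one result to the sublist
# matching its number of clusters, keeping the sublist sorted by
# decreasing quality criterion.
addResult <- function(clusters, nbClusters, criterion) {
  slotName <- paste0("c", nbClusters)
  results <- c(clusters[[slotName]], list(list(criterion = criterion)))
  crit <- vapply(results, function(r) r$criterion, numeric(1))
  clusters[[slotName]] <- results[order(crit, decreasing = TRUE)]
  clusters
}

# Made-up criterion values, for illustration only.
clusters <- addResult(clusters, 3, 410.2)
clusters <- addResult(clusters, 3, 655.9)
clusters <- addResult(clusters, 3, 502.7)

# Criteria of the 3-cluster results, best first.
vapply(clusters$c3, function(r) r$criterion, numeric(1))
```

With this layout, the best result for X clusters is always the first element of cX.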
Note that Clusterization are saved throughout the algorithm. If the user
interrupts the execution of kml, the result is not lost. If the
user runs kml on an object then runs kml again on the same object, the
Clusterization that are computed the second time are added to
the ones already present in the object (unless you "clear" some
list, see Object["clusters","clear"]<-value in ClusterizLongData).
The possible starting conditions are "randomAll", "randomK" and
"maxDist", as defined in partitionInitialise. In
addition, the method "allMethods" is a shortcut that runs one "maxDist", one "randomAll",
and "randomK" for all the other redrawings.

http://christophe.genolini.free.fr/kml
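
The "allMethods" shortcut described above can be sketched as a schedule of starting conditions, one per redrawing. `startingSchedule` is a hypothetical helper written for illustration, not a function of the package, and the order in which "maxDist" and "randomAll" are run is an assumption.

```r
# Hypothetical helper (not part of kml): starting condition used by each
# redrawing under "allMethods". Assumes "maxDist" runs first and
# "randomAll" second; every remaining redrawing uses "randomK".
startingSchedule <- function(nbRedrawing) {
  schedule <- c("maxDist", "randomAll",
                rep("randomK", max(nbRedrawing - 2, 0)))
  schedule[seq_len(nbRedrawing)]
}

startingSchedule(5)
```

With the default nbRedrawing = 20, this reading gives one "maxDist", one "randomAll" and 18 "randomK" starts for each number of clusters.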
kml-package
Classes: ClusterizLongData, Clusterization
Methods: clusterizLongData, choice
### Generation of some data
cld1 <- as.cld(generateArtificialLongData())
### We suspect 2, 3, 4, 5 or 6 clusters, we want 3 redrawings.
# We want to "see" what happens (so print.cal and print.traj are TRUE)
kml(cld1,2:6,3,print.cal=TRUE,print.traj=TRUE)
### 4 seems to be the best. But to be sure, we try more redrawings with 4 or 6 clusters only.
# We don't need to see the process again, we want to get the result as fast as possible.
kml(cld1,c(4,6),10)