kml
KmL goes through three steps, each of which is associated with some functions: data preparation, building an "optimal" clusterization, and exporting results.
KmL works on objects of class ClusterizLongData (abbreviated cld). Data preparation therefore simply consists of transforming data into a ClusterizLongData object. This is done via the function as.cld. This function handles matrix input (see as.clusterizLongData.matrix) and data.frame input (see as.clusterizLongData.data.frame).
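
To make the two input types concrete, here is a minimal base-R sketch of the same toy trajectories stored as a matrix and as a data.frame. It assumes a wide layout (one row per individual, one column per time point, plus an identifier column in the data.frame); that layout and the names traj, trajDf and times are illustrative assumptions and should be checked against the as.clusterizLongData help pages.

```r
# Hypothetical toy data: 6 individuals measured at 4 time points.
set.seed(1)
times <- 0:3
traj <- rbind(
  matrix(rnorm(12, mean = 0), nrow = 3),  # three trajectories around 0
  matrix(rnorm(12, mean = 5), nrow = 3)   # three trajectories around 5
)
colnames(traj) <- paste0("t", times)

# Same data as a data.frame, with an identifier column first (assumed layout).
trajDf <- data.frame(id = paste0("i", seq_len(nrow(traj))), traj)

dim(traj)      # number of individuals, number of time points
names(trajDf)  # identifier column followed by the time columns
```

Either object holds the same measurements; only the container differs.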
Instead of working on real data, one can also work on artificial data. Such data are of the ArtificialLongData type. They can then be turned into a ClusterizLongData object using the function as.cld for ArtificialLongData (see as.cld.artificialLongData).

Once an object of class ClusterizLongData
has been created, the algorithm
KmL
can be executed.
Starting from a ClusterizLongData, kml builds a Clusterization. An object of class Clusterization is a partition of trajectories into subgroups. The object also contains information such as the percentage of trajectories in each group and the Calinski criterion.
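
The idea of a partition with per-group percentages can be sketched in base R. This is not the Clusterization class itself: it uses stats::kmeans on hypothetical toy trajectories, and the names traj, partition and percent are assumptions made for the illustration.

```r
# Hypothetical toy trajectories: 10 around 0 and 15 around 4, 4 time points each.
set.seed(2)
traj <- rbind(
  matrix(rnorm(40, mean = 0), nrow = 10),
  matrix(rnorm(60, mean = 4), nrow = 15)
)

# A partition assigns each trajectory (row) to one subgroup.
fit <- kmeans(traj, centers = 2, nstart = 5)
partition <- fit$cluster

# Percentage of trajectories contained in each group.
percent <- 100 * prop.table(table(partition))
percent
```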
kml is a "hill-climbing" algorithm. The specificity of this kind of algorithm is that it always converges towards a maximum, but one cannot know whether that maximum is local or global. It offers no guarantee of optimality.
To maximize one's chances of getting a good Clusterization, it is better to run the hill-climbing algorithm several times and then choose the best solution. By default, kml runs the hill-climbing algorithm 20 times and chooses the Clusterization maximising the determinant of the between-cluster matrix.
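
The restart strategy described above can be sketched with base R's stats::kmeans. This is a stand-in under stated assumptions, not the package's code: the toy data are hypothetical, nRedrawing is an illustrative name, and the runs are ranked by the between-cluster sum of squares (the trace of the between matrix) rather than by the determinant that kml uses.

```r
# Hypothetical toy trajectories: two groups of 20, 4 time points each.
set.seed(3)
traj <- rbind(
  matrix(rnorm(80, mean = 0), nrow = 20),
  matrix(rnorm(80, mean = 3), nrow = 20)
)

nRedrawing <- 20  # illustrative name for the number of restarts

# Each run starts from a different random initialisation (nstart = 1).
runs <- lapply(seq_len(nRedrawing), function(i) {
  kmeans(traj, centers = 3, nstart = 1)
})

# Rank the runs by between-cluster sum of squares (a substitute criterion).
score <- vapply(runs, function(r) r$betweenss, numeric(1))
best <- runs[[which.max(score)]]
```

Because each run may stop at a different local maximum, keeping the highest-scoring run improves the odds of a good partition without guaranteeing the global optimum.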
Likewise, it is not possible to know the optimal number of clusters beforehand. Afterwards, however, it is possible to compute criteria that help in choosing it. kml uses the Calinski criterion.
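
As an illustration of choosing the number of clusters after the fact, the sketch below computes the Calinski and Harabasz index, (between SS / (k - 1)) / (within SS / (n - k)), for several values of k using stats::kmeans. The toy data and the helper name calinski are assumptions for the example; the exact form of the criterion computed inside kml is not reproduced here.

```r
# Hypothetical toy trajectories: two groups of 25, 4 time points each.
set.seed(4)
traj <- rbind(
  matrix(rnorm(100, mean = 0), nrow = 25),
  matrix(rnorm(100, mean = 4), nrow = 25)
)
n <- nrow(traj)

# Calinski and Harabasz index for one fitted k-means partition.
calinski <- function(fit, n) {
  k <- nrow(fit$centers)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Evaluate candidate numbers of clusters, 20 random starts each.
ks <- 2:6
crit <- vapply(ks, function(k) {
  calinski(kmeans(traj, centers = k, nstart = 20), n)
}, numeric(1))
names(crit) <- ks

bestK <- ks[which.max(crit)]  # candidate with the highest criterion
```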
In the end, kml tests by default 2, 3, 4, 5 and 6 clusters, 20 times each.

When kml has constructed a certain number of Clusterization objects, the user can examine them one by one and choose the one that suits them best. This can be done via the function choice. choice opens two graphic windows. The one on the left shows the Calinski criterion calculated for each Clusterization; the one on the right shows the selected Clusterization. The arrow keys move from one Clusterization to another.
When a Clusterization has been chosen, it is possible to display it on the screen, to save it in memory or to export it to a file. The clusters are exported to the file nom-cluster.csv. Criteria are exported to nom-criteres.csv. The distances and posterior probabilities are in nom-distance.csv (in preparation, not implemented for the time being).
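
For readers who want a feel for what a cluster export looks like, here is a minimal base-R sketch that writes cluster assignments to a CSV file and reads them back. The file name, the column names id and cluster, and the toy data are all hypothetical; the actual layout of the files written by choice is not documented here.

```r
# Hypothetical toy trajectories and partition.
set.seed(5)
traj <- rbind(
  matrix(rnorm(40, mean = 0), nrow = 10),
  matrix(rnorm(40, mean = 4), nrow = 10)
)
fit <- kmeans(traj, centers = 2, nstart = 5)

# A temporary path stands in for a file such as nom-cluster.csv.
outFile <- file.path(tempdir(), "example-cluster.csv")
clusters <- data.frame(id = seq_len(nrow(traj)), cluster = fit$cluster)
write.csv(clusters, outFile, row.names = FALSE)

# Read the file back to check the round trip.
reread <- read.csv(outFile)
```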
Last but not least, it is possible to export a graphic representation of the clusters. With the keyboard, it is possible to change the appearance of the graphic (black and white or color, presence or absence of trajectories, subgroups, font size...). The final graphic can be exported in the usual manner: right click on the figure...

http://christophe.genolini.free.fr/kml
### 1. Data Preparation
myCld <- as.clusterizLongData(generateArtificialLongData())
### 2. Building "optimal" clusterization (with only 5 redrawings)
#kml(myCld,,5,print="all")
### 3. Exporting results
#choice(myCld)