KmL3D is a new implementation of k-means for longitudinal data (or trajectories). Here is an overview of the package.

KmL3D goes through three steps, each of which is associated with some functions: data preparation, building "optimal" clusterizations, and exporting results.
1. Data preparation
KmL3D works on objects of class ClusterLongData. Data preparation therefore simply consists in transforming data into a ClusterLongData object. This can be done via the function clusterLongData (cld for short) or as.clusterLongData (as.cld for short). The former lets the user build data from scratch; the latter converts a data.frame or an array into a ClusterLongData object.
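Trajectories with several variables are naturally held as a three-dimensional array indexed by individual, time and variable. The base R sketch below reshapes a hypothetical long-format data.frame (one row per individual and time) into such an array. It assumes, without confirming it here, that this is the kind of array as.clusterLongData accepts; check ?as.clusterLongData for the exact layout and arguments it expects.

```r
# Hypothetical long-format data: one row per (id, time), two measured variables
longDf <- data.frame(
  id   = rep(c("i1", "i2", "i3"), each = 4),
  time = rep(1:4, times = 3),
  V1   = c(1, 2, 3, 4,  2, 3, 4, 5,  9, 8, 7, 6),
  V2   = c(10, 12, 14, 16,  11, 13, 15, 17,  20, 18, 16, 14),
  stringsAsFactors = FALSE
)

ids   <- unique(longDf$id)
times <- sort(unique(longDf$time))
vars  <- c("V1", "V2")

# Empty array indexed as [individual, time, variable]
traj <- array(NA_real_,
              dim = c(length(ids), length(times), length(vars)),
              dimnames = list(ids, as.character(times), vars))

# Fill each variable using matrix indexing (row = individual, col = time, slice = variable)
for (v in vars) {
  idx <- cbind(match(longDf$id, ids), match(longDf$time, times), match(v, vars))
  traj[idx] <- longDf[[v]]
}

dim(traj)
```

Individuals observed at fewer time points would simply keep NA in the missing cells.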
Working on several variables measured on different scales can give too much weight to one of the dimensions, so the function scale normalizes the data.
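To see why normalization matters, the base R sketch below centers and scales each variable (the third dimension of an individual × time × variable array) by its own mean and standard deviation. The helper scaleByVariable is hypothetical and only illustrates the idea; the defaults used by the package's scale method may differ.

```r
set.seed(1)
# Hypothetical data: 30 individuals, 5 time points, 2 variables on very different scales
traj <- array(c(rnorm(150, mean = 5, sd = 1),       # variable 1: small scale
                rnorm(150, mean = 500, sd = 100)),  # variable 2: large scale
              dim = c(30, 5, 2))

# Center and scale each variable using its mean and sd over all individuals and times
scaleByVariable <- function(x) {
  for (v in seq_len(dim(x)[3])) {
    vals <- x[, , v]
    x[, , v] <- (vals - mean(vals, na.rm = TRUE)) / sd(vals, na.rm = TRUE)
  }
  x
}

scaled <- scaleByVariable(traj)
apply(scaled, 3, mean)  # approximately 0 for each variable
apply(scaled, 3, sd)    # 1 for each variable
```

Without this step, Euclidean distances between trajectories would be dominated by variable 2, whose values are about a hundred times larger.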
Instead of working on real data, one can also work on artificial data. Such data can be created with generateArtificialLongData (gald for short).

2. Building "optimal" clusterizations
Once a ClusterLongData object has been created, the algorithm kml3d can be run.
Starting from a ClusterLongData object, kml3d builds several Clustering objects. An object of class Clustering is a partition of the trajectories into subgroups. It also contains some information, such as the percentage of trajectories in each group and some quality criteria (like the Calinski & Harabasz criterion).
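The two pieces of information mentioned above can be computed directly from a partition. The base R sketch below uses the standard Calinski & Harabasz formula, (between-group dispersion / (k - 1)) / (within-group dispersion / (n - k)), on rows of a numeric matrix. The function name calinskiHarabasz is hypothetical, and kml3d may compute the criterion on its own trajectory distance.

```r
# Calinski & Harabasz criterion for a partition of the rows of x
calinskiHarabasz <- function(x, cluster) {
  x <- as.matrix(x)
  n <- nrow(x)
  groups <- sort(unique(cluster))
  k <- length(groups)
  overall <- colMeans(x)
  between <- 0
  within  <- 0
  for (g in groups) {
    xg <- x[cluster == g, , drop = FALSE]
    cg <- colMeans(xg)
    between <- between + nrow(xg) * sum((cg - overall)^2)  # group size times squared distance of centers
    within  <- within + sum(sweep(xg, 2, cg)^2)            # squared distances to own group center
  }
  (between / (k - 1)) / (within / (n - k))
}

x <- matrix(c(0, 2, 10, 12), ncol = 1)
cluster <- c(1, 1, 2, 2)
100 * prop.table(table(cluster))  # percentage of individuals in each group
calinskiHarabasz(x, cluster)      # (100 / 1) / (4 / 2) = 50
```

Larger values indicate groups that are well separated relative to their internal spread.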
kml3d is a "hill-climbing" algorithm. The specificity of this kind of algorithm is that it always converges towards a maximum, but one cannot know whether it is a local or a global maximum: it offers no guarantee of optimality. To maximize one's chances of getting a good Clustering, it is better to run the hill-climbing algorithm several times and then choose the best solution. By default, kml3d runs the hill-climbing algorithm 20 times and chooses the Clustering maximizing the Calinski & Harabasz criterion.
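The restart strategy can be sketched in base R with stats::kmeans standing in for kml3d's own algorithm: run a single random start several times and keep the partition with the largest Calinski & Harabasz value. Flattening each individual's trajectories into one row is a simplifying assumption, as are the hypothetical helpers makeGroup and bestOfRestarts (nbRedrawing here is just a parameter of that helper, named after the argument used in the example below).

```r
set.seed(42)
# Hypothetical artificial data: 3 groups of 15 individuals, 6 time points, 2 variables,
# flattened to one row per individual (6 columns for variable 1, then 6 for variable 2)
nPerGroup  <- 15
timePoints <- 6
makeGroup <- function(slope1, slope2) {
  t  <- seq_len(timePoints)
  v1 <- outer(rep(1, nPerGroup), slope1 * t) + matrix(rnorm(nPerGroup * timePoints), nPerGroup)
  v2 <- outer(rep(1, nPerGroup), slope2 * t) + matrix(rnorm(nPerGroup * timePoints), nPerGroup)
  cbind(v1, v2)
}
x <- rbind(makeGroup(0, 0), makeGroup(1, -1), makeGroup(2, 1))

# Calinski & Harabasz value from the sums of squares returned by kmeans
chFromKmeans <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Run k-means from a fresh random start nbRedrawing times and keep the best partition
bestOfRestarts <- function(x, k, nbRedrawing = 20) {
  best <- NULL
  bestCrit <- -Inf
  for (r in seq_len(nbRedrawing)) {
    fit  <- kmeans(x, centers = k, nstart = 1)
    crit <- chFromKmeans(fit, nrow(x))
    if (crit > bestCrit) {
      best <- fit
      bestCrit <- crit
    }
  }
  list(fit = best, criterion = bestCrit)
}

res <- bestOfRestarts(x, k = 3)
res$criterion
table(res$fit$cluster)
```

Each run may stop at a different local maximum; keeping the best of several runs lowers the risk of reporting a poor one.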
Likewise, the optimal number of clusters cannot be known beforehand. Afterwards, however, it is possible to calculate clues that help choose it. In the end, kml3d tests by default 2, 3, 4, 5 and 6 clusters, 20 times each.

3. Exporting results
Once kml3d has constructed some Clustering objects, the user can examine them one by one and choose to export some. This can be done via the function choice, which opens a graphical window showing various information, including the trajectories clustered by a specific Clustering.
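Exporting a clustering essentially means writing the partition and its criteria to CSV files. The base R sketch below does this with write.csv; the data, the column names and the "example-" file prefix are all hypothetical, and the files actually written by choice may be organized differently.

```r
# Hypothetical partition of 6 individuals and made-up criterion values, for illustration only
clusters <- data.frame(id = paste0("i", 1:6),
                       cluster = c("A", "A", "B", "B", "B", "C"),
                       stringsAsFactors = FALSE)
criteria <- data.frame(nbClusters = c(2, 3), calinski = c(12.5, 30.1))

outDir <- tempdir()
clusterFile  <- file.path(outDir, "example-cluster.csv")
criteriaFile <- file.path(outDir, "example-criteres.csv")
write.csv(clusters, clusterFile, row.names = FALSE)
write.csv(criteria, criteriaFile, row.names = FALSE)

# Reading the files back gives the same tables
read.csv(clusterFile, stringsAsFactors = FALSE)
read.csv(criteriaFile)
```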
When some Clustering objects have been selected (the user can select more than one), it is possible to save them. The clusters are then exported to the file nom-cluster.csv and the criteria to nom-criteres.csv. The graphs are exported according to their extension.

4. Visualizing in 3D
kml3d also provides tools to visualize the trajectories in 3D. plot3d uses the library rgl to plot two variables against time (either the whole set of trajectories, or just the mean trajectories). The user can then rotate the graphical representation using the mouse. plot3dPdf builds a Triangles object. Such objects can be included in a pdf file using saveTrianglesAsASY and the software asymptote. Once again, the image in the pdf file can be rotated using the mouse, so the reader gets real 3D.

To get help on the generic function plot (from base R), use ?(plot). To get help on plot applied to an argument of class LongData, use ?"plot,LongData".

### 1. Data Preparation
myCld <- generateArtificialLongData(c(15,15,15))
### 2. Building "optimal" clusterization (with only 3 redrawings)
kml3d(myCld,nbRedrawing=3)
### 3. Exporting results
try(choice(myCld))
### 4. Visualizing in 3D
plot3d(myCld)
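plot3d draws two variables against time. The base R sketch below computes what a mean-trajectory view would place in 3D: for each group, one (time, variable 1, variable 2) point per time. It assumes the mean trajectory is the pointwise mean over the group's members, and the data and the helper meanTraj are hypothetical. The resulting coordinates could be drawn with rgl (for instance rgl::lines3d), which this sketch does not call so that it runs without a graphics device.

```r
set.seed(7)
# Hypothetical data: 20 individuals, 5 time points, 2 variables, and a partition into 2 groups
nInd  <- 20
times <- 1:5
traj  <- array(rnorm(nInd * length(times) * 2), dim = c(nInd, length(times), 2))
cluster <- rep(c(1, 2), each = nInd / 2)
traj[cluster == 2, , 1] <- traj[cluster == 2, , 1] + 3  # shift group 2 on variable 1

# Pointwise mean over the members of group g: a [time, variable] matrix
meanTraj <- function(traj, cluster, g) {
  apply(traj[cluster == g, , , drop = FALSE], c(2, 3), mean)
}

# 3D coordinates of each group's mean trajectory: time, variable 1, variable 2
coords <- lapply(c(1, 2), function(g) {
  m <- meanTraj(traj, cluster, g)
  data.frame(group = g, time = times, var1 = m[, 1], var2 = m[, 2])
})
do.call(rbind, coords)
```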