TraMineRextras (version 0.2.2)

seqemlt: Euclidean Metric for Longitudinal Timelines

Description

Computes a Euclidean distance between sequences. The sequences are transformed so that the distance between sequences is equivalent to the Euclidean distance between the transformed sequences. The transformed sequences may be used as input to any Euclidean algorithm (clustering algorithms, ...). The distance is built by considering the transitions between states at any step. A step weighting mechanism makes it possible to balance short-term and long-term transitions. The background, namely the duality between distances between sequences and the evolution of the proximities between objects, is analysed in the references.

Usage

seqemlt(seqdata, a = 1, b = 1, weighted = TRUE)

Arguments

seqdata
a state sequence object defined with the seqdef function.
a
optional argument for the step weighting mechanism that controls the balance between short-term and long-term transitions. The weighting function is $1/(a*s+b)$, where $s$ is the transition step.
b
see argument a.
weighted
optional numerical vector containing weights, which may be used by some functions to compute weighted statistics (rates of transitions).
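To get a feel for how a and b shape the step weights, the weighting function $1/(a*s+b)$ can be tabulated directly. The sketch below uses only base R; `step_weight` is a hypothetical helper written for this illustration and is not part of TraMineRextras, and the range of steps (1 to 5) is an assumption chosen for readability.

```r
# Hypothetical helper (not a package function): weight of a transition at step s,
# following the formula given for arguments a and b: 1 / (a * s + b)
step_weight <- function(s, a = 1, b = 1) 1 / (a * s + b)

steps <- 1:5  # assumed range of transition steps, for illustration only

# Default setting a = 1, b = 1: weights fall off quickly with the step
w_default <- step_weight(steps)
round(w_default, 3)   # 0.500 0.333 0.250 0.200 0.167

# Smaller a: weights decrease more slowly, so long-term transitions count more
w_flat <- step_weight(steps, a = 0.1, b = 1)
round(w_flat, 3)      # 0.909 0.833 0.769 0.714 0.667
```

A larger a relative to b emphasises short-term transitions; a smaller a gives transitions at distant steps more influence.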

Value

An object of class emlt with the following components:

  • coord: transformed sequences. The emlt Euclidean metric between sequences is equivalent to the Euclidean distance between the coordinates in coord, so coord can be used as input to any clustering algorithm based on a Euclidean metric.
  • states: list of states.
  • situations: list of situations.
  • sit.freq: frequency of situations.
  • sit.transrate: rate of transitions from a situation to any situation in its own future (a vector of transitions towards the future).
  • sit.profil: profile of situations. The profile is a normalized vector derived from the transition rates, with short-term and long-term transitions balanced by the time weight 1/(a*s+b), where s is the transition step.
  • sit.cor: correlation between situations. Two situations are highly correlated when their profiles are similar (i.e. their transitions towards the future are similar).
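Because the emlt distance is just the Euclidean distance on coord, a pairwise distance matrix can be obtained with base R's `dist`. The sketch below uses a small made-up matrix as a stand-in for the coord component (it is not output of seqemlt) and assumes one row per sequence, as the kmeans call in the Examples suggests.

```r
# Stand-in for the coord component: 4 sequences, 3 coordinates each.
# These numbers are invented for illustration, not produced by seqemlt.
coord <- matrix(c(0, 0, 0,
                  1, 0, 0,
                  1, 2, 2,
                  0, 3, 4),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("seq", 1:4), NULL))

# Pairwise Euclidean distances between rows
d <- dist(coord, method = "euclidean")

# The same quantity computed by hand for sequences 1 and 3
d13 <- sqrt(sum((coord["seq1", ] - coord["seq3", ])^2))

# Both approaches agree
all.equal(as.matrix(d)["seq1", "seq3"], d13)
```

With a real result, the stand-in matrix would be replaced by the coord component of the object returned by seqemlt, and `d` could then be passed to any function that accepts a `dist` object, such as `hclust`.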

Details

The emlt distance between two sequences is the Euclidean distance between the coordinates of the transformed sequences. Using coord as the input of any clustering algorithm based on a Euclidean metric is equivalent to clustering with the emlt metric. A situation is defined as a state indexed by time, and a sequence as a timeline of states. The distance between situations is defined from the transitions between situations. The emlt distance between sequences takes into account the proximity between situations. Transitions are considered at any step, with a weighting that balances long-term and short-term transitions. A situation may have no occurrence when the corresponding object is not present for the whole duration. The distance between any situation and a situation with no occurrence is NA and has no influence on the distance between sequences.
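The notion of a situation (a state indexed by time) can be illustrated on toy data with base R. This is only a sketch of the concept: the two sequences are invented, the `state@time` labels are a convention chosen here, and the labels and counts that seqemlt stores internally may be formatted differently.

```r
# Toy data (invented): rows are individuals, columns are time points
seqs <- matrix(c("school", "school",     "employment",
                 "school", "employment", "employment"),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("id1", "id2"), paste0("t", 1:3)))

# A situation pairs each state with the time at which it is observed.
# The "@" separator is an arbitrary choice for this illustration.
situations <- seqs
situations[] <- paste(seqs, colnames(seqs)[col(seqs)], sep = "@")
situations

# Number of occurrences of each situation across individuals
situation_counts <- table(situations)
situation_counts
```

Here "school" at t1 and "school" at t2 are distinct situations, which is what lets the distance depend on when a state occurs and not only on which states occur.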

References

- Rousset, Patrick and Giret, Jean-François (2007). Classifying Qualitative Time Series with SOM: The Typology of Career Paths in France. Lecture Notes in Computer Science, vol. 4507, Springer Berlin / Heidelberg.
- Rousset, Patrick, Giret, Jean-François and Grelet, Yvette (2012). Typologies De Parcours et Dynamique Longitudinale. Bulletin de méthodologie sociologique, issue 114, April 2012.
- Rousset, Patrick and Giret, Jean-François (2008). A longitudinal Analysis of Labour Market Data with SOM. Encyclopedia of Artificial Intelligence, Information Science Reference.

See Also

plot.emlt

Examples

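library(TraMineR)        # provides seqdef, seqdplot and the mvad data set
library(TraMineRextras)  # provides seqemlt
data(mvad)
## Build a state sequence object from the first 100 cases
mvad.seq <- seqdef(mvad[1:100, 17:41])
alphabet(mvad.seq)
head(labels(mvad.seq))
## Computing distance
mvad.emlt <- seqemlt(mvad.seq)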

## typology1 with kmeans in 3 clusters
km <- kmeans(mvad.emlt$coord, 3)

## Plotting typology1 by clusters
seqdplot(mvad.seq, group = km$cluster)

## typology2: Ward criterion in 3 clusters for large data, using a two-step kmeans + hierarchical clustering
km <- kmeans(mvad.emlt$coord, 25)
## "ward.D" is the current name of the hclust method formerly called "ward"
hc <- hclust(dist(km$centers, method = "euclidean"), method = "ward.D")
zz <- cutree(hc, k = 3)

## Plotting typology2 by clusters
seqdplot(mvad.seq, group = zz[km$cluster])


## Plotting the evolution of the correlation between states
plot(mvad.emlt, from="employment", to="joblessness",type="cor")
plot(mvad.emlt, from=c("employment","HE", "school", "FE"), to="joblessness", delay=0, leg=TRUE)
plot(mvad.emlt, from="joblessness", to="employment", delay=6)
plot(mvad.emlt, type="pca", cex=0.4, compx=1, compy=2)
