seqici: Complexity index of individual sequences

Description

Computes the complexity index, a composite measure of sequence complexity. The index uses the number of transitions in the sequence as a measure of the complexity induced by the state ordering and the longitudinal entropy as a measure of the complexity induced by the state distribution in the sequence.

Usage

seqici(seqdata, with.missing=FALSE)

Arguments

seqdata

a state sequence object as returned by the seqdef function.

with.missing

if set to TRUE, missing status (gaps in sequences) is handled as an additional state when computing the state distribution and the number of transitions in the sequence.

Value

a vector of length equal to the number of sequences in seqdata containing the complexity index value of each sequence.

Details

The complexity index $C(s)$ of a sequence $s$ is $$C(s)= \sqrt{\frac{q(s)}{q_{max}} \,\frac{h(s)}{h_{max}}}$$ where $q(s)$ is the number of transitions in the sequence, $q_{max}$ the maximum number of transitions, $h(s)$ the within-sequence (longitudinal) entropy, and $h_{max}$ the theoretical maximum entropy, $h_{max} = -\log(1/|A|) = \log|A|$, with $|A|$ the size of the alphabet. The index $C(s)$ is the geometric mean of its two normalized components. The minimum value of 0 can only be reached by a sequence made of a single distinct state, which therefore contains 0 transitions and has an entropy of 0. The maximum value of 1 is reached when the following two conditions are both fulfilled: i) each state of the alphabet is present in the sequence and the total durations are uniform, that is, each equal to $\ell/|A|$, where $\ell$ is the length of the sequence; and ii) the number of transitions in the sequence is equal to $\ell-1$, that is, the length $\ell_d$ of the DSS (the sequence of distinct successive states) is equal to $\ell$.
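
As an illustration of the formula, the following base R sketch computes the index for a single sequence given as a character vector. The helper name complexity_index is hypothetical and is not part of TraMineR. It assumes a complete sequence (no missing states) of length at least 2, $q_{max} = \ell - 1$, and natural logarithms; it is meant to make the two components explicit, not to reproduce the internals of seqici.

```r
## Hypothetical helper (not a TraMineR function): complexity index of one
## complete sequence 's' (character vector) over a given alphabet.
complexity_index <- function(s, alphabet) {
  l <- length(s)                       # sequence length, assumed >= 2
  ## q(s): number of positions where the state differs from the previous one
  q <- sum(s[-1] != s[-l])
  q_max <- l - 1                       # assumed maximum number of transitions
  ## h(s): entropy of the state distribution within the sequence
  p <- table(factor(s, levels = alphabet)) / l
  p <- p[p > 0]                        # drop unvisited states (0 * log 0 := 0)
  h <- -sum(p * log(p))
  h_max <- log(length(alphabet))       # -log(1/|A|)
  ## geometric mean of the two normalized components
  sqrt((q / q_max) * (h / h_max))
}

ab <- c("A", "B")
complexity_index(c("A", "A", "A", "A"), ab)  # one distinct state: minimum, 0
complexity_index(c("A", "B", "A", "B"), ab)  # both conditions met: 1, up to rounding
complexity_index(c("A", "A", "B", "B"), ab)  # uniform durations, one transition
```

The last call shows why both components matter: the state distribution is uniform, so the entropy ratio is at its maximum, but the single transition out of three possible pulls the index down to $\sqrt{1/3}$.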

References

Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.

Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2010). "Indice de complexité pour le tri et la comparaison de séquences catégorielles". In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI, Vol. E-19, pp. 61-66.

Examples

## Load TraMineR, which provides seqdef, seqici and the example data sets
library(TraMineR)

## Creating a sequence object from the mvad data set
data(mvad)
mvad.labels <- c("employment", "further education", "higher education",
                    "joblessness", "school", "training")
mvad.scodes <- c("EM","FE","HE","JL","SC","TR")
mvad.seq <- seqdef(mvad, 15:86, states=mvad.scodes, labels=mvad.labels)

## Complexity index of each sequence, with summary and histogram
mvad.ci <- seqici(mvad.seq)
summary(mvad.ci)
hist(mvad.ci)

## Example using with.missing argument
data(ex1)
ex1.seq <- seqdef(ex1, 1:13)
seqici(ex1.seq)
seqici(ex1.seq, with.missing=TRUE)