Sequence turbulence is a measure proposed by Elzinga
& Liefbroer (2007). It is based on the number $\phi(x)$
of distinct subsequences that can be extracted from the distinct
successive state sequence and the variance of the consecutive times $t_i$
spent in the distinct states. For a sequence $x$, the formula is
$$T(x)=\log_{2}\left(\phi(x)\,\frac{s_{t,max}^2(x) + 1}{s_t^2(x) + 1}\right)$$
where $s_t^2(x)$ is the variance of the successive state
durations in sequence $x$ and $s_{t,max}^2(x)$ is the maximum
value that this variance can take given the total duration of the
sequence. This maximum is computed as
$$s_{t,max}^2 =(d-1)(1-\bar{t})^2$$
where $\bar{t}$ is the mean consecutive time spent in the
distinct states, i.e. the sequence duration divided by the number
$d$ of distinct states in the sequence.
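The formulas above can be traced on a small example. The following is a minimal Python sketch, not the package's implementation; the function names are hypothetical. It assumes that $\phi(x)$ counts the empty subsequence, that $d$ is the number of spells in the distinct successive state sequence, and that $s_t^2(x)$ is the population variance (divisor $d$), which is the convention under which the durations $(1,\dots,1,\ell-d+1)$ of a sequence of total duration $\ell$ attain the maximum $(d-1)(1-\bar{t})^2$.

```python
from itertools import groupby
from math import log2


def dss_and_durations(states):
    """Collapse runs of identical states; return (DSS, spell durations)."""
    runs = [(state, len(list(run))) for state, run in groupby(states)]
    return [s for s, _ in runs], [n for _, n in runs]


def count_distinct_subsequences(seq):
    """Count the distinct subsequences of seq, empty one included (assumption)."""
    total = 1          # the empty subsequence
    before_last = {}   # count held just before each symbol's previous occurrence
    for symbol in seq:
        previous_total = total
        # Appending `symbol` doubles the count, minus the subsequences that
        # were already produced the last time this symbol was appended.
        total = 2 * total - before_last.get(symbol, 0)
        before_last[symbol] = previous_total
    return total


def turbulence(states):
    """T(x) = log2(phi * (s2_max + 1) / (s2 + 1)) for a non-empty sequence."""
    dss, durations = dss_and_durations(states)
    d = len(dss)
    mean_t = sum(durations) / d
    var_t = sum((t - mean_t) ** 2 for t in durations) / d  # population variance (assumption)
    var_max = (d - 1) * (1 - mean_t) ** 2
    phi = count_distinct_subsequences(dss)
    return log2(phi * (var_max + 1) / (var_t + 1))


x = list("AAABCC")
print(dss_and_durations(x))   # (['A', 'B', 'C'], [3, 1, 2])
print(turbulence(x))          # log2(14.4), about 3.85
```

Here $\phi(x)=8$, $\bar{t}=2$, $s_t^2(x)=2/3$ and $s_{t,max}^2(x)=2$, so $T(x)=\log_2(8\cdot 3/(5/3))=\log_2(14.4)$. Under the same assumptions a sequence with a single spell gets $T(x)=\log_2(2)=1$.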
The function searches for missing states in the sequences and, if any are found, adds the missing state to the alphabet for the computation of the turbulence. In this case the seqdss and seqdur functions, which extract the distinct successive state sequences and the associated durations, are called with the with.missing=TRUE argument. A missing state in a sequence is treated as the occurrence of an additional symbol of the alphabet, and two or more consecutive missing states are treated as two or more occurrences of the same state. Hence the DSS of A-A-*-*-*-B-B-C-C-D is A-*-B-C-D and the associated durations are 2-3-2-2-1.
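The treatment of missing states described above can be reproduced with a short Python sketch. It is only an illustration of the rule, not a substitute for seqdss and seqdur: the helper name is hypothetical, and the missing state is simply handled as one more symbol, written `*` as in the example.

```python
from itertools import groupby


def dss_and_durations(states):
    """Collapse runs of identical symbols; a missing state is an ordinary symbol."""
    runs = [(state, len(list(run))) for state, run in groupby(states)]
    return [s for s, _ in runs], [n for _, n in runs]


x = "A-A-*-*-*-B-B-C-C-D".split("-")
dss, durations = dss_and_durations(x)
print("-".join(dss))                    # A-*-B-C-D
print("-".join(map(str, durations)))    # 2-3-2-2-1
```

The three consecutive missing states collapse into a single spell of duration 3, exactly as a run of any other state would.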
The normalized value is obtained by subtracting 1 from the index and dividing by the turbulence value of a sequence made by successively repeating the alphabet up to the maximal sequence length in seqdata.
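A possible reading of this normalization is sketched below in Python. All names are hypothetical. The sketch reuses the assumptions of a population variance for the durations and of a subsequence count that includes the empty subsequence. It further assumes that 1 is subtracted from the reference turbulence as well, so that the result ranges from 0 to 1; if the raw reference turbulence is meant as the denominator, the `- 1` in the last line should be dropped.

```python
from itertools import groupby
from math import log2


def turbulence(states):
    """T(x) with population variance of spell durations (assumption)."""
    runs = [(state, len(list(run))) for state, run in groupby(states)]
    dss = [s for s, _ in runs]
    durations = [n for _, n in runs]
    d = len(dss)
    mean_t = sum(durations) / d
    var_t = sum((t - mean_t) ** 2 for t in durations) / d
    var_max = (d - 1) * (1 - mean_t) ** 2
    # Distinct subsequences of the DSS, empty one included (assumption).
    phi, before_last = 1, {}
    for symbol in dss:
        previous_phi = phi
        phi = 2 * phi - before_last.get(symbol, 0)
        before_last[symbol] = previous_phi
    return log2(phi * (var_max + 1) / (var_t + 1))


def reference_sequence(alphabet, max_length):
    """Repeat the alphabet in order, truncated to max_length."""
    return [alphabet[i % len(alphabet)] for i in range(max_length)]


def normalized_turbulence(states, alphabet, max_length):
    """Needs at least two states in the alphabet and max_length >= 2."""
    t_ref = turbulence(reference_sequence(alphabet, max_length))
    # Assumption: 1 is subtracted from the reference value too.
    return (turbulence(states) - 1) / (t_ref - 1)


alphabet = ["A", "B", "C"]
print(reference_sequence(alphabet, 7))                       # A, B, C, A, B, C, A
print(normalized_turbulence(list("AAABCC"), alphabet, 6))
```

Every spell of the reference sequence has duration 1, so its duration variance and the maximum variance are both 0 and its turbulence reduces to $\log_2\phi$.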