Computes descriptive statistics for sequences, including sequence frequency for any sequence length, and conditional probability and relative risk for sequences of length 2 (pairwise transitions).
sequence_stats(
seq_list,
min_seq_freq = 0.01,
min_conditional_prob = 0,
min_relative_risk = 0
)
Returns a list of data frames, one per cluster, each containing the sequence statistics for that cluster.
seq_list: A list of data frames containing sequences, typically the output of get_cluster_sequences.
min_seq_freq: Numeric threshold (default = 0.01). Filters out sequences whose relative frequency (seq_freq) is below this value.
min_conditional_prob: Numeric threshold (default = 0). Filters out pairs whose conditional probability is below this value. Applies only to pairwise sequences (k = 2).
min_relative_risk: Numeric threshold (default = 0). Filters out pairs whose relative risk is below this value. Applies only to pairwise sequences (k = 2).
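The three thresholds act as row filters on each cluster's statistics. The sketch below illustrates one plausible reading, assuming a row is kept only when it is at or above every applicable threshold. The helper `apply_thresholds` and the toy table are hypothetical and not part of the package. Because sequences longer than two elements only carry seq_freq, the probability and risk filters are skipped when those columns are absent.

```r
# Hypothetical helper (not part of the package): keeps rows at or above each
# threshold, skipping any filter whose column is absent (e.g. when k > 2).
apply_thresholds <- function(tbl, min_seq_freq = 0.01,
                             min_conditional_prob = 0, min_relative_risk = 0) {
  keep <- tbl$seq_freq >= min_seq_freq
  if ("conditional_prob" %in% names(tbl)) {
    keep <- keep & tbl$conditional_prob >= min_conditional_prob
  }
  if ("relative_risk" %in% names(tbl)) {
    keep <- keep & tbl$relative_risk >= min_relative_risk
  }
  tbl[keep, , drop = FALSE]
}

# Illustrative pairwise table with made-up values.
pair_stats <- data.frame(
  from = c("A", "A", "B", "B"),
  to   = c("B", "C", "A", "C"),
  seq_freq = c(0.400, 0.005, 0.300, 0.295),
  conditional_prob = c(0.99, 0.01, 0.50, 0.49),
  relative_risk = c(1.60, 0.03, 1.50, 1.10),
  stringsAsFactors = FALSE
)

apply_thresholds(pair_stats)                             # drops the A -> C row
apply_thresholds(pair_stats, min_conditional_prob = 0.5) # keeps A -> B and B -> A
```

With the defaults, only min_seq_freq removes anything, since conditional probabilities and relative risks are never negative.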
For k = 2, the function computes:
seq_freq: Proportion of all sequences that match the pair
conditional_prob: P(to | from)
relative_risk: Conditional probability divided by the marginal probability of to, i.e. P(to | from) / P(to)
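As a worked illustration of these definitions, the sketch below computes the three pairwise statistics from a toy set of transitions. It assumes each cluster's sequences are stored as a data frame with hypothetical `from` and `to` columns, that seq_freq is a pair's count divided by the total number of pairs, and that the marginal probability of to is the share of all pairs ending in that state. The package may define these denominators differently; `pairwise_stats` is an illustrative helper, not the package's implementation.

```r
# Hypothetical layout: one row per observed transition in a single cluster.
transitions <- data.frame(
  from = c("A", "A", "A", "B", "B", "C"),
  to   = c("B", "B", "C", "A", "C", "A"),
  stringsAsFactors = FALSE
)

# Sketch of the k = 2 statistics under the assumptions stated above.
pairwise_stats <- function(tr) {
  n_total <- nrow(tr)
  # Number of occurrences of each distinct (from, to) pair.
  counts <- aggregate(data.frame(n = rep(1L, n_total)),
                      by = list(from = tr$from, to = tr$to),
                      FUN = sum)
  from_totals <- table(tr$from)  # pairs starting in each state
  to_totals   <- table(tr$to)    # pairs ending in each state

  counts$seq_freq <- counts$n / n_total
  # P(to | from): share of pairs leaving `from` that go to `to`.
  counts$conditional_prob <- counts$n /
    as.numeric(from_totals[as.character(counts$from)])
  # Marginal P(to): share of all pairs that end in `to`.
  marginal_to <- as.numeric(to_totals[as.character(counts$to)]) / n_total
  counts$relative_risk <- counts$conditional_prob / marginal_to
  counts
}

# One result per cluster, mirroring the list-of-data-frames return value.
lapply(list(cluster_1 = transitions), pairwise_stats)
```

In this toy data, A -> B occurs in 2 of 6 pairs (seq_freq = 1/3), A is the origin of 3 pairs (conditional_prob = 2/3), and B is the destination of 2 of 6 pairs (P(to) = 1/3), giving relative_risk = 2. A relative risk above 1 marks a transition that occurs more often than the destination's overall base rate would suggest.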
For k > 2, only seq_freq is computed.
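For longer sequences, a comparable sketch only needs counts of whole sequences. Here each row of a hypothetical data frame holds one sequence of length k = 3 in columns `s1` to `s3` (an assumed layout, not necessarily what get_cluster_sequences returns), and seq_freq is each distinct sequence's share of all rows. The helper `sequence_freq` is illustrative only.

```r
# Hypothetical layout: each row is one observed sequence of length k = 3.
seqs <- data.frame(
  s1 = c("A", "A", "B", "A"),
  s2 = c("B", "B", "C", "B"),
  s3 = c("C", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# Sketch: relative frequency of each distinct sequence; no conditional
# probability or relative risk is computed, matching the k > 2 behavior.
sequence_freq <- function(seqs) {
  # Collapse each row into a single key such as "A > B > C".
  keys <- do.call(paste, c(seqs, sep = " > "))
  counts <- table(keys)
  data.frame(sequence = names(counts),
             seq_freq = as.numeric(counts) / length(keys),
             stringsAsFactors = FALSE)
}

sequence_freq(seqs)
# "A > B > C" appears in 2 of 4 rows, so its seq_freq is 0.5
```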
See also: get_cluster_sequences