MSCA (version 1.1.1)

sequence_stats: Compute sequence statistics

Description

Computes descriptive statistics for sequences, including sequence frequency for any sequence length, and conditional probability and relative risk for sequences of length 2 (pairwise transitions).

Usage

sequence_stats(
  seq_list,
  min_seq_freq = 0.01,
  min_conditional_prob = 0,
  min_relative_risk = 0
)

Value

A list of data frames, each containing the sequence statistics for one cluster.

Arguments

seq_list

A list of data frames containing sequences, typically the output of get_cluster_sequences.

min_seq_freq

Numeric threshold (default = 0.01). Filters out sequences with relative frequency below this value.

min_conditional_prob

Numeric threshold (default = 0). Filters out pairs with conditional probability below this value. Applies only to pairwise sequences (k = 2).

min_relative_risk

Numeric threshold (default = 0). Filters out pairs with relative risk below this value. Applies only to pairwise sequences (k = 2).

Details

For k = 2, the function computes:

  • seq_freq: Proportion of all sequences that match the pair

  • conditional_prob: P(to | from)

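  • relative_risk: P(to | from) / P(to), i.e. the conditional probability divided by the marginal probability of to. Values above 1 indicate that to follows from more often than its overall frequency would suggest.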

For k > 2, only seq_freq is computed.
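
The pairwise statistics above can be illustrated by hand in base R. This is a minimal sketch, not the package's implementation: the toy data frame, its column names (from, to), and the way the marginals are estimated (as shares of observed pairs) are all assumptions made for illustration, as is the inclusive (>=) comparison in the filtering step.

```r
# Toy data: each row is one observed length-2 sequence.
# Column names `from` and `to` are hypothetical, chosen for this sketch.
pairs <- data.frame(
  from = c("A", "A", "A", "B", "B", "C"),
  to   = c("B", "B", "C", "C", "C", "B"),
  stringsAsFactors = FALSE
)
n <- nrow(pairs)

# Count each distinct (from, to) pair.
stats <- aggregate(count ~ from + to,
                   data = transform(pairs, count = 1L),
                   FUN = sum)

# seq_freq: share of all observed pairs that match this pair.
stats$seq_freq <- stats$count / n

# Marginals (assumption: estimated from the same set of pairs).
from_count <- table(pairs$from)      # number of pairs starting with each state
to_share   <- table(pairs$to) / n    # share of pairs ending in each state

# conditional_prob: P(to | from) = count(from, to) / count(from).
stats$conditional_prob <-
  stats$count / as.numeric(from_count[as.character(stats$from)])

# relative_risk: P(to | from) / P(to).
stats$relative_risk <-
  stats$conditional_prob / as.numeric(to_share[as.character(stats$to)])

# Threshold filtering in the spirit of min_seq_freq, min_conditional_prob
# and min_relative_risk (threshold values here are arbitrary).
kept <- stats[stats$seq_freq >= 0.01 &
              stats$conditional_prob >= 0.5 &
              stats$relative_risk >= 1, ]

print(stats)
print(kept)
```

In this toy data, the pair A -> B occurs in 2 of 6 pairs (seq_freq = 1/3), follows A in 2 of 3 cases (conditional_prob = 2/3), and B ends half of all pairs, so relative_risk = (2/3) / (1/2) = 4/3.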

See Also

get_cluster_sequences