For each cluster, extracts all sequences of length k from the ordered
observations, grouped by individual ID. Returns a list of sequences per cluster.
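The extraction described above can be sketched in base R as follows. This is a minimal sketch, not the package's implementation. It assumes that a "sequence of length k" means k consecutive events (a sliding window) within one individual's trajectory, and that rows are already in time order within each ID. The helper name `sketch_cluster_sequences` and the output column names `event_1`, ..., `event_k` are hypothetical.

```r
# Hypothetical sketch (not the package's implementation). Assumes that a
# "sequence of length k" is a run of k consecutive events in one individual's
# trajectory and that rows are already in time order within each ID.
sketch_cluster_sequences <- function(dt, cl_col = "cl", id_col = "link_id",
                                     event_col = "reg", k = 2) {
  dt <- as.data.frame(dt)

  # All length-k windows of one trajectory, one row per window.
  extract_windows <- function(ev, id) {
    n <- length(ev)
    if (n < k) return(NULL)  # too few events to form a length-k sequence
    # Row i of `idx` holds positions i, i + 1, ..., i + k - 1.
    idx <- outer(seq_len(n - k + 1), 0:(k - 1), "+")
    win <- matrix(as.character(ev)[as.vector(idx)], ncol = k)
    out <- data.frame(id, win, stringsAsFactors = FALSE)
    names(out) <- c(id_col, paste0("event_", seq_len(k)))  # hypothetical names
    out
  }

  # One data frame per cluster; IDs become character because they are split keys.
  lapply(split(dt, dt[[cl_col]], drop = TRUE), function(d) {
    per_id <- split(d[[event_col]], d[[id_col]], drop = TRUE)
    res <- do.call(rbind, unname(Map(extract_windows, per_id, names(per_id))))
    if (!is.null(res)) rownames(res) <- NULL
    res  # NULL if no trajectory in the cluster has at least k events
  })
}

# Toy long-format data using the documented default column names.
toy <- data.frame(
  cl      = c("A", "A", "A", "A", "B", "B", "B"),
  link_id = c(1, 1, 1, 2, 3, 3, 3),
  reg     = c("x", "y", "z", "x", "y", "y", "x")
)
res <- sketch_cluster_sequences(toy, k = 2)
res$A  # windows (x, y) and (y, z) from ID 1; ID 2 has only one event
```

Trajectories shorter than k contribute nothing under this sliding-window reading; if the package instead counts non-consecutive subsequences (as SPADE-style miners do), the counts will differ.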
get_cluster_sequences(
  dt,
  cl_col = "cl",
  id_col = "link_id",
  event_col = "reg",
  k = 2
)
A named list of data frames, one per cluster, each containing the sequences
of length k observed in that cluster.
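A common next step is to count how often each sequence occurs within each cluster. The sketch below builds a small list by hand, assuming each data frame holds an ID column plus one column per position named `event_1`, ..., `event_k`; that layout and the helper `count_sequences` are hypothetical, so the column selection may need adjusting to the package's actual output.

```r
# Hypothetical result layout: one data frame per cluster, one column per position.
seqs <- list(
  A = data.frame(link_id = c("1", "1", "2"),
                 event_1 = c("x", "y", "x"),
                 event_2 = c("y", "z", "y")),
  B = data.frame(link_id = c("3", "3"),
                 event_1 = c("y", "y"),
                 event_2 = c("y", "x"))
)

# Collapse each row into a single key such as "x -> y" and tabulate the keys.
count_sequences <- function(df) {
  event_cols <- grep("^event_", names(df), value = TRUE)  # assumed column prefix
  key <- do.call(paste, c(df[event_cols], sep = " -> "))
  sort(table(key), decreasing = TRUE)
}

counts <- lapply(seqs, count_sequences)
counts$A  # "x -> y" occurs twice, "y -> z" once
```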
dt: A data.table or data.frame containing the data in long format.
cl_col: Name of the column containing cluster labels.
id_col: Name of the column identifying individual trajectories (e.g. patient ID).
event_col: Name of the column containing ordered events (e.g. diagnoses, prescriptions).
k: Integer specifying the sequence length (recommended: 2).
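The source does not say why k = 2 is recommended; the following is only an illustration under the assumption that sequences are consecutive windows. A trajectory of n events then yields n - k + 1 sequences of length k, and none when n < k, so raising k shrinks the pool of sequences and drops short trajectories entirely. The helper `n_windows` is hypothetical.

```r
# Assumption: each trajectory of n events contributes max(n - k + 1, 0)
# consecutive length-k sequences.
n_windows <- function(n_events, k) sum(pmax(n_events - k + 1, 0))

traj_lengths <- c(3, 1, 3)     # events per individual in a toy cluster
n_windows(traj_lengths, k = 2)  # 2 + 0 + 2 = 4
n_windows(traj_lengths, k = 3)  # 1 + 0 + 1 = 2
```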
Marc Delord
Delord M, Douiri A (2025) doi:10.1186/s12874-025-02476-7
cspade in the arulesSequences package for sequential pattern mining using the SPADE algorithm.