MSCA (version 1.1.1)

get_cluster_sequences: Extract sequences of length k within clusters

Description

For each cluster, extracts all sequences of length k from the ordered observations of each individual, grouped by ID. Returns a list of sequences per cluster.

Usage

get_cluster_sequences(
  dt,
  cl_col = "cl",
  id_col = "link_id",
  event_col = "reg",
  k = 2
)

Value

A named list of data frames, each containing sequences of length k observed in a given cluster.

Arguments

dt

A data.table or data.frame containing the data in long format.

cl_col

Name of the column containing cluster labels.

id_col

Name of the column identifying individual trajectories (e.g. patient ID).

event_col

Name of the column containing ordered events (e.g. diagnoses, prescriptions).

k

Integer specifying the sequence length (recommended: 2).

Author

Marc Delord

References

Delord M, Douiri A (2025) doi:10.1186/s12874-025-02476-7

See Also

cspade in the arulesSequences package for sequential pattern mining using the SPADE algorithm.
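
Examples

The following is a minimal base R sketch of the idea described above, not the package's implementation. It assumes that a "sequence of length k" means k consecutive events within one individual's ordered records, and that rows are already sorted in event order within each ID. The helpers k_windows and sketch_cluster_sequences, the output column names e1, e2, ..., and the toy data are all hypothetical and only serve as illustration; the actual output layout of get_cluster_sequences may differ.

```r
# Toy long-format data: one row per event, already ordered within each ID.
dt <- data.frame(
  link_id = c(1, 1, 1, 2, 2, 3, 3, 3),
  cl      = c("A", "A", "A", "A", "A", "B", "B", "B"),
  reg     = c("x", "y", "z", "x", "y", "y", "z", "x"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all windows of k consecutive events in one trajectory.
# Returns NULL when the trajectory is shorter than k.
k_windows <- function(events, k = 2) {
  events <- as.character(events)
  n <- length(events)
  if (n < k) return(NULL)
  m <- do.call(
    rbind,
    lapply(seq_len(n - k + 1), function(i) events[i:(i + k - 1)])
  )
  colnames(m) <- paste0("e", seq_len(k))
  as.data.frame(m, stringsAsFactors = FALSE)
}

# Hypothetical sketch: split by cluster, then by individual, and stack the
# k-length windows of every individual into one data frame per cluster.
sketch_cluster_sequences <- function(dt, cl_col = "cl", id_col = "link_id",
                                     event_col = "reg", k = 2) {
  lapply(split(dt, dt[[cl_col]]), function(d) {
    per_id <- lapply(split(d[[event_col]], d[[id_col]]), k_windows, k = k)
    per_id <- Filter(Negate(is.null), per_id)  # drop trajectories shorter than k
    do.call(rbind, unname(per_id))
  })
}

res <- sketch_cluster_sequences(dt, k = 2)
res$A  # pairs of consecutive events observed in cluster "A"
```

With the package installed, the corresponding call would presumably be get_cluster_sequences(dt, cl_col = "cl", id_col = "link_id", event_col = "reg", k = 2), following the signature shown under Usage.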