codyna (version 0.1.0)

discover_patterns: Discover Sequence Patterns

Description

Discovers various types of patterns in sequence data. Provides n-gram extraction, gapped pattern discovery, analysis of repeated patterns, and targeted pattern search.

Usage

discover_patterns(
  data,
  type = "ngram",
  pattern,
  len = 2:5,
  gap = 1:3,
  min_support = 0.01,
  min_count = 2,
  start,
  end,
  contains
)

Value

A tibble containing the discovered patterns, their counts, proportions, and support.

Arguments

data

[data.frame, matrix, stslist]
Sequence data in wide format (rows are sequences, columns are time points).

type

[character(1)]
Type of pattern analysis:

  • "ngram": Extract contiguous n-grams.

  • "gapped": Discover patterns with gaps/wildcards.

  • "repeated": Detect repeated occurrences of the same state.

pattern

[character(1)]
Specific pattern to search for as a character string (e.g., "A->*->B"). If provided, type is ignored. Supports wildcards: * (single) and ** (multi-wildcard).

len

[integer()]
Pattern lengths to consider for n-grams and repeated patterns (default: 2:5).

gap

[integer()]
Gap sizes to consider for gapped patterns (default: 1:3).

min_support

[numeric(1)]
Minimum support threshold, i.e., the proportion of sequences that must contain a specific pattern for it to be included (default: 0.01).

min_count

[integer(1)]
Minimum count threshold, i.e., the number of times a pattern must occur across all sequences for it to be included (default: 2).

start

[character(1)]
Filter patterns starting with these states.

end

[character(1)]
Filter patterns ending with these states.

contains

[character(1)]
Filter patterns containing these states.
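
The difference between min_count and min_support is easy to miss: count is the total number of occurrences across all sequences, while support is the proportion of sequences in which a pattern appears at least once. The following base R sketch illustrates the two quantities on made-up toy data. It is not the implementation used by codyna; the helper ngrams_of and the thresholds are hypothetical and chosen only for illustration.

```r
# Toy sequences in wide format: rows are sequences, columns are time points.
seqs <- matrix(
  c("A", "B", "A", "B",
    "A", "B", "C", "C",
    "C", "C", "A", "A"),
  nrow = 3, byrow = TRUE
)

# Hypothetical helper (not part of codyna): contiguous n-grams of one sequence.
ngrams_of <- function(x, n) {
  if (length(x) < n) return(character(0))
  starts <- seq_len(length(x) - n + 1)
  vapply(
    starts,
    function(i) paste(x[i:(i + n - 1)], collapse = "->"),
    character(1)
  )
}

# Bigrams of every sequence, kept separately per sequence.
per_seq <- lapply(seq_len(nrow(seqs)), function(i) ngrams_of(seqs[i, ], 2))

# Count: total occurrences across all sequences.
count <- table(unlist(per_seq))

# Support: proportion of sequences containing the pattern at least once.
support <- table(unlist(lapply(per_seq, unique))) / nrow(seqs)

result <- data.frame(
  pattern = names(count),
  count = as.integer(count),
  support = as.numeric(support[names(count)]),
  stringsAsFactors = FALSE
)

# Keep patterns that pass both (illustrative) thresholds.
result[result$count >= 2 & result$support >= 0.5, ]
```

In this toy data, "A->B" occurs three times but in only two of the three sequences, so its count is 3 while its support is 2/3.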

Examples

# N-grams
ngrams <- discover_patterns(engagement, type = "ngram")

# Gapped patterns
gapped <- discover_patterns(engagement, type = "gapped")

# Repeated patterns
repeated <- discover_patterns(engagement, type = "repeated")

# Custom pattern with a wildcard state
custom <- discover_patterns(engagement, pattern = "Active->*")
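
To make the wildcard syntax of the pattern argument more concrete, here is a small base R sketch of how such a pattern could be matched against a single sequence. It is an illustration only, not codyna's implementation. It assumes that * stands for exactly one arbitrary state, that ** stands for one or more arbitrary states, and that sequences contain no missing values; the helpers match_at and matches_pattern are hypothetical.

```r
# Hypothetical helper (not part of codyna): does the pattern match starting
# at the first element of x? Assumed semantics: "*" = exactly one state,
# "**" = one or more states.
match_at <- function(x, tokens) {
  if (length(tokens) == 0) return(TRUE)   # whole pattern consumed
  if (length(x) == 0) return(FALSE)       # pattern left, sequence exhausted
  tok <- tokens[1]
  if (tok == "**") {
    # Let the multi-wildcard absorb k = 1, 2, ... states and try the rest.
    for (k in seq_along(x)) {
      if (match_at(x[-seq_len(k)], tokens[-1])) return(TRUE)
    }
    return(FALSE)
  }
  if (tok == "*" || isTRUE(tok == x[1])) {
    return(match_at(x[-1], tokens[-1]))
  }
  FALSE
}

# Hypothetical helper: does the pattern occur anywhere in the sequence?
matches_pattern <- function(x, pattern) {
  tokens <- strsplit(pattern, "->", fixed = TRUE)[[1]]
  any(vapply(
    seq_along(x),
    function(i) match_at(x[i:length(x)], tokens),
    logical(1)
  ))
}

matches_pattern(c("A", "C", "B"), "A->*->B")        # one state between A and B
matches_pattern(c("A", "B"), "A->*->B")             # nothing between A and B
matches_pattern(c("A", "C", "C", "B"), "A->**->B")  # two states between A and B
```

Under these assumptions, the first and third calls return TRUE and the second returns FALSE, because * requires exactly one intervening state.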
