extract_skeleton_entries: Extract Hierarchical Skeleton Entries from Bracketed Text

Description

Recursively extracts the contents of nested round brackets from a normalized Sumerian text string and returns them as a data frame giving the start position, token count, nesting depth, and expression of each entry.

This is an internal helper function used by skeleton.

Usage

extract_skeleton_entries(x)

Value

A data frame with one row per extracted entry and the following columns:

start: Integer. The 1-based position of the group's first token, counted relative to the full input.
n_tokens: Integer. The number of Sumerian tokens (signs) in the group.
depth: Integer. The nesting depth of the entry (0 for the root entry representing the full expression, 1 for top-level groups, 2 for groups nested one level deeper, etc.).
expr: Character. The text content of the bracket group (without the surrounding brackets). For the root entry (row 1), this is the full input string.

The result always has at least one row (the root entry).

Arguments

x: A character string containing Sumerian text with round brackets, as returned by mark_skeleton_entries.

Details

The first row of the result always represents the entire input expression at depth 0 (the root entry). The function then extracts the contents of all outermost (top-level) bracket pairs using an internal helper function. For each extracted group, a row is added to the result data frame at depth 1. If a group itself contains further nested brackets, the function recurses into it to extract deeper levels.

The depth value of each entry reflects the nesting level: the root entry has depth 0, entries from the outermost brackets have depth 1, entries nested one level deeper have depth 2, and so on.
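
The recursion described above can be illustrated with a minimal sketch. It is not the package's implementation: the names top_level_groups and sketch_entries are hypothetical, the depth-first row order is an assumption, and the start and n_tokens columns are left out because they depend on split_sumerian.

```r
# Hypothetical helper (not part of the package): return the contents of all
# outermost "(...)" pairs in s, without the surrounding brackets.
top_level_groups <- function(s) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  groups <- character(0)
  level <- 0L            # current bracket nesting level while scanning
  open_at <- NA_integer_ # index of the bracket that opened the current group
  for (i in seq_along(chars)) {
    if (chars[i] == "(") {
      if (level == 0L) open_at <- i
      level <- level + 1L
    } else if (chars[i] == ")" && level > 0L) {
      level <- level - 1L
      if (level == 0L) {
        groups <- c(groups, substr(s, open_at + 1L, i - 1L))
      }
    }
  }
  groups
}

# Hypothetical sketch of the recursion: one row for the expression itself,
# then one recursive call per outermost group at depth + 1.
# Row order here is depth-first, which is an assumption.
sketch_entries <- function(s, depth = 0L) {
  out <- data.frame(depth = depth, expr = s, stringsAsFactors = FALSE)
  for (g in top_level_groups(s)) {
    out <- rbind(out, sketch_entries(g, depth + 1L))
  }
  out
}

res <- sketch_entries("a (b (c) d) e (f)")
res
```

For the input above, the sketch yields four rows: the full string at depth 0, "b (c) d" at depth 1, "c" at depth 2, and "f" at depth 1.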

The start column records the position (in tokens) of the first token in each group, relative to the full input. The n_tokens column gives the number of tokens in the group as determined by split_sumerian.

Examples

# First normalize the input with mark_skeleton_entries
x <- " ki a. jal2 (e2{kur}) ra. gaba jal2. an ki a"
normalized <- sumer:::mark_skeleton_entries(x)
normalized

# Then extract the hierarchical structure
sumer:::extract_skeleton_entries(normalized)