mark_skeleton_entries: Normalize Brackets for Skeleton Generation

Description

Transforms a transliterated Sumerian text string into a normalized form in which every token is enclosed in round brackets. This prepares the input for hierarchical extraction by extract_skeleton_entries.

This is an internal helper function used by skeleton.

Usage

mark_skeleton_entries(x)

Value

A character string of length 1 in which all tokens are enclosed in round brackets. Angle brackets are removed; curly braces from the input are preserved.

Arguments

x: A character string of length 1 containing transliterated Sumerian text. The string may contain angle brackets (< >), round brackets (( )), and curly braces ({ }) to annotate token groups (see Details).

Details

The function performs the following transformations:

Tokenizes the input using an internal helper function. Tokens enclosed in angle brackets are merged into a single token.
Removes angle brackets from the separators, replacing them with spaces. Curly braces are preserved.
Wraps every token that is not already enclosed in round brackets with round brackets.

The result is a string in which every token is enclosed in round brackets. Existing round brackets from the input are preserved, so the nesting structure reflects the grouping specified in the original input.

For example, the input

"<d-nu-dim2-mud> ki a. jal2 (e2{kur}) ra"

is transformed into a string where d-nu-dim2-mud appears as a single bracketed token, e2 and kur are individually bracketed inside the existing round brackets around e2{kur}, and all other tokens (ki, a, jal2, ra) are each wrapped in their own round brackets.
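
The three steps above can be approximated in base R. The following is only a rough sketch, not the package's implementation: mark_sketch is a hypothetical name, the sketch assumes that tokens outside angle brackets are runs of letters and digits, and it drops angle brackets outright instead of replacing them with spaces. Separators such as dots and spaces are left untouched here, which may differ from what mark_skeleton_entries returns.

```r
# Hypothetical illustration of the described steps; NOT the sumer code.
mark_sketch <- function(x) {
  stopifnot(is.character(x), length(x) == 1)

  # Step 1: tokenize. An angle-bracket group counts as one token;
  # elsewhere a token is assumed to be a run of letters and digits.
  m <- gregexpr("<[^<>]*>|[[:alnum:]]+", x)
  starts <- as.integer(m[[1]])
  if (starts[1] == -1L) return(x)  # no tokens found
  ends <- starts + attr(m[[1]], "match.length") - 1L
  tokens <- regmatches(x, m)[[1]]

  # Step 2: drop the angle brackets; curly braces are never matched,
  # so they stay where they are.
  inner <- gsub("[<>]", "", tokens)

  # Step 3: wrap every token that is not directly enclosed in ( ).
  before <- substring(x, starts - 1L, starts - 1L)
  after <- substring(x, ends + 1L, ends + 1L)
  already <- before == "(" & after == ")"
  wrapped <- ifelse(already, inner, paste0("(", inner, ")"))

  regmatches(x, m) <- list(wrapped)
  x
}

mark_sketch("<d-nu-dim2-mud> ki a. jal2 (e2{kur}) ra")
```

In this sketch, d-nu-dim2-mud ends up as one bracketed token, while e2 and kur are bracketed separately inside the round brackets that already surrounded e2{kur}.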

Examples

# Input with all three bracket types
x <- "<d-nu-dim2-mud> ki a. jal2 (e2{kur}) ra. gaba jal2. an ki a"
sumer:::mark_skeleton_entries(x)

# Input without any brackets: each token gets wrapped in round brackets
sumer:::mark_skeleton_entries("LUGAL.E")

# Angle brackets merge tokens into a single unit
sumer:::mark_skeleton_entries(" lugal")