skeleton: Create a Translation Template for Sumerian Text

Description

Creates a structured template (skeleton) for translating Sumerian text. The template displays each token and subexpression with its syllabic reading, sign name, and cuneiform representation, providing a framework for adding translations.

The input may contain three types of brackets to control how the template is generated (see Details). Optionally, the template can be pre-filled with translations from one or more dictionaries using guess_substr_info.

The function skeleton computes the template and returns an object of class "skeleton". The print method displays the template in the console.

Usage

skeleton(x, mapping = NULL, fill = NULL, space = FALSE)
# S3 method for skeleton
print(x, ...)

Value

skeleton returns a character vector of class c("skeleton", "character") containing the template lines. The first line is the header with the full reading of the input, followed by one line per skeleton entry. If space = TRUE, empty strings are inserted as separator lines.

print.skeleton prints the template to the console (one line per element) and returns x invisibly.

Arguments

x

For skeleton: A character string of length 1 containing Sumerian text, given as transliteration, sign names, or cuneiform characters. Tokens may be grouped with brackets to control template generation (see Details).

For print.skeleton: An object of class "skeleton" as returned by skeleton.

mapping

A data frame containing the sign mapping table with columns syllables, name, and cuneiform. If NULL (the default), the package's internal mapping file etcsl_mapping.txt is loaded.

fill

A data frame as returned by guess_substr_info, containing translations and grammatical types for all substrings of x. If provided, the template lines are pre-filled with the corresponding type and translation. If NULL (the default), the template lines are left empty.

space

Logical. If TRUE, an empty line is inserted before each entry at nesting depth 1, visually separating top-level groups. Defaults to FALSE.

...

Additional arguments passed to the print method (currently unused).

Details

The function generates a hierarchical template from a Sumerian text string. The input is first converted to cuneiform with as.cuneiform. The input string may contain three types of brackets that control how entries in the template are generated:

Angle brackets < >: The enclosed token sequence is treated as a fixed term. No individual skeleton entries are generated for the tokens inside. For example, <d-nu-dim2-mud> is treated as a single unit.
Round brackets ( ): The enclosed token sequence is a coherent term for which a single skeleton entry is generated, in addition to entries for its individual tokens. Nesting is allowed.
Curly braces { }: Ignored during skeleton generation. They can be used in the input to indicate which tokens serve as arguments to an operator, but this information is not needed for the skeleton.

In addition, a skeleton entry is generated for every individual token that does not appear inside angle brackets.
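
The bracket rules above can be illustrated with a minimal sketch. It does not use the package's parser: the input is assumed to be already parsed into a nested list, and the helpers fixed(), group(), unit_text(), and list_entries() are hypothetical names introduced only for this illustration. Joining the tokens of a group with "-" and listing a group before its tokens are assumptions as well. Curly braces do not appear because they are ignored.

```r
# Hypothetical pre-parsed input; not the package's internal representation.
fixed <- function(term) structure(list(term = term), class = "fixed")       # < >
group <- function(...) structure(list(items = list(...)), class = "group")  # ( )

# Display text of a unit (joining group tokens with "-" is an assumption).
unit_text <- function(u) {
  if (inherits(u, "fixed")) return(u$term)
  if (inherits(u, "group")) {
    return(paste(vapply(u$items, unit_text, character(1)), collapse = "-"))
  }
  u
}

# One entry per unit; groups additionally get entries for their members,
# indented by one more tab. Fixed terms are never broken up.
list_entries <- function(units, depth = 0) {
  out <- character(0)
  for (u in units) {
    out <- c(out, paste0(strrep("\t", depth), unit_text(u)))
    if (inherits(u, "group")) {
      out <- c(out, list_entries(u$items, depth + 1))
    }
  }
  out
}

parsed  <- list(fixed("d-nu-dim2-mud"), group("e2", "kur"), "ra")
entries <- list_entries(parsed)
cat(entries, sep = "\n")
```

In this sketch the fixed term yields one entry, the round-bracket group yields one entry plus one per token, and the plain token yields one entry.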

Each line in the resulting template follows the format:

|[tabs]reading=SIGN.NAME=cuneiform:type:translation

When fill is not provided, the type and translation fields are left empty:

|[tabs]reading=SIGN.NAME=cuneiform::

The template should then be filled in as follows:

Between the two colons: the grammatical type of the expression (e.g., S for noun phrases, V for verbs). See make_dictionary for details.
After the second colon: the translation.

The indentation level (number of tabs) reflects the nesting depth: top-level entries have no indentation, their sub-entries have one tab, and so on.
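
As a rough illustration of the line format, the following sketch builds and parses a single template line. The helpers format_entry() and parse_entry() are hypothetical and not part of the package, and the reading, sign name, cuneiform sign, and translation are example values only.

```r
# Hypothetical helper: build one template line in the documented format
# |[tabs]reading=SIGN.NAME=cuneiform:type:translation
format_entry <- function(reading, sign_name, cuneiform, depth = 0,
                         type = "", translation = "") {
  paste0("|", strrep("\t", depth), reading, "=", sign_name, "=", cuneiform,
         ":", type, ":", translation)
}

# Hypothetical helper: split a template line back into its fields.
# Assumes readings and sign names contain no "=" and the type contains no ":".
parse_entry <- function(line) {
  pattern <- "^\\|(\t*)([^=]*)=([^=]*)=([^:]*):([^:]*):(.*)$"
  m <- regmatches(line, regexec(pattern, line))[[1]]
  if (length(m) == 0) stop("not a template line: ", line)
  list(depth = nchar(m[2]), reading = m[3], sign_name = m[4],
       cuneiform = m[5], type = m[6], translation = m[7])
}

# An empty entry one level deep, and the same entry after manual editing.
empty  <- format_entry("e2", "E2", "\U0001208D", depth = 1)
filled <- format_entry("e2", "E2", "\U0001208D", depth = 1,
                       type = "S", translation = "house")
parsed <- parse_entry(filled)
str(parsed)
```

The number of leading tabs after the "|" gives the nesting depth, so parse_entry() recovers it by counting them.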

The template format is designed to be saved as a text file (.txt) or Word document (.docx), edited manually, and then used as input for make_dictionary to create a custom dictionary.
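
Saving the template as plain text can be sketched with base R alone. The vector template below is a hand-written stand-in for the lines of a "skeleton" object, and its header line and entries are assumptions about the layout. The call to make_dictionary is omitted because its arguments are not described here.

```r
# Stand-in for the lines of a "skeleton" object (illustrative content only).
template <- c(
  "e2 kur",                    # header line with the full reading (assumed layout)
  "|e2=E2=\U0001208D::",
  "|kur=KUR=\U000121B3::"
)

# Write the lines as UTF-8 so the cuneiform signs survive the round trip.
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(template), path, useBytes = TRUE)

# After manual editing, the file can be read back line by line.
restored <- readLines(path, encoding = "UTF-8")
```

Writing with enc2utf8() and useBytes = TRUE is one way to keep the cuneiform characters intact regardless of the session's native encoding.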

If fill is provided, the function validates that fill matches x: the cuneiform tokens of the first row in fill must be identical to the tokens of x, and the number of rows must equal \(N(N+1)/2\), the number of contiguous token substrings of x, where \(N\) is the number of tokens.
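
The expected row count can be checked by enumerating all contiguous token substrings. This is a minimal sketch: token_substrings() is a hypothetical helper, and the order in which it lists the substrings is not claimed to match the row order of guess_substr_info.

```r
# Hypothetical helper: all contiguous substrings of a token vector.
token_substrings <- function(tokens) {
  n <- length(tokens)
  out <- character(0)
  for (i in seq_len(n)) {       # start position
    for (j in i:n) {            # end position
      out <- c(out, paste(tokens[i:j], collapse = " "))
    }
  }
  out
}

tokens <- c("an", "ki", "a")
subs   <- token_substrings(tokens)
n      <- length(tokens)

# The number of substrings matches N(N+1)/2.
length(subs) == n * (n + 1) / 2
```

For a start position i there are N - i + 1 possible end positions, and summing these over all start positions gives N(N+1)/2.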

Examples

# Create an empty template
x <- " ki a. jal2 (e2{kur}) ra. gaba jal2. an ki a"
skeleton(x)

# Pre-fill the template with dictionary translations
dic <- read_dictionary()
fill <- guess_substr_info(x, dic)
skeleton(x, fill = fill)

# Use spacing to visually separate top-level groups
skeleton(x, fill = fill, space = TRUE)