sumer (version 1.0.0)

split_sumerian: Split a String into Sumerian Signs and Separators

Description

Splits a transliterated Sumerian text string into its constituent signs and the separators between them. The function recognizes three types of Sumerian sign representations: lowercase transliterations, uppercase sign names, and Unicode cuneiform characters.

Usage

split_sumerian(x)

Value

A list with three components:

signs

A character vector containing the extracted Sumerian signs.

separators

A character vector with length(signs) + 1 elements containing the separators. The first element holds any text before the first sign, the following elements hold the text between consecutive signs, and the last element holds any text after the final sign. An empty string indicates that there is no separator at that position.

types

An integer vector of the same length as signs indicating the type of each sign: 1 for lowercase transliterations, 2 for uppercase sign names, and 3 for cuneiform characters.
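To make the shape of the return value concrete, the sketch below builds such a list by hand for the string "en-DARA3.AN.na-ke4", following the component descriptions above. It is an illustration assembled from this documentation, not output captured from split_sumerian(), and type_labels is a hypothetical helper introduced here.

```r
# Hand-built list that mirrors the documented structure (not captured output).
result <- list(
  signs      = c("en", "DARA3", "AN", "na", "ke4"),
  separators = c("", "-", ".", ".", "-", ""),
  types      = c(1L, 2L, 2L, 1L, 1L)
)

# The documented length relationships between the three components.
stopifnot(length(result$separators) == length(result$signs) + 1L)
stopifnot(length(result$types) == length(result$signs))

# Hypothetical lookup vector that turns the integer codes into labels.
type_labels <- c("lowercase transliteration",
                 "uppercase sign name",
                 "cuneiform character")

overview <- data.frame(
  sign = result$signs,
  type = type_labels[result$types],
  stringsAsFactors = FALSE
)
print(overview)

# Keep only the uppercase sign names (type 2).
upper_signs <- result$signs[result$types == 2L]
```

Because types is aligned with signs, it can be used directly as a logical filter or as an index into a label vector, as shown.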

Arguments

x

A character string containing transliterated Sumerian text.

Details

The function identifies Sumerian signs based on three patterns:

  1. Lowercase transliterations (type 1): Sequences of lowercase letters (a-z) including special characters (ĝ, š, ...) and accented vowels (á, é, í, ú, à, è, ì, ù), optionally followed by a numeric index.

  2. Uppercase sign names (type 2): Sequences starting with an uppercase letter, optionally followed by additional uppercase letters, digits, or the characters +, /, and ×.

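  3. Cuneiform characters (type 3): Unicode characters in the range U+12000 to U+12500. This range starts at the Cuneiform block (U+12000 to U+123FF) and extends into the cuneiform blocks that follow it.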
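The three patterns above can be approximated with base R regular expressions. The following is a minimal sketch, not the implementation used by the package: sketch_split() is a hypothetical name, and the lowercase character class contains only the special characters spelled out above (ĝ, š, and the listed accented vowels), so it may be narrower than what split_sumerian() accepts.

```r
# Hypothetical re-implementation for illustration; not the sumer package code.
sketch_split <- function(x) {
  # Type 1: lowercase letters plus the listed special characters,
  # optionally followed by a numeric index (escapes keep the source ASCII).
  lower <- "[a-z\u011d\u0161\u00e1\u00e9\u00ed\u00fa\u00e0\u00e8\u00ec\u00f9]+[0-9]*"
  # Type 2: an uppercase letter, then uppercase letters, digits, +, / or the multiplication sign.
  upper <- "[A-Z][A-Z0-9+/\u00d7]*"
  # Type 3: a single code point in the range U+12000 to U+12500.
  cunei <- "[\U00012000-\U00012500]"

  pattern <- paste(lower, upper, cunei, sep = "|")
  m <- gregexpr(pattern, x, perl = TRUE)

  signs      <- regmatches(x, m)[[1]]                 # the matched signs
  separators <- regmatches(x, m, invert = TRUE)[[1]]  # the text around them

  # Classify each sign by testing it against the anchored patterns.
  types <- rep(3L, length(signs))
  types[grepl(paste0("^(", upper, ")$"), signs, perl = TRUE)] <- 2L
  types[grepl(paste0("^(", lower, ")$"), signs, perl = TRUE)] <- 1L

  list(signs = signs, separators = separators, types = types)
}

res <- sketch_split("en-DARA3.AN.na-ke4")
str(res)
```

regmatches() with invert = TRUE returns the pieces between the matches, including empty strings at the start and end, which is why the sketch yields one more separator than signs.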

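The function returns the signs and separators in a format that allows exact reconstruction of the original string using paste0(c("", signs), separators, collapse = ""). The leading "" pads signs to the same length as separators, so the first separator is emitted on its own and every sign is followed by the separator that comes after it.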
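The reconstruction expression can be traced step by step. The vectors below are written by hand in the documented layout for a hypothetical input "[an-na]"; they are an assumption for illustration, not output captured from split_sumerian().

```r
# Hand-built vectors in the documented layout for the string "[an-na]".
signs      <- c("an", "na")
separators <- c("[", "-", "]")  # before the first sign, between signs, after the last sign

# Pad signs with a leading "" so both vectors have the same length.
padded <- c("", signs)

# Element-wise pairs before collapsing: "" + "[", "an" + "-", "na" + "]".
pieces <- paste0(padded, separators)

# Collapsing the pairs restores the original string.
rebuilt <- paste0(padded, separators, collapse = "")

# Swapping in different separators changes only the text around the signs.
respaced <- paste0(padded, c("", " ", ""), collapse = "")
```

The same pattern makes it possible to normalize separators, for example replacing every delimiter with a space, while leaving the signs untouched.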

Examples

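library(sumer)

# Example 1: lowercase transliterations separated by hyphens
x <- "en-tarah-an-na-ke4"

result <- split_sumerian(x)

result

# Example 2: lowercase transliterations mixed with uppercase sign names
x <- "en-DARA3.AN.na-ke4"

result <- split_sumerian(x)

result

# Reconstruct the original string from the signs and separators
paste0(c("", result$signs), result$separators, collapse = "")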
