convert_to_dictionary: Convert Translation Data to a Sumerian Dictionary

Description

Converts a data frame of Sumerian translations into a structured dictionary format, adding cuneiform representations and phonetic readings for each sign.

Usage

convert_to_dictionary(df, mapping = NULL)

Value

A data frame with the following columns:

sign_name: The normalized Sumerian text (e.g., "A", "AN", "A2.TAB")
row_type: Type of entry: "cunei." (cuneiform character), "reading" (phonetic readings), or "trans." (translation)
count: Number of occurrences for translations; NA for cuneiform and reading entries
type: Grammatical type (e.g., "S", "V", "A") for translations; empty string for other row types
meaning: The cuneiform character(s), phonetic reading(s), or translated meaning depending on row_type

The data frame is sorted by sign_name, row_type, and descending count.

Arguments

df: A data frame with columns sign_name, type, and meaning, typically produced by read_translated_text.
mapping: A data frame containing sign-to-reading mappings with columns name, cuneiform and syllables. If NULL (default), the package's built-in mapping file etcsl_mapping.txt is used.

Details

Processing Steps

Aggregates translations and counts the occurrences of each unique combination of sign_name, type, and meaning in df
Looks up phonetic readings and cuneiform signs for each sign component
Combines cuneiform, reading, and translation rows into a single data frame
Sorts the result by sign name, row type, and descending count
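
The aggregation and sorting steps can be illustrated with base R. This is only a minimal sketch on made-up toy data, assuming the three documented input columns; it is not the package's actual implementation, and the object names (toy, counts) are hypothetical.

```r
# Toy input with the three columns that df is documented to contain.
# The rows are invented for illustration only.
toy <- data.frame(
  sign_name = c("AN", "AN", "AN", "A"),
  type      = c("S", "S", "S", "S"),
  meaning   = c("heaven", "heaven", "god", "water"),
  stringsAsFactors = FALSE
)

# Count how often each unique (sign_name, type, meaning) combination occurs.
toy$count <- 1L
counts <- aggregate(count ~ sign_name + type + meaning, data = toy, FUN = sum)

# Sort by sign name, then by descending count within each sign.
counts <- counts[order(counts$sign_name, -counts$count), ]
rownames(counts) <- NULL
print(counts)
```

In the real function the cuneiform and reading rows are added on top of these translation rows, which is why count is NA for those row types.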

Reading Format

Phonetic readings are formatted as follows:

Multiple possible readings are enclosed in braces: {a, dur5, duru5}
For compound signs, readings of individual components are joined with hyphens
If a component of a compound sign has more than three possible readings, only the first three are shown, followed by ...
Unknown readings are marked with ?
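
The formatting rules above might be sketched as follows. This is a hedged illustration, not the package's code: the lookup list readings, the helpers format_component and format_reading, and the sign FOO with readings r1 to r4 are all hypothetical. It also assumes that compound components are separated by "." (as in "A2.TAB"), that a single reading is shown without braces, and that the ellipsis is placed inside the braces; none of these details is confirmed by the documentation.

```r
# Hypothetical lookup: sign component -> possible readings.
# Only the readings for "A" come from the documentation; the rest are placeholders.
readings <- list(
  A   = c("a", "dur5", "duru5"),
  EN  = "en",
  FOO = c("r1", "r2", "r3", "r4")
)

format_component <- function(component, in_compound) {
  r <- readings[[component]]
  if (is.null(r)) return("?")          # unknown readings are marked with ?
  if (length(r) == 1) return(r)        # assumption: a single reading is shown bare
  if (in_compound && length(r) > 3) {
    r <- c(r[1:3], "...")              # assumption: ellipsis goes inside the braces
  }
  paste0("{", paste(r, collapse = ", "), "}")
}

format_reading <- function(sign_name) {
  # Assumption: compound sign names separate their components with "."
  parts <- strsplit(sign_name, ".", fixed = TRUE)[[1]]
  in_compound <- length(parts) > 1
  formatted <- vapply(parts, format_component, character(1),
                      in_compound = in_compound)
  paste(formatted, collapse = "-")     # components are joined with hyphens
}

format_reading("A")        # "{a, dur5, duru5}"
format_reading("FOO.EN")   # "{r1, r2, r3, ...}-en"
format_reading("A.XYZ")    # "{a, dur5, duru5}-?"
```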

Examples

# Read translations from a single text document
filename     <- system.file("extdata", "text_with_translations.txt", package = "sumer")
translations <- read_translated_text(filename)

# View the structure
head(translations)

# Apply a custom normalization (here: removing the word "the")
translations$meaning <- gsub("\\bthe\\b", "", translations$meaning, ignore.case = TRUE)
translations$meaning <- trimws(gsub("\\s+", " ", translations$meaning))

# Inspect the cleaned translations
head(translations)

# Convert the result into a dictionary
dictionary   <- convert_to_dictionary(translations)

# View the structure
head(dictionary)

# View entries for a specific sign
dictionary[dictionary$sign_name == "EN", ]

# With a custom mapping (here: the built-in mapping file, loaded explicitly)
path    <- system.file("extdata", "etcsl_mapping.txt", package = "sumer")
mapping <- read.csv2(path, sep = ";", na.strings = "")
translations <- read_translated_text(filename, mapping = mapping)
dictionary <- convert_to_dictionary(translations, mapping = mapping)
head(dictionary)