Parses Word documents (.docx) or plain text files containing annotated Sumerian translations and creates a structured dictionary data frame. The function extracts sign names, their cuneiform representations, possible readings, and translations with grammatical types.
make_dictionary(file, mapping = NULL)
A data frame with the following columns:
The normalized Sumerian sign name (e.g., "A", "AN", "ME")
Type of entry: "cunei." (cuneiform), "reading" (phonetic readings), or "trans." (translation)
Number of occurrences for translations; NA for cuneiform and reading entries
Grammatical type (e.g., "S", "V", "Sx->A") for translations; empty for other line types
The cuneiform character(s), reading(s), or translated meaning depending on line_type
file: A character vector of file paths to .docx or text files. Files must contain translation lines that are formatted as described below.
mapping: A data frame containing sign-to-reading mappings with the columns name, cuneiform and syllables. If NULL (default), the package's built-in mapping file etcsl_mapping.txt is used.
The input files must contain lines starting with | in the following format:
|sign_name: TYPE: meaning
or
|equation for sign_name: TYPE: meaning
For example:
|a2-tab: S: the double amount of work performance
|me=ME: S: divine force
|AN: S: god of heaven
|na=NA: Sx->A: whose existence is bound to S
Lines not starting with | are ignored. Only the first entry in an equation of sign names is used for the dictionary. The following notation is suggested for grammatical types:
S for substantives and noun phrases (e.g., "the old man in the temple")
V for verbs and decorated verbs (e.g., "to go", "to bring the delivery into the temple")
A for adjectives, attributes and subordinate clauses that further define the subject (e.g., "who/which is weak", "whose resource for sustaining life is grain")
Sx->A for a symbol that transforms the preceding noun phrase into an attribute (e.g., "whose resource for sustaining life is S"). Other transformations are denoted accordingly.
N for numbers.
D for everything else.
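The line format and type notation above can be illustrated with a short base-R sketch. It is not the package's parser: the regular expression, the variable names, and the choice to treat the part before the first = as the "first entry" of an equation are assumptions made for illustration, and no sign-name normalization is attempted.

```r
# Illustrative input: two annotated lines and one line that is ignored
lines <- c(
  "|me=ME: S: divine force",
  "This commentary line does not start with | and is skipped.",
  "|na=NA: Sx->A: whose existence is bound to S"
)

# Keep only lines that start with "|"
annotated <- lines[startsWith(lines, "|")]

# Split "|signs: TYPE: meaning" into its three fields (assumed pattern)
pattern <- "^\\|([^:]+):\\s*([^:]+):\\s*(.*)$"
fields <- regmatches(annotated, regexec(pattern, annotated))

parsed <- data.frame(
  # For an equation such as "me=ME", keep only the part before "="
  sign    = vapply(fields, function(f) trimws(sub("=.*$", "", f[2])), character(1)),
  type    = vapply(fields, function(f) trimws(f[3]), character(1)),
  meaning = vapply(fields, function(f) trimws(f[4]), character(1)),
  stringsAsFactors = FALSE
)
print(parsed)
```

Because the meaning is captured as "everything after the second colon", a translation that itself contains a colon would still be kept intact under this pattern.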
Extracts text from .docx files or reads plain text
Filters lines starting with |
Normalizes sign names and looks up possible readings from the mapping table
Aggregates translations and counts occurrences
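The aggregation step can be sketched in base R as follows. This is only an approximation of the idea, not the function's actual implementation: the input data frame and its column names (sign, type, meaning, count) are hypothetical, and the sketch starts from annotations that have already been parsed.

```r
# Hypothetical parsed annotations (sign, grammatical type, meaning)
parsed <- data.frame(
  sign    = c("AN", "AN", "ME", "AN"),
  type    = c("S", "S", "S", "S"),
  meaning = c("god of heaven", "heaven", "divine force", "god of heaven"),
  stringsAsFactors = FALSE
)

# Count how often each (sign, type, meaning) combination occurs
parsed$count <- 1L
counts <- aggregate(count ~ sign + type + meaning, data = parsed, FUN = sum)

# Within each sign, list the most frequent translation first
counts <- counts[order(counts$sign, -counts$count), ]
rownames(counts) <- NULL
print(counts)
```

Sorting by descending count within each sign mirrors the documented ordering of translation rows by frequency.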
For each unique sign, the output contains:
One cunei. row with the cuneiform character(s)
One reading row with possible phonetic readings
One or more trans. rows with translations, sorted by frequency
See also: as.cuneiform, split_sumerian
library(sumer)

# Create a dictionary from a single text document
filename <- system.file("extdata", "text_with_translations.txt", package = "sumer")
dict <- make_dictionary(filename)
# Use the dictionary
look_up("an", dict)