sumer (version 1.0.0)

make_dictionary: Create a Sumerian Dictionary from Annotated Text Files

Description

Parses Word documents (.docx) or plain text files containing annotated Sumerian translations and creates a structured dictionary data frame. The function extracts sign names, their cuneiform representations, possible readings, and translations with grammatical types.

Usage

make_dictionary(file, mapping = NULL)

Value

A data frame with the following columns:

sign_name

The normalized Sumerian sign name (e.g., "A", "AN", "ME")

line_type

Type of entry: "cunei." (cuneiform), "reading" (phonetic readings), or "trans." (translation)

count

Number of occurrences for translations; NA for cuneiform and reading entries

type

Grammatical type (e.g., "S", "V", "Sx->A") for translations; empty for other line types

meaning

The cuneiform character(s), reading(s), or translated meaning depending on line_type

Arguments

file

A character vector of file paths to .docx or text files. Files must contain translation lines that are formatted as described below.

mapping

A data frame containing sign-to-reading mappings with columns name, cuneiform and syllables. If NULL (default), the package's built-in mapping file etcsl_mapping.txt is used.

Details

Input Format

The input files must contain lines starting with | in the following format:

|sign_name: TYPE: meaning

or

|equation for sign_name: TYPE: meaning

For example:


|a2-tab: S: the double amount of work performance
|me=ME: S: divine force
|AN: S: god of heaven
|na=NA: Sx->A: whose existence is bound to S

Lines not starting with | are ignored. When the sign name is given as an equation (e.g., me=ME), only the first entry of the equation is used for the dictionary. The following notation is suggested for grammatical types:

  • S for substantives and noun phrases (e.g., "the old man in the temple")

  • V for verbs and decorated verbs (e.g., "to go", "to bring the delivery into the temple")

  • A for adjectives, attributes and subordinate clauses that further define the subject (e.g., "who/which is weak", "whose resource for sustaining life is grain")

  • Sx->A for a symbol that transforms the preceding noun phrase into an attribute (e.g., "whose resource for sustaining life is S"). Other transformations are denoted accordingly.

  • N for numbers

  • D for everything else.
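
The input format described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the sample lines, the regular expression, and the variable names (lines, entries, parsed) are assumptions made for illustration, and the sign-name normalization that make_dictionary performs is not reproduced here.

```r
# Hypothetical input: one commentary line and three annotation lines.
lines <- c(
  "Some commentary that is ignored",
  "|a2-tab: S: the double amount of work performance",
  "|me=ME: S: divine force",
  "|na=NA: Sx->A: whose existence is bound to S"
)

# Keep only lines starting with "|" and drop that leading character.
entries <- sub("^\\|", "", lines[startsWith(lines, "|")])

# Split each entry at the first two colons into sign, type and meaning.
pattern <- "^([^:]+):[[:space:]]*([^:]+):[[:space:]]*(.*)$"
parts <- regmatches(entries, regexec(pattern, entries))

parsed <- data.frame(
  sign    = vapply(parts, `[`, character(1), 2),
  type    = vapply(parts, `[`, character(1), 3),
  meaning = vapply(parts, `[`, character(1), 4),
  stringsAsFactors = FALSE
)

# For an equation such as "me=ME", keep only the first entry.
parsed$sign <- sub("=.*$", "", parsed$sign)

print(parsed)
```

In this sketch the commentary line is dropped, and the equation me=ME contributes the sign name me only.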

Processing Steps

  1. Extracts text from .docx files or reads plain text

  2. Filters lines starting with |

  3. Normalizes sign names and looks up possible readings from the mapping table

  4. Aggregates translations and counts occurrences
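
Step 4 can be sketched in base R as follows. This is only an illustration of counting and sorting translations, under the assumption that the annotations have already been parsed into one row per occurrence; the data and the names parsed and counts are hypothetical, and the package may aggregate differently.

```r
# Hypothetical parsed annotations, one row per occurrence in the text.
parsed <- data.frame(
  sign    = c("AN", "AN", "AN", "ME"),
  type    = c("S", "S", "S", "S"),
  meaning = c("god of heaven", "heaven", "god of heaven", "divine force"),
  stringsAsFactors = FALSE
)

# Count identical (sign, type, meaning) combinations.
parsed$count <- 1L
counts <- aggregate(count ~ sign + type + meaning, data = parsed, FUN = sum)

# Sort by sign, then by descending frequency within each sign.
counts <- counts[order(counts$sign, -counts$count), ]

print(counts)
```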

Output Structure

For each unique sign, the output contains:

  • One cunei. row with the cuneiform character(s)

  • One reading row with possible phonetic readings

  • One or more trans. rows with translations, sorted by frequency
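
To show how a data frame of this shape might be queried, the sketch below builds a small hand-made example with the documented columns and picks the most frequent translation for one sign. The rows are invented for illustration (including the way multiple readings are written in the meaning column) and are not output of make_dictionary.

```r
# Hand-made data frame with the documented columns (illustrative values only).
dict <- data.frame(
  sign_name = c("AN", "AN", "AN", "AN"),
  line_type = c("cunei.", "reading", "trans.", "trans."),
  count     = c(NA, NA, 2L, 1L),
  type      = c("", "", "S", "S"),
  meaning   = c("\U0001202D", "an, dingir", "god of heaven", "heaven"),
  stringsAsFactors = FALSE
)

# Select the translation rows for one sign.
trans <- dict[dict$sign_name == "AN" & dict$line_type == "trans.", ]

# Pick the translation with the highest count.
best <- trans$meaning[which.max(trans$count)]
print(best)
```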

See Also

as.cuneiform, split_sumerian

Examples

# Create a dictionary from a single text document
filename <- system.file("extdata", "text_with_translations.txt", package = "sumer")
dict <- make_dictionary(filename)

# Use the dictionary
look_up("an", dict)