fill_substr_info: Read Back a Saved Translation into a Substring Data Frame

Description

Reads a saved skeleton, given either as a file path or as a character vector, and reconstructs the substring data frame that can be passed as fill to translate or skeleton. This lets you resume work on a translation that was saved earlier with writeLines.

The counterpart to this function is guess_substr_info, which fills the substring data frame from dictionaries. In contrast, fill_substr_info fills it from a skeleton that already contains manually entered types and translations.

Usage

fill_substr_info(skeleton)

Value

A data frame with \(N(N+1)/2\) rows, one for each contiguous substring of the line (where \(N\) is the number of cuneiform tokens), and the following columns:

start: Integer. The 1-based position of the first token in the substring.
n_tokens: Integer. The number of tokens in the substring.
expr: Character. The concatenated cuneiform signs of the substring.
type: Character. The grammatical type (e.g. "S", "V", "Sx->V"), or "" if not yet specified.
translation: Character. The translation, or "" if not yet specified.

Rows without a corresponding skeleton entry have empty type and translation fields. The row order matches init_substr_info, so indices can be computed with substr_position.
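
To see where the row count comes from, the following base R sketch enumerates every contiguous substring of a three-token line. The token labels are placeholders, joining the tokens without a separator is an assumption, and the row order shown here (grouped by start position) is chosen only for illustration. The actual order is whatever init_substr_info produces, so substr_position remains the way to look up rows.

```r
# Placeholder tokens standing in for cuneiform signs (illustrative only).
tokens <- c("A", "B", "C")
N <- length(tokens)

# Enumerate every (start, n_tokens) pair that fits inside the line.
# NOTE: this ordering is an assumption, not necessarily that of init_substr_info.
pairs <- do.call(rbind, lapply(seq_len(N), function(s) {
  data.frame(start = s, n_tokens = seq_len(N - s + 1L))
}))

# Build the concatenated expression for each substring
# (assumption: tokens are joined without a separator).
pairs$expr <- mapply(
  function(s, n) paste(tokens[s:(s + n - 1L)], collapse = ""),
  pairs$start, pairs$n_tokens
)

# Empty type and translation, as for rows without a skeleton entry.
pairs$type <- ""
pairs$translation <- ""

# The number of rows equals N(N+1)/2.
nrow(pairs) == N * (N + 1) / 2
```

With three tokens there are three substrings of length one, two of length two, and one of length three, which gives the six rows of the sketch.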

Arguments

skeleton: A file path to a saved skeleton text file, or a character vector as returned by skeleton or translate.

Details

A typical workflow for translating Sumerian texts spans multiple sessions:

1. Call translate to interactively translate a line.
2. Save the result with writeLines(result, "Line_29.txt").
3. In a later session, call fill_substr_info("Line_29.txt") to reload the saved translation.
4. Pass the result as fill to translate to continue editing.
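
The save and reload steps of this workflow can be sketched with base R alone. The vector result below is an invented placeholder for what translate or skeleton would return, and the file is written to a temporary directory rather than a project folder.

```r
# Placeholder for the character vector returned by translate() or skeleton().
# The content is invented for illustration and is not a real skeleton.
result <- c("Structure: <line text>", "| <entry 1>", "| <entry 2>")

# Save the skeleton to a text file (a temporary location in this sketch).
path <- file.path(tempdir(), "Line_29.txt")
writeLines(result, path)

# A later session starts from this file. Reading it back yields the same
# character vector that was saved.
reloaded <- readLines(path)
identical(reloaded, result)
```

Because the skeleton argument accepts either a file path or a character vector, both path and reloaded would be valid inputs to fill_substr_info once they hold a real skeleton.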

The function parses the skeleton format (header line "Structure: ..." followed by entry lines starting with |), extracts the type and translation from each entry, and places them at the correct positions in a substring data frame as created by init_substr_info.
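
As a rough illustration of the first parsing step, the sketch below separates the header line from the entry lines using base R. It is not the package's implementation. The sample lines are invented, the internal layout of an entry (where type and translation sit) is deliberately left out, and tolerating leading whitespace before | is an assumption.

```r
# Invented sample in the described layout. Real entry lines carry a type
# and a translation whose exact layout is not reproduced here.
skeleton_lines <- c(
  "Structure: <line text>",
  "| <entry 1>",
  "| <entry 2>"
)

# The header line carries the text of the line after the "Structure: " prefix.
is_header <- startsWith(skeleton_lines, "Structure: ")
line_text <- sub("^Structure: ", "", skeleton_lines[is_header][1])

# Entry lines are those starting with "|"
# (assumption: optional leading whitespace is tolerated).
entry_lines <- skeleton_lines[grepl("^\\s*\\|", skeleton_lines)]

length(entry_lines)
```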

Examples


skeleton_file  <- system.file("extdata", "project/lines/Line_29.txt", package = "sumer")
the_skeleton   <- readLines(skeleton_file)

# Get the cuneiform text of the line:
x <- sub("^Structure: ", "", the_skeleton[1])
x

# See the whole file:
cat(the_skeleton, sep = "\n")

# Rebuild the substring data frame from the saved skeleton:
df_fill <- fill_substr_info(skeleton_file)

if (FALSE) {

# Use the result of the function to revise the translation:

dict_file <- system.file("extdata", "sumer-dictionary.txt", package = "sumer")
text_file <- system.file("extdata", "project", "enki_and_the_world_order.txt", package = "sumer")

result <- translate(x,
                    text = text_file,
                    dic = dict_file,
                    fill = df_fill,
                    min_freq = c(6, 4, 2),
                    sentence_prob = 0.25)
print(result)
# Now you may save the result with writeLines.
}