pass_align: Transfer alignment from one string to another

Description

When aligning linguistic strings, it is often better to perform the alignment on a simplified version of each string. This function passes the resulting alignment back from the simplified string to the original.

Usage

pass_align(originals, alignment, sep = " ", in.gap = "-", out.gap = "-")

Arguments

originals

Vector of strings in the original form, with separators

alignment

Vector of simplified strings after alignment, with separators and gaps. In each string, the number of non-gap parts should match the number of parts in the corresponding original.

sep

Symbol used as separator between parts of the strings

in.gap

Symbol used as gap indicator in the alignments

out.gap

Symbol used as gap indicator in the output. This is useful when the gap symbol from the alignments occurs as a character in the originals.

Value

Vector of original strings with the gaps inserted from the aligned strings.

Details

Given some strings, a sound (or graphemic) alignment inserts gaps into the strings so that corresponding columns line up across the strings. We assume here an original string that is separated by sep into parts (segments, sounds, tailored grapheme clusters). After simplification (e.g. through tokenize) and alignment (currently using non-R software), a string is returned with extra gaps inserted. The number of non-gap parts should match the number of parts in the original string.
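
The transfer described above can be illustrated for a single string pair with a few lines of base R. This is only a minimal sketch of the idea, not the package's implementation: the helper name transfer_gaps is hypothetical, and it assumes that every non-gap column of the alignment corresponds, in order, to one part of the original.

```r
# Hypothetical helper (not part of the package): copy the gap positions of
# one aligned, simplified string onto its original string.
transfer_gaps <- function(original, aligned, sep = " ",
                          in.gap = "-", out.gap = "-") {
  parts <- strsplit(original, sep, fixed = TRUE)[[1]]  # parts of the original
  cols  <- strsplit(aligned, sep, fixed = TRUE)[[1]]   # columns of the alignment
  is_gap <- cols == in.gap
  # the number of non-gap columns must match the number of original parts
  stopifnot(sum(!is_gap) == length(parts))
  out <- character(length(cols))
  out[is_gap]  <- out.gap   # gap columns receive the output gap symbol
  out[!is_gap] <- parts     # remaining columns receive the original parts, in order
  paste(out, collapse = sep)
}

transfer_gaps("a b c", "X - - - X - X")
# "a - - - b - c"

# a different output gap keeps a literal "-" in the original distinguishable
transfer_gaps("a - c", "X - X X", out.gap = "_")
# "a _ - c"
```

The second call shows why a separate out.gap is useful: the original itself contains "-", so the inserted gap is written as "_" to keep the two apart.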

Examples


# NOT RUN {
# make some strings with separators
l <- list(letters[1:3], letters[4:7], letters[10:15])
originals <- sapply(l, paste, collapse = " ")
cbind(originals)

# make some alignment
# note that this alignment is nonsensical!
alignment <- c("X - - - X - X", "X X - - - X X", "X X X - X X X")
cbind(alignment)

# match originals to the alignment
transferred <- pass_align(originals, alignment)
cbind(transferred)

# ========

# a slightly more interesting example
# using the bare-bones pairwise alignment from adist()
originals <- c("cute kitten class","utter tentacles")
cbind(originals)

# adist returns strings of pairwise Levenshtein operations
# "I" signals insertion
(levenshtein <- attr(adist(originals, counts = TRUE), "trafos"))

# pass alignments to original strings, showing the insertions ("I") as "-" gaps
alignment <- c(levenshtein[1,2], levenshtein[2,1])
transferred <- pass_align(originals, alignment, 
    sep = "", in.gap = "I", out.gap = "-")
cbind(transferred)

# }
