Preparing data for soundcorrs consists of segmentation and alignment. Segmentation can proceed phoneme by phoneme, morpheme by morpheme, or on any other basis; the only constraint is that every word in a pair, triple, etc. of words must contain the same number of segments. Segments are marked with separators, by default the character "|". In a large dataset, inserting these separators by hand, potentially between every two letters, can become time-consuming. The function addSeparators automates at least this part of the process.