soundcorrs is the fundamental class of the soundcorrs package, and it is required for most of the tasks that the package is meant to make easier and faster than manual work. A soundcorrs object is a list containing the original data frame, some metadata (names of languages, names of columns, transcriptions), and transformations of the original data that speed up processing in findExamples and other functions (words exploded into individual segments, words with segment separators removed, etc.). The basic unit in soundcorrs is a pair/triple/... of words, each of which is assigned to a specific language.
This constructor function is not really intended for the end user; whenever possible, read.soundcorrs should be used instead. Regardless of the function used, two pieces of information are required for each word: the language it comes from, and its segmented and aligned form. Segmentation means that the word is cut into parts, which can represent phonemes, morphemes, or anything else (the default separator is a vertical bar, "|"). A word with no separators in it is treated as a single segment, and for soundchange objects this is enough. Alignment means that each word in a pair/triple/... has the same number of segments, and that corresponding segments occupy the same positions. Often, one of the words in a pair/triple/... will naturally have fewer segments than the others; in such cases, a filler character, the 'linguistic zero', needs to be used ("-" is a good choice). For example, to align the Spanish and Swedish names for 'Stockholm', a total of three such 'empty' segments is required: e|s|t|o|k|-|o|l|m|o : -|s|t|o|k|k|o|l|m|-. The linguistic zero must be defined in the transcription.
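The segmentation and alignment requirements can be checked by hand with base R. The sketch below is only an illustration: segments is a hypothetical helper, not a soundcorrs function, and it assumes the default "|" separator and "-" as the linguistic zero.

```r
# Hypothetical helper (not part of soundcorrs): split a word into segments
# on a literal separator; fixed = TRUE stops "|" being read as a regex.
segments <- function(word, separator = "|") {
  strsplit(word, separator, fixed = TRUE)[[1]]
}

# The Spanish and Swedish names for 'Stockholm' from the text above.
spanish <- "e|s|t|o|k|-|o|l|m|o"
swedish <- "-|s|t|o|k|k|o|l|m|-"

seg.es <- segments(spanish)
seg.sv <- segments(swedish)

# Aligned words must have the same number of segments.
aligned <- length(seg.es) == length(seg.sv)

# Count the linguistic zeros ("-") across both words.
zeros <- sum(seg.es == "-") + sum(seg.sv == "-")

# Show the segments side by side, one column per position.
print(rbind(Spanish = seg.es, Swedish = seg.sv))
```

Here aligned is TRUE and zeros is 3, matching the count given in the text. A word without any separator, e.g. segments("estokolmo"), comes back as a single segment.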
Typically, a soundcorrs object will be used to hold an entire list of pairs/triples/... of words from various languages. However, both this constructor function and read.soundcorrs can only read data from one language at a time. This is because each language requires several pieces of metadata (name, column names, transcription), and if all of this information for multiple languages were passed as arguments to one function, the call would very quickly become illegible. Multiple soundcorrs objects can be merged into one using merge.soundcorrs.
Three sample datasets are available: data-abc, data-capitals, and data-ie; they can be loaded with the help of loadSampleDataset.