findPairs: Find all pairs with corresponding sequences of sounds.

Description

Sift the dataset for word pairs such that the first word contains x and the second word contains y in the corresponding segment or segments.
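The idea of "corresponding segments" can be illustrated with a minimal base-R sketch. This is not how soundcorrs implements findPairs(). The helper find_pairs_sketch() is hypothetical. It assumes both words are already aligned and segmented with "|" (an assumed separator), and it only handles matches inside a single segment, whereas findPairs() also looks for sequences spanning several segments.

```r
# Hypothetical helper, not part of soundcorrs. A simplified, single-segment
# illustration: a pair matches if, at some segment position i, segment i of
# the first word matches x and segment i of the second word matches y.
# Assumes both words are aligned and segmented with "|".
find_pairs_sketch <- function(words1, words2, x, y, sep = "|") {
  segs1 <- strsplit(words1, sep, fixed = TRUE)
  segs2 <- strsplit(words2, sep, fixed = TRUE)
  hit <- mapply(function(s1, s2) {
    # An empty pattern matches anything, as documented for findPairs()
    m1 <- if (nzchar(x)) grepl(x, s1) else rep(TRUE, length(s1))
    m2 <- if (nzchar(y)) grepl(y, s2) else rep(TRUE, length(s2))
    any(m1 & m2)  # TRUE if x and y occur at the same segment position
  }, segs1, segs2)
  which(hit)  # indices of matching pairs, each listed once
}

w1 <- c("b|a|r", "h|a|n|d", "t|a|k|-")
w2 <- c("b|e|r", "h|a|n|t", "t|o|k|e")
find_pairs_sketch(w1, w2, "a", "e")     # 1
find_pairs_sketch(w1, w2, "a", "[ae]")  # 1 2
```

The third pair is not returned for "a" and "e": both sounds occur in it, but in different segments (2 and 4), so they do not correspond.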

Usage

findPairs(data, x, y, exact, cols)

Arguments

data

[soundcorrs] The dataset in which to look. Only datasets with two languages are supported.

x

[character] The sequence to find in the first language. May be a regular expression. If an empty string, anything will be considered a match.

y

[character] The sequence to find in the second language. May be a regular expression. If an empty string, anything will be considered a match.

exact

[logical] Only return exact, full-segment to full-segment matches? If TRUE, linguistic zeros are not ignored. Defaults to FALSE.

cols

[character vector] Which columns of the dataset to return as the result. Can be a vector of names, "aligned" (the two columns with segmented, aligned words), or "all" (all columns). Defaults to "aligned".
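To make "full-segment to full-segment" more concrete, the sketch below contrasts a match anywhere inside a segment with a match that must cover the whole segment. The helper segment_matches() is hypothetical and models only this anchoring, as an assumption about the idea behind exact. It does not reproduce how findPairs() handles linguistic zeros.

```r
# Hypothetical helper, not part of soundcorrs.
# exact = FALSE: the pattern may match any part of the segment.
# exact = TRUE:  the pattern must match the entire segment.
segment_matches <- function(segment, pattern, exact = FALSE) {
  if (exact) pattern <- paste0("^(", pattern, ")$")
  grepl(pattern, segment)
}

segment_matches("ai", "a")                  # TRUE: "a" occurs inside "ai"
segment_matches("ai", "a", exact = TRUE)    # FALSE: "a" is not the whole segment
segment_matches("e", "[ae]", exact = TRUE)  # TRUE
```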

Value

[df.findPairs] A subset of the dataset, containing only the pairs with corresponding sequences. Warning: pairs with multiple occurrences of such sequences are only included once.
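The warning matters when the goal is to count correspondences rather than to list words. A rough base-R sketch of the difference follows. It assumes aligned words segmented with "|" and single-segment matches, and count_matches_sketch() is a hypothetical helper, not a soundcorrs function.

```r
# Hypothetical helper, not part of soundcorrs: counts how many segment
# positions match, instead of only recording whether any position does.
count_matches_sketch <- function(words1, words2, x, y, sep = "|") {
  segs1 <- strsplit(words1, sep, fixed = TRUE)
  segs2 <- strsplit(words2, sep, fixed = TRUE)
  mapply(function(s1, s2) sum(grepl(x, s1) & grepl(y, s2)), segs1, segs2)
}

w1 <- c("b|a|n|a|n", "b|a|r")
w2 <- c("b|e|n|e|n", "b|o|r")
counts <- count_matches_sketch(w1, w2, "a", "e")
counts             # 2 0
which(counts > 0)  # 1: the first pair is listed once despite two occurrences
```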

Examples

# In the examples below, non-ASCII characters had to be escaped for technical reasons.
# In actual usage, all soundcorrs functions accept characters from beyond ASCII.
library(soundcorrs)
dataset <- sampleSoundCorrsData.capitals
findPairs (dataset, "\u00E4", "e", cols=c("ORTHOGRAPHY.German","ORTHOGRAPHY.Polish"))  # a-diaeresis
findPairs (dataset, "a", "[ae]", cols="all")
findPairs (dataset, "\u0259", "Vr", exact=FALSE)  # schwa
findPairs (dataset, "\u0259", "Vr", exact=TRUE)  # schwa
subset (dataset, findPairs(dataset, "\u00E4", "e")$which)  # a-diaeresis