findExamples: Find all pairs/triples/... with corresponding sequences of sounds.

Description

Sift the dataset for word pairs/triples/... such that the word in the first language contains the first sequence, the word in the second language contains the second sequence, and so on.

Usage

findExamples(data, ..., distance.start, distance.end, na.value, zeros, cols)

Arguments

data

[soundcorrs] The dataset in which to look.

...

[character] Sequences to look for, one per language. May be regular expressions as defined in R or in the transcription. An empty string matches anything.

distance.start

[integer] The allowed distance between segments where the sound sequences begin. A negative value means alignment of the beginning of sequences will not be checked. Defaults to -1.

distance.end

[integer] The allowed distance between segments where the sound sequences end. A negative value means alignment of the end of sequences will not be checked. Defaults to -1.

na.value

[numeric] Treat NA's as matches (0) or non-matches (-1)? Defaults to 0.

zeros

[logical] Take linguistic zeros into account? Defaults to FALSE.

cols

[character vector] Which columns of the dataset to return as the result. Can be a vector of names, "aligned" (the two columns with segmented, aligned words), or "all" (all columns). Defaults to "aligned".
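
As a rough illustration of the kind of patterns the ... argument accepts, the base R sketch below matches a character class and an empty pattern against a few toy words. The words and variable names are hypothetical, and the sketch only covers plain R regular expressions; it does not handle anything defined in a transcription, and it is not meant to show how findExamples applies the patterns internally.

```r
# Hypothetical toy words; not taken from any soundcorrs dataset.
words <- c("berlin", "praga", "oslo")

# A character class is an ordinary R regular expression:
# TRUE for every word that contains "a" or "e".
has_a_or_e <- grepl("[ae]", words)

# In base R an empty pattern matches every string, which mirrors
# the "empty string matches anything" rule described above.
matches_anything <- grepl("", words)

print(has_a_or_e)
print(matches_anything)
```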

Value

[df.findExamples] A list with two fields: $data, a data frame with found examples; and $which, a logical vector showing which rows of data are considered matches.

Examples

# NOT RUN {
# In the examples below, non-ASCII characters are escaped for technical reasons.
# In actual usage, Unicode is supported under BSD, Linux, and macOS.
dataset <- sampleSoundCorrsData.capitals
# Find examples which have "a" in all three languages.
findExamples (dataset, "a", "a", "a")
# Find examples where German has schwa, and Polish and Spanish have a Vr sequence.
findExamples (dataset, "\u0259", "Vr", "Vr")
# Find examples where German has a-umlaut, Polish has a or e, and Spanish has any sound at all.
findExamples (dataset, "\u00E4", "[ae]", "")
# Find examples where German has a linguistic zero while Polish and Spanish do not.
findExamples (dataset, "-", "[^-]", "[^-]", zeros=TRUE)
# Find examples where German has schwa, and Polish and Spanish have a, without checking alignment.
findExamples (dataset, "\u0259", "a", "a", distance.start=-1, distance.end=-1)
# As above, but the schwa and the two a's must be in the same segment.
findExamples (dataset, "\u0259", "a", "a", distance.start=0, distance.end=0)
# }
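
The last two examples above differ only in distance.start and distance.end. One way to read those arguments is sketched below in base R. It is a simplified model under stated assumptions, not the package's implementation: each word is a hypothetical vector with one element per segment, each sequence is assumed to fit inside a single segment, only the first match in each word is considered, and first_match and within_distance are made-up helper names.

```r
# Hypothetical aligned words, one vector element per segment.
# Toy data, not taken from sampleSoundCorrsData.capitals.
word1 <- c("k", "a", "t", "a")
word2 <- c("k", "o", "t", "a")

# Hypothetical helper: index of the first segment matching the pattern.
first_match <- function(segments, pattern) {
  which(grepl(pattern, segments))[1]
}

# Hypothetical helper: a negative allowance disables the check;
# otherwise the gap must not exceed the allowance.
within_distance <- function(gap, allowed) {
  allowed < 0 || gap <= allowed
}

# Number of segments between the positions where the two matches begin.
start_gap <- abs(first_match(word1, "a") - first_match(word2, "a"))

unchecked <- within_distance(start_gap, -1)    # alignment not checked
same_segment <- within_distance(start_gap, 0)  # matches must start in the same segment

print(c(start_gap = start_gap, unchecked = unchecked, same_segment = same_segment))
```

In this toy case the first "a" sits in segment 2 of one word and segment 4 of the other, so the pair passes when the check is disabled and fails when the matches are required to start in the same segment.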
