textreuse (version 0.1.5)

pairwise_compare: Pairwise comparisons among documents in a corpus

Description

Given a TextReuseCorpus containing documents of class TextReuseTextDocument, this function applies a comparison function to every pairing of documents, and returns a matrix with the comparison scores.

Usage

pairwise_compare(corpus, f, ..., directional = FALSE, progress = interactive())

Arguments

f

The function to apply to x and y.

...

Additional arguments passed to f.

directional

Some comparison functions are commutative, so that f(a, b) == f(b, a) (e.g., jaccard_similarity). Other functions are directional, so that f(a, b) measures a's borrowing from b, which may not be the same as f(b, a) (e.g., ratio_of_matches). If directional is FALSE, then only the minimum number of comparisons will be made, i.e., the upper triangle of the matrix. If directional is TRUE, then both directional comparisons will be measured. In no case, however, will documents be compared to themselves, i.e., the diagonal of the matrix.

progress

Display a progress bar while comparing documents.

Value

A square matrix with dimensions equal to the length of the corpus, and row and column names set by the names of the documents in the corpus. A value of NA in the matrix indicates that a comparison was not made. In cases of directional comparisons, then the comparison reported is f(row, column).

See Also

See these document comparison functions, jaccard_similarity, ratio_of_matches.

Examples

Run this code
# NOT RUN {
dir <- system.file("extdata/legal", package = "textreuse")
corpus <- TextReuseCorpus(dir = dir)
names(corpus) <- filenames(names(corpus))

# A non-directional comparison
pairwise_compare(corpus, jaccard_similarity)

# A directional comparison
pairwise_compare(corpus, ratio_of_matches, directional = TRUE)
# }

Run the code above in your browser using DataLab