lsh_subset

candidates

A data frame of candidate pairs from
<code><a rd-options="" href="/link/lsh_candidates?package=textreuse&version=0.1.5" data-mini-rdoc="textreuse::lsh_candidates">lsh_candidates</a></code>.

List of all candidates in a corpus
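
As a rough illustration of what "all candidates" means here, the base R sketch below gathers every document ID that appears in at least one candidate pair. It does not call textreuse. The column names `a`, `b`, and `score`, the sample IDs, and the helper `subset_ids` are assumptions made for this example, not the package's documented internals.

```r
# Hypothetical candidate pairs, shaped like the assumed output of
# lsh_candidates(): one row per pair, document IDs in columns `a` and `b`.
candidates <- data.frame(
  a = c("doc1", "doc1", "doc3"),
  b = c("doc2", "doc4", "doc4"),
  score = NA_real_,
  stringsAsFactors = FALSE
)

# Illustrative stand-in, not the package's source: collect every document ID
# that occurs in either column, keeping each ID only once.
subset_ids <- function(pairs) {
  unique(c(pairs$a, pairs$b))
}

ids <- subset_ids(candidates)
print(ids)
```

A vector of IDs like this would typically be used to index a corpus so that only documents with at least one candidate match are kept, presumably along the lines of `corpus[lsh_subset(candidates)]`.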

Tools for measuring similarity among documents and detecting
passages which have been reused. Implements shingled n-gram, skip n-gram,
and other tokenizers; similarity/dissimilarity functions; pairwise
comparisons; minhash and locality sensitive hashing algorithms; and a
version of the Smith-Waterman local alignment algorithm suitable for
natural language.

Lincoln Mullen

textreuse

Detect Text Reuse and Document Similarity
