textreuse (version 0.1.2)
Detect Text Reuse and Document Similarity
Description
Tools for measuring similarity among documents and detecting
passages which have been reused. Implements shingled n-gram, skip n-gram,
and other tokenizers; similarity/dissimilarity functions; pairwise
comparisons; minhash and locality sensitive hashing algorithms; and a
version of the Smith-Waterman local alignment algorithm suitable for
natural language.