Note: a newer version (0.1.5) of this package is available.

textreuse (version 0.1.1)

Detect Text Reuse and Document Similarity

Description

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
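To make the description above concrete, here is a minimal base R sketch of shingled word n-grams and Jaccard similarity. It does not call the package; the helper names `shingle_ngrams` and `jaccard` are hypothetical and chosen only for illustration, and the package's own tokenizers and similarity functions may behave differently (for example in how they treat punctuation).

```r
# Hypothetical helper (not part of textreuse): split a string into
# overlapping word n-grams, often called shingles.
shingle_ngrams <- function(text, n = 3) {
  # Lowercase, then split on runs of non-alphanumeric characters.
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: Jaccard similarity of two token sets,
# i.e. |intersection| / |union|.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  u <- union(a, b)
  if (length(u) == 0) return(0)
  length(intersect(a, b)) / length(u)
}

a <- shingle_ngrams("The quick brown fox jumps over the lazy dog")
b <- shingle_ngrams("The quick brown fox leaps over the lazy dog")

# The two sentences share 4 of 10 distinct trigrams.
jaccard(a, b)
```

Changing one word out of nine disturbs three of the seven trigrams in each sentence, which is why shingles are more sensitive to word order and local edits than single-word tokens.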

Install

install.packages('textreuse')

Monthly Downloads

833

Version

0.1.1

License

MIT + file LICENSE

Repository

https://github.com/ropensci/textreuse

Maintainer

Lincoln Mullen

Last Published

November 4th, 2015

Functions in textreuse (0.1.1)

Filenames from paths

Query a LSH cache for matches to a single document

pairwise_candidates

Candidate pairs from pairwise comparisons

Recompute the hashes for a document or corpus

as.matrix.textreuse_candidates

Convert candidates data frames to other formats

textreuse-package

Detect Text Reuse and Document Similarity

pairwise_compare

Pairwise comparisons among documents in a corpus

List of all candidates in a corpus

Locality sensitive hashing for minhash

Objects exported from other packages

Hash a string to an integer

TextReuseTextDocument-accessors

Accessors for TextReuse objects

lsh_probability

Probability that a candidate pair will be detected with LSH

Candidate pairs from LSH comparisons

TextReuseCorpus

TextReuseCorpus

Local alignment of natural language texts

similarity-functions

Measure similarity/dissimilarity in documents

Compare candidates identified by LSH

TextReuseTextDocument

TextReuseTextDocument

Recompute the tokens for a document or corpus

minhash_generator

Generate a minhash function

Split texts into tokens
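
The list above includes a function for the probability that a candidate pair will be detected with LSH. As a rough illustration of the idea, the sketch below assumes the standard banding scheme for minhash LSH: `h` hashes are divided into `b` bands of `h / b` rows, and two documents become a candidate pair if all rows match in at least one band. The function name `lsh_detect_prob` is hypothetical, and it is an assumption that the package computes the same quantity.

```r
# Hypothetical helper (not part of textreuse): probability that two
# documents with Jaccard similarity s become a candidate pair, assuming
# h minhashes split evenly into b bands.
lsh_detect_prob <- function(h, b, s) {
  stopifnot(h %% b == 0)   # bands must divide the hashes evenly
  r <- h / b               # rows (hashes) per band
  # A band matches with probability s^r; at least one of b bands must match.
  1 - (1 - s^r)^b
}

# With 240 hashes, more bands catch lower-similarity pairs more often.
lsh_detect_prob(h = 240, b = 80, s = 0.5)
lsh_detect_prob(h = 240, b = 40, s = 0.5)
```

More bands with fewer rows each lower the similarity at which pairs start to be detected, at the cost of more false candidates to check afterwards.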