Note: a newer version (0.1.5) of this package is available.

textreuse (version 0.1.2)

Detect Text Reuse and Document Similarity

Description

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
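The basic pipeline named above can be sketched in base R: shingle a text into overlapping word n-grams, then compare the resulting sets with the Jaccard similarity. This is a minimal illustration, not textreuse's own code. `shingle_words()` and `jaccard()` are hypothetical helpers, and the simple lowercase-and-split tokenization is an assumption.

```r
# Hypothetical helper: split a text into lowercase words and return the
# overlapping word n-grams ("shingles") as a character vector.
shingle_words <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
  words <- words[words != ""]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: Jaccard similarity |A intersect B| / |A union B|,
# treating repeated shingles as a single set element.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  u <- union(a, b)
  if (length(u) == 0) return(NA_real_)
  length(intersect(a, b)) / length(u)
}

doc_a <- "The quick brown fox jumps over the lazy dog"
doc_b <- "The quick brown fox leaps over the lazy dog"
sa <- shingle_words(doc_a, n = 3)
sb <- shingle_words(doc_b, n = 3)
jaccard(sa, sb)  # [1] 0.4
```

The two sentences share 4 of their 10 distinct trigrams, so their similarity is 0.4. Changing a single word breaks every shingle that contains it, which is why shingled n-grams are sensitive to local rewording.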

Install

install.packages('textreuse')

Monthly Downloads

833

Version

0.1.2

License

MIT + file LICENSE

Repository

https://github.com/ropensci/textreuse

Maintainer

Lincoln Mullen

Last Published

November 6th, 2015

Functions in textreuse (0.1.2)

Hash a string to an integer

TextReuseCorpus

TextReuseCorpus

TextReuseTextDocument-accessors

Accessors for TextReuse objects

as.matrix.textreuse_candidates

Convert candidates data frames to other formats

Locality sensitive hashing for minhash

TextReuseTextDocument

TextReuseTextDocument

Query a LSH cache for matches to a single document

Local alignment of natural language texts

lsh_probability

Probability that a candidate pair will be detected with LSH

pairwise_candidates

Candidate pairs from pairwise comparisons

Objects exported from other packages

Recompute the tokens for a document or corpus

pairwise_compare

Pairwise comparisons among documents in a corpus

Compare candidates identified by LSH

Candidate pairs from LSH comparisons

Split texts into tokens

Recompute the hashes for a document or corpus

Filenames from paths

minhash_generator

Generate a minhash function

List of all candidates in a corpus

textreuse-package

Detect Text Reuse and Document Similarity

similarity-functions

Measure similarity/dissimilarity in documents
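The minhash and locality-sensitive hashing entries above can be illustrated with a similar base R sketch. Every name here is hypothetical. `minhash_pair()`, `estimate_jaccard()`, and `lsh_prob()` are not textreuse functions. The minhash also uses random permutations of a shared vocabulary, a simpler technique swapped in for the package's hash-based `minhash_generator`.

The probability function applies the standard banding formula. With h minhashes split into b bands of r = h / b rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. This is assumed, not confirmed from the source, to be the quantity `lsh_probability` reports.

```r
set.seed(42)

# Hypothetical helper: minhash signatures for two token sets, built from
# n random permutations of their combined vocabulary. Each signature entry
# is the smallest permuted position of any token in the set.
minhash_pair <- function(a, b, n = 200) {
  vocab <- union(a, b)
  ia <- match(unique(a), vocab)
  ib <- match(unique(b), vocab)
  sig <- replicate(n, {
    perm <- sample(length(vocab))
    c(min(perm[ia]), min(perm[ib]))
  })
  # replicate() returns a 2 x n matrix: row 1 for a, row 2 for b
  list(a = sig[1, ], b = sig[2, ])
}

# The share of matching signature entries estimates the Jaccard similarity.
estimate_jaccard <- function(sig) mean(sig$a == sig$b)

# Banding formula (assumed to match lsh_probability): probability that a
# pair with Jaccard similarity s shares at least one of b bands, each made
# of r = h / b minhashes.
lsh_prob <- function(h, b, s) {
  stopifnot(h %% b == 0)
  r <- h / b
  1 - (1 - s^r)^b
}

set_a <- paste0("t", 1:10)  # tokens t1..t10
set_b <- paste0("t", 6:15)  # tokens t6..t15; true Jaccard = 5 / 15
sig <- minhash_pair(set_a, set_b, n = 200)
estimate_jaccard(sig)        # close to 1/3; exact value depends on the seed
lsh_prob(h = 240, b = 80, s = 0.25)
```

With 240 minhashes in 80 bands (r = 3), a pair with similarity 0.25 is flagged as a candidate with a probability of about 0.72. Using fewer, wider bands raises the similarity a pair needs before it is likely to be compared at all.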