Match cheaters
catch_em(flist, n_grams = 10, time_lim = 1L, progress_bar = TRUE)
flist: a list of documents (.doc/.docx/.pdf). A full or relative path must be provided for each file.
n_grams: see the ngram package.
time_lim: maximum time in seconds for each comparison. The default is 1 second, which caused no problems when comparing documents of 50K words.
progress_bar: should a progress bar be printed to the console?
A correlation matrix of class chtrs with each cell indicating the match (0-1) between two of the documents.
# NOT RUN {
if (interactive()) {
  files <- choose.files()
  catch_em(files)
}
# }
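To give a feel for what an n-gram based match score between 0 and 1 can look like, here is a minimal base R sketch. It is an illustration only and not the package's implementation: the helpers `word_ngrams()` and `ngram_overlap()` are hypothetical names, and the score used here (the share of one text's distinct word n-grams that also occur in the other) is an assumption chosen for simplicity.

```r
# Hypothetical helper (not part of the package): split a text into word n-grams.
word_ngrams <- function(text, n) {
  # Lowercase, split on anything that is not a letter or digit, drop empty tokens.
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical score (an assumption, not the package's formula):
# proportion of the distinct n-grams of `a` that also appear in `b`.
ngram_overlap <- function(a, b, n = 3) {
  ga <- unique(word_ngrams(a, n))
  gb <- unique(word_ngrams(b, n))
  if (length(ga) == 0) return(NA_real_)
  mean(ga %in% gb)
}

doc1 <- "the quick brown fox jumps over the lazy dog"
doc2 <- "a quick brown fox jumps over a sleeping cat"

# Compare the two toy texts using word trigrams.
ngram_overlap(doc1, doc2, n = 3)
```

Larger values of `n` require longer identical runs of words before two texts count as matching, which is why a sketch like this becomes stricter as `n` grows.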