tm v0.7-2


Text Mining Package

A framework for text mining applications within R.

Functions in tm

Name                 Description
Corpus               Corpora
TextDocument         Text Documents
Source               Sources
DataframeSource      Data Frame Source
PlainTextDocument    Plain Text Documents
DirSource            Directory Source
Docs                 Access Document IDs and Terms
SimpleCorpus         Simple Corpora
Reader               Readers
XMLSource            XML Source
ZipSource            ZIP File Source
PCorpus              Permanent Corpora
VectorSource         Vector Source
VCorpus              Volatile Corpora
WeightFunction       Weighting Function
crude                20 Exemplary News Articles from the Reuters-21578 Data Set of Topic crude
URISource            Uniform Resource Identifier Source
acq                  50 Exemplary News Articles from the Reuters-21578 Data Set of Topic acq
findAssocs           Find Associations in a Term-Document Matrix
findMostFreqTerms    Find Most Frequent Terms
tm_combine           Combine Corpora, Documents, Term-Document Matrices, and Term Frequency Vectors
foreign              Read Document-Term Matrices
content_transformer  Content Transformers
Zipf_n_Heaps         Explore Corpus Term Frequency Characteristics
readDataframe        Read In a Text Document from a Data Frame
getTokenizers        Tokenizers
readPDF              Read In a PDF Document
getTransformations   Transformations
readReut21578XML     Read In a Reuters-21578 XML Document
removePunctuation    Remove Punctuation Marks from a Text Document
readTagged           Read In a POS-Tagged Word Text Document
removeSparseTerms    Remove Sparse Terms from a Term-Document Matrix
stripWhitespace      Strip Whitespace from a Text Document
termFreq             Term Frequency Vector
findFreqTerms        Find Frequent Terms
weightTfIdf          Weight by Term Frequency - Inverse Document Frequency
plot                 Visualize a Term-Document Matrix
writeCorpus          Write a Corpus to Disk
readDOC              Read In a MS Word Document
weightSMART          SMART Weightings
readXML              Read In an XML Document
weightTf             Weight by Term Frequency
stopwords            Stopwords
removeNumbers        Remove Numbers from a Text Document
tm_reduce            Combine Transformations
tokenizer            Tokenizers
tm_term_score        Compute Score for Matching Terms
XMLTextDocument      XML Text Documents
weightBin            Weight Binary
hpc                  Parallelized ‘lapply’
readPlain            Read In a Text Document
inspect              Inspect Objects
readRCV1             Read In a Reuters Corpus Volume 1 Document
TermDocumentMatrix   Term-Document Matrix
meta                 Metadata Management
stemDocument         Stem Words
removeWords          Remove Words from a Text Document
stemCompletion       Complete Stems
tm_filter            Filter and Index Functions on Corpora
tm_map               Transformations on Corpora

Date: 2017-11-17
LinkingTo: BH, Rcpp
SystemRequirements: C++11
License: GPL-3
NeedsCompilation: yes
Packaged: 2017-11-18 17:20:14 UTC; hornik
Repository: CRAN
Date/Publication: 2017-11-18 17:23:28 UTC
