tm v0.6-2


by Ingo Feinerer

Text Mining Package

A framework for text mining applications within R.

Functions in tm

Name                 Description
ZipSource            ZIP File Source
TermDocumentMatrix   Term-Document Matrix
tm_filter            Filter and Index Functions on Corpora
Reader               Readers
readPDF              Read In a PDF Document
readXML              Read In an XML Document
readDOC              Read In a MS Word Document
PCorpus              Permanent Corpora
DirSource            Directory Source
removeWords          Remove Words from a Text Document
meta                 Metadata Management
weightTfIdf          Weight by Term Frequency - Inverse Document Frequency
Docs                 Access Document IDs and Terms
tm_reduce            Combine Transformations
tm_term_score        Compute Score for Matching Terms
XMLTextDocument      XML Text Documents
readTabular          Read In a Text Document
removeSparseTerms    Remove Sparse Terms from a Term-Document Matrix
Zipf_n_Heaps         Explore Corpus Term Frequency Characteristics
readTagged           Read In a POS-Tagged Word Text Document
VectorSource         Vector Source
PlainTextDocument    Plain Text Documents
termFreq             Term Frequency Vector
WeightFunction       Weighting Function
acq                  50 Exemplary News Articles from the Reuters-21578 Data Set of Topic acq
readRCV1             Read In a Reuters Corpus Volume 1 Document
URISource            Uniform Resource Identifier Source
findFreqTerms        Find Frequent Terms
weightTf             Weight by Term Frequency
getTransformations   Transformations
removeNumbers        Remove Numbers from a Text Document
stripWhitespace      Strip Whitespace from a Text Document
plot                 Visualize a Term-Document Matrix
foreign              Read Document-Term Matrices
TextDocument         Text Documents
stopwords            Stopwords
crude                20 Exemplary News Articles from the Reuters-21578 Data Set of Topic crude
VCorpus              Volatile Corpora
content_transformer  Content Transformers
Source               Sources
weightSMART          SMART Weightings
readReut21578XML     Read In a Reuters-21578 XML Document
writeCorpus          Write a Corpus to Disk
tokenizer            Tokenizers
DataframeSource      Data Frame Source
inspect              Inspect Objects
findAssocs           Find Associations in a Term-Document Matrix
stemDocument         Stem Words
tm_map               Transformations on Corpora
Corpus               Corpora
tm_combine           Combine Corpora, Documents, Term-Document Matrices, and Term Frequency Vectors
stemCompletion       Complete Stems
getTokenizers        Tokenizers
removePunctuation    Remove Punctuation Marks from a Text Document
readPlain            Read In a Text Document
XMLSource            XML Source
weightBin            Weight Binary

Date: 2015-07-02
SystemRequirements: Antiword for reading MS Word files, pdfinfo and pdftotext from Poppler for reading PDF
License: GPL-3
NeedsCompilation: yes
Packaged: 2015-07-03 06:57:05 UTC; hornik
Repository: CRAN
Date/Publication: 2015-07-03 10:43:07
