Text Mining Package

A framework for text mining applications within R.

Functions in tm

Name                 Description
Docs                 Access Document IDs and Terms
SimpleCorpus         Simple Corpora
Corpus               Corpora
TextDocument         Text Documents
DataframeSource      Data Frame Source
Reader               Readers
Source               Sources
PCorpus              Permanent Corpora
URISource            Uniform Resource Identifier Source
PlainTextDocument    Plain Text Documents
DirSource            Directory Source
VCorpus              Volatile Corpora
VectorSource         Vector Source
findAssocs           Find Associations in a Term-Document Matrix
WeightFunction       Weighting Function
findFreqTerms        Find Frequent Terms
findMostFreqTerms    Find Most Frequent Terms
foreign              Read Document-Term Matrices
acq                  50 Exemplary News Articles from the Reuters-21578 Data Set of Topic acq
readPlain            Read In a Text Document
tm_combine           Combine Corpora, Documents, Term-Document Matrices, and Term Frequency Vectors
readRCV1             Read In a Reuters Corpus Volume 1 Document
ZipSource            ZIP File Source
content_transformer  Content Transformers
Zipf_n_Heaps         Explore Corpus Term Frequency Characteristics
crude                20 Exemplary News Articles from the Reuters-21578 Data Set of Topic crude
readDataframe        Read In a Text Document from a Data Frame
hpc                  Parallelized ‘lapply’
readPDF              Read In a PDF Document
readXML              Read In an XML Document
inspect              Inspect Objects
removeNumbers        Remove Numbers from a Text Document
weightSMART          SMART Weightings
plot                 Visualize a Term-Document Matrix
readDOC              Read In a MS Word Document
XMLSource            XML Source
removePunctuation    Remove Punctuation Marks from a Text Document
XMLTextDocument      XML Text Documents
tokenizer            Tokenizers
getTokenizers        Tokenizers
weightBin            Weight Binary
removeWords          Remove Words from a Text Document
getTransformations   Transformations
stemCompletion       Complete Stems
readReut21578XML     Read In a Reuters-21578 XML Document
tm_filter            Filter and Index Functions on Corpora
tm_map               Transformations on Corpora
weightTf             Weight by Term Frequency
removeSparseTerms    Remove Sparse Terms from a Term-Document Matrix
TermDocumentMatrix   Term-Document Matrix
readTagged           Read In a POS-Tagged Word Text Document
meta                 Metadata Management
stemDocument         Stem Words
stopwords            Stopwords
termFreq             Term Frequency Vector
tm_reduce            Combine Transformations
weightTfIdf          Weight by Term Frequency - Inverse Document Frequency
tm_term_score        Compute Score for Matching Terms
writeCorpus          Write a Corpus to Disk
stripWhitespace      Strip Whitespace from a Text Document
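
Example

The listing gives names only, so a short sketch of how a few of these functions are commonly combined may help. It assumes the tm package is installed. The three sample sentences and the variable names (texts, corpus, tdm) are made up for illustration and are not part of the package.

```r
library(tm)

# Toy input: three short documents (illustrative data, not from the package)
texts <- c("Text mining in R is fun.",
           "The tm package offers a framework for text mining.",
           "Corpora hold collections of text documents.")

# Build a volatile (in-memory) corpus from a character vector
corpus <- VCorpus(VectorSource(texts))

# Clean the documents; content_transformer() wraps a plain function
# such as tolower() so that it can be applied to text documents
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Term-document matrix using plain term-frequency weighting
tdm <- TermDocumentMatrix(corpus, control = list(weighting = weightTf))

# Terms that occur at least twice across the corpus
findFreqTerms(tdm, lowfreq = 2)

# Print a summary and a sample of the matrix
inspect(tdm)
```

Other weightings from the listing, such as weightTfIdf or weightBin, can be passed in the same control list in place of weightTf.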
Date: 2017-12-06
LinkingTo: BH, Rcpp
SystemRequirements: C++11
License: GPL-3
NeedsCompilation: yes
Packaged: 2017-12-06 09:38:32 UTC; hornik
Repository: CRAN
Date/Publication: 2017-12-06 18:26:44 UTC

