tm v0.5-10


by Ingo Feinerer

Text Mining Package

A framework for text mining applications within R.

Functions in tm

findAssocs Find Associations in a Term-Document Matrix
XMLSource XML Source
weightTf Weight by Term Frequency
dissimilarity Dissimilarity
PlainTextDocument Plain Text Document
TextDocument Access and Modify Text Documents
stemDocument Stem Words
FunctionGenerator Function Generator
plot Visualize a Term-Document Matrix
removeSparseTerms Remove Sparse Terms from a Term-Document Matrix
getReaders List Available Readers
tm_filter Filter and Index Functions on Corpora
Zipf_n_Heaps Explore Corpus Term Frequency Characteristics
acq 50 Exemplary News Articles from the Reuters-21578 XML Data Set of Topic acq
removeWords Remove Words from a Text Document
WeightFunction Weighting Function
tm_term_score Compute Score for Matching Terms
URISource Uniform Resource Identifier Source
tm_map Transformations on Corpora
readPDF Read In a PDF Document
DirSource Directory Source
findFreqTerms Find Frequent Terms
VectorSource Vector Source
readReut21578XML Read In a Reuters-21578 XML Document
weightTfIdf Weight by Term Frequency - Inverse Document Frequency
readRCV1 Read In a Reuters Corpus Volume 1 Document
readXML Read In an XML Document
removeNumbers Remove Numbers from a Text Document
weightBin Weight Binary
getSources List Available Sources
readDOC Read In a MS Word Document
inspect Inspect Objects
readPlain Read In a Text Document
VCorpus Volatile Corpus
prescindMeta Prescind Document Meta Data
as.PlainTextDocument Create Objects of Class PlainTextDocument
removePunctuation Remove Punctuation Marks from a Text Document
number The Number of Rows/Columns/Dimensions/Documents/Terms of a Term-Document Matrix
tokenizer Tokenizers
writeCorpus Write a Corpus to Disk
weightSMART SMART Weightings
termFreq Term Frequency Vector
readTabular Read In a Text Document
makeChunks Split a Corpus into Chunks
foreign Read Document-Term Matrices
DataframeSource Data Frame Source
Source Create and Access Sources
names Row, Column, Dim Names, Document IDs, and Terms
tm_combine Combine Corpora, Documents, Term-Document Matrices, and Term Frequency Vectors
PCorpus Permanent Corpus Constructor
getTokenizers List Available Tokenizers
stripWhitespace Strip Whitespace from a Text Document
getTransformations List Available Transformations
TermDocumentMatrix Term-Document Matrix
meta Meta Data Management
sFilter Statement Filter
stopwords Stopwords
crude 20 Exemplary News Articles from the Reuters-21578 XML Data Set of Topic crude
Reuters21578Document Reuters-21578 Text Document
ReutersSource Reuters-21578 XML Source
tm_reduce Combine Transformations
RCV1Document RCV1 Text Document
TextRepository Text Repository
materialize Materialize Lazy Mappings
stemCompletion Complete Stems
Date 2014-01-07
SystemRequirements Antiword for reading MS Word files, pdfinfo and pdftotext from Poppler for reading PDF
License GPL-3
Packaged 2014-01-13 16:41:07 UTC; hornik
NeedsCompilation yes
Repository CRAN
Date/Publication 2014-01-13 18:40:58

