⚠️ There's a newer version (0.7-16) of this package.

tm (version 0.5-1)

Text Mining Package

Description

A framework for text mining applications within R.

Install

install.packages('tm')

Monthly Downloads

55,818

Version

0.5-1

License

GPL (>= 2)

Maintainer

Ingo Feinerer

Last Published

October 27th, 2009

Functions in tm (0.5-1)

Zipf_n_Heaps
Explore Corpus Term Frequency Characteristics

tm_combine
Combine Corpora and Documents

XMLSource
XML Source

Reuters21578Document
Reuters-21578 Text Document

writeCorpus
Write a Corpus to Disk

RCV1Document
RCV1 Text Document

VCorpus
Volatile Corpus

DataframeSource
Data Frame Source

readReut21578XML
Read In a Reuters-21578 XML Document

readDOC
Read In a MS Word Document

number
The Number of Rows/Columns/Dimensions/Documents/Terms of a Term-Document Matrix

findAssocs
Find Associations in a Term-Document Matrix

acq
50 Exemplary News Articles from the Reuters-21578 XML Data Set of Topic acq

Source
Access Sources

PlainTextDocument
Plain Text Document

tm_cluster
Allow 'tm' to Use a Cluster

PCorpus
Permanent Corpus Constructor

readPlain
Read In a Text Document

getReaders
List Available Readers

FunctionGenerator
Function Generator

TextDocument
Access and Modify Text Documents

VectorSource
Vector Source

GmaneSource
Gmane Source

Dictionary
Dictionary

removePunctuation
Remove Punctuation Marks from a Text Document

weightTfIdf
Weight by Term Frequency - Inverse Document Frequency

dissimilarity
Dissimilarity

getTransformations
List Available Transformations

termFreq
Term Frequency Vector

removeSparseTerms
Remove Sparse Terms from a Term-Document Matrix

findFreqTerms
Find Frequent Terms

sFilter
Statement Filter

prescindMeta
Prescind Document Meta Data

tm_intersect
Intersection between Documents and Words

weightSMART
SMART Weightings

DirSource
Directory Source

readPDF
Read In a PDF Document

plot
Visualize a Term-Document Matrix

URISource
Uniform Resource Identifier Source

materialize
Materialize Lazy Mappings

getTokenizers
List Available Tokenizers

getFilters
List Available Filters

makeChunks
Split a Corpus into Chunks

crude
20 Exemplary News Articles from the Reuters-21578 XML Data Set of Topic crude

removeNumbers
Remove Numbers from a Text Document

convert_UTF_8
Convert Encoding to UTF-8

stemCompletion
Complete Stems

TextRepository
Text Repository

ReutersSource
Reuters-21578 XML Source

weightTf
Weight by Term Frequency

tm_map
Transformations on Corpora

stopwords
Stopwords

readGmane
Read In a Gmane RSS Feed

readXML
Read In an XML Document

stemDocument
Stem Words

tokenizer
Tokenizers

readRCV1
Read In a Reuters Corpus Volume 1 Document

getSources
List Available Sources

WeightFunction
Weighting Function

names
Row, Column, Dim Names, Document IDs, and Terms

inspect
Inspect Objects

TermDocumentMatrix
Term-Document Matrix

removeWords
Remove Words from a Text Document

tm_reduce
Combine Transformations

tm_filter
Filter and Index Functions on Corpora

readTabular
Read In a Text Document

stripWhitespace
Strip Whitespace from a Text Document

meta
Meta Data Management

foreign
Read Document-Term Matrices

searchFullText
Full Text Search

weightBin
Weight Binary

preprocessReut21578XML
Preprocess the Reuters-21578 XML Archive

tm_term_score
Compute Score for Matching Terms

as.PlainTextDocument
Create Objects of Class PlainTextDocument

tm_tag_score
Compute a Tag Score