tm (version 0.5-10)

ReutersSource: Reuters-21578 XML Source

Description

Construct a source for an input containing several Reuters-21578 XML documents.

Usage

ReutersSource(x, encoding = "unknown")

Arguments

x
Either a character string identifying the file or a connection.
encoding
Encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8; it is not used to re-encode the input.
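The difference between marking and re-encoding can be illustrated with base R alone. The sketch below does not call ReutersSource; it builds the string "café" from raw Latin-1 bytes, then uses `Encoding<-` to mark it (the bytes stay the same) and `enc2utf8()` to re-encode it (the bytes change). Treating `Encoding<-` as analogous to what the encoding argument does is an illustrative assumption, not a description of tm's internals.

```r
# "café" as raw Latin-1 bytes: 0xE9 is é in Latin-1
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Marking: declare how the existing bytes should be interpreted.
# Assumed analogous to what the encoding argument does.
Encoding(x) <- "latin1"
print(Encoding(x))   # "latin1"
print(charToRaw(x))  # bytes unchanged: 63 61 66 e9

# Re-encoding: actually convert the bytes to UTF-8.
# Per the description above, the encoding argument does NOT do this.
y <- enc2utf8(x)
print(Encoding(y))   # "UTF-8"
print(charToRaw(y))  # é now takes two bytes: 63 61 66 c3 a9
```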

Value

  • An object of class XMLSource, which extends the class Source, representing the Reuters-21578 XML documents contained in x.

References

Lewis, David (1997). Reuters-21578 Text Categorization Collection Distribution 1.0. http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html

Luz, Saturnino. XML-encoded version of Reuters-21578. http://modnlp.berlios.de/reuters21578.html

See Also

getSources to list available sources; Encoding for details on character encodings in R.

Examples

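library(tm)
# Path to the sample Reuters-21578 XML file shipped with tm
reuters21578 <- system.file("texts", "reuters-21578.xml", package = "tm")
rs <- ReutersSource(reuters21578)
# Build a corpus from the source and inspect its first two documents
inspect(Corpus(rs)[1:2])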
