tm (version 0.7-8)

readReut21578XML: Read In a Reuters-21578 XML Document

Description

Read in a Reuters-21578 XML document.

Usage

readReut21578XML(elem, language, id)
readReut21578XMLasPlain(elem, language, id)

Value

An XMLTextDocument for readReut21578XML, or a PlainTextDocument for readReut21578XMLasPlain, representing the text and metadata extracted from elem$content.

Arguments

elem

a named list with the component content, which must hold the document to be read in.

language

a string giving the language.

id

Not used.
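
Examples

A minimal sketch of how these readers are typically used: rather than being called directly, the reader is handed to a corpus constructor, which invokes it once per element delivered by the source. The sketch assumes that the sample Reuters-21578 XML files shipped with tm under texts/crude are present and that the xml2 package is installed; the names reut_dir and reuters are illustrative only.

```r
library(tm)

# Directory of sample Reuters-21578 XML files bundled with tm
# (assumption: the "texts/crude" sample set exists in the installed package).
reut_dir <- system.file("texts", "crude", package = "tm")

# Build a corpus; the reader is called once per file delivered by the source.
# mode = "binary" passes the raw file content on to the XML parser.
reuters <- VCorpus(DirSource(reut_dir, mode = "binary"),
                   readerControl = list(reader = readReut21578XMLasPlain))

# Inspect the first document: its class, metadata, and text.
class(reuters[[1]])
meta(reuters[[1]])
content(reuters[[1]])
```

Substituting readReut21578XML as the reader should yield XMLTextDocument objects instead, as described under Value.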

References

Emms, Martin and Luz, Saturnino (2007). Machine Learning for Natural Language Processing. European Summer School of Logic, Language and Information, course reader. http://www.homepages.ed.ac.uk/sluzfil/esslli07/mlfornlp.pdf

Lewis, David (1997). Reuters-21578 Text Categorization Collection Distribution 1.0. http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html

Luz, Saturnino. XML-encoded version of Reuters-21578. http://www.homepages.ed.ac.uk/sluzfil/esslli07/data/reuters21578-xml.tar.bz2

See Also

Reader for basic information on the reader infrastructure employed by package tm.