
readXML(spec, doc, ...)
spec: A named list of lists, each containing two character vectors. The constructed reader will map each list entry to an attribute or meta datum corresponding to the named list entry. Valid names include C
doc: An (empty) document of some subclass of TextDocument.

The return value is a function with the signature elem, language, id:
elem: a list with the named element content, which must hold the document to be read in.
language: a character vector giving the text's language.
id: a character vector representing a unique identification string for the returned text document.

The function returns doc augmented by the parsed information out of the XML file as described by spec.

Use getReaders to list available reader functions.

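The generator pattern described above can be illustrated with a minimal base R sketch. It is not the package's implementation: make_reader, the plain-list stand-in for doc, and the Length entry are hypothetical names introduced only for this illustration, and only the "function" and "unevaluated" entry types are handled.

```r
# Hypothetical stand-in for readXML(): takes a spec and a template document
# and returns a reader with the signature elem, language, id.
make_reader <- function(spec, doc) {
    function(elem, language, id) {
        # Start from the template document captured by the closure.
        result <- doc
        result$content <- elem$content
        # Resolve each spec entry: element 1 is the type, element 2 the value.
        result$meta <- lapply(spec, function(entry) {
            type <- entry[[1]]
            value <- entry[[2]]
            switch(type,
                   "function" = value(elem$content),
                   "unevaluated" = value,
                   stop("unsupported type in this sketch: ", type))
        })
        result$meta$language <- language
        result$meta$id <- id
        result
    }
}

# Assumed toy spec: one constant entry and one computed entry.
spec <- list(Origin = list("unevaluated", "Reuters-21578 XML"),
             Length = list("function", function(x) nchar(x)))
reader <- make_reader(spec, doc = list(content = NULL, meta = list()))
parsed <- reader(elem = list(content = "Some text"), language = "en", id = "1")
str(parsed)
```

The point of the closure is that spec and doc are fixed once, when the reader is built, so the returned function only needs the per-document arguments.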
readReut21578XML <- readXML(
    spec = list(Author = list("node", "/REUTERS/TEXT/AUTHOR"),
                DateTimeStamp = list("function", function(node)
                    # Format string reconstructed, assuming dates such as
                    # "26-FEB-1987 15:01:01"; adjust if the data differ.
                    strptime(sapply(XML::getNodeSet(node, "/REUTERS/DATE"),
                                    XML::xmlValue),
                             format = "%d-%B-%Y %H:%M:%S", tz = "GMT")),
                Description = list("unevaluated", ""),
                Heading = list("node", "/REUTERS/TEXT/TITLE"),
                ID = list("attribute", "/REUTERS/@NEWID"),
                Origin = list("unevaluated", "Reuters-21578 XML"),
                Topics = list("node", "/REUTERS/TOPICS/D")),
    doc = Reuters21578Document())
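
To see what the XPath expressions in the spec above select, the following sketch evaluates three of them directly with the XML package. The XML snippet is an invented toy document, not real Reuters data, and the variable names are assumptions made for this illustration; how readXML evaluates these expressions internally may differ.

```r
library(XML)

# Invented toy document with the same element layout as the spec assumes.
xml_text <- paste0(
    '<REUTERS NEWID="42">',
    '<TOPICS><D>topic-a</D><D>topic-b</D></TOPICS>',
    '<TEXT><TITLE>Example title</TITLE></TEXT>',
    '</REUTERS>')
node <- xmlParse(xml_text, asText = TRUE)

# "node" entries: select elements and take their text values.
heading <- sapply(getNodeSet(node, "/REUTERS/TEXT/TITLE"), xmlValue)
topics <- sapply(getNodeSet(node, "/REUTERS/TOPICS/D"), xmlValue)

# "attribute" entry: select the attribute value itself.
id <- as.character(xpathSApply(node, "/REUTERS/@NEWID"))

print(heading)
print(topics)
print(id)
```

An expression that matches several elements, such as /REUTERS/TOPICS/D here, yields one value per match, so the corresponding entry can hold more than one value.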