tm (version 0.5-10)

readXML: Read In an XML Document

Description

Return a function which reads in an XML document. The structure of the XML document can be described with a specification.

Usage

readXML(spec, doc)

Arguments

spec
A named list of lists each containing two components. The constructed reader will map each list entry to an attribute or meta datum corresponding to the named list entry. Valid names include Content to access the document's co
doc
An (empty) document of some subclass of TextDocument

Value

A function with the signature elem, language, id. The function returns doc augmented by the information parsed, as described by spec, from the XML file in elem$content.

Details

Formally, this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, and the returned function can still access the arguments passed to the generator (e.g., the specification) via lexical scoping.
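The function-generator pattern described above can be illustrated in plain R without tm. The sketch below is not how readXML is implemented; make_reader and the fields it returns are hypothetical names chosen for illustration. It only shows that the returned function keeps access to spec through lexical scoping.

```r
# Hypothetical generator (not part of tm): returns a reader-like closure.
make_reader <- function(spec) {
  # The inner function has the signature elem, language, id and can still
  # see `spec`, because `spec` lives in the enclosing environment.
  function(elem, language, id) {
    list(id = id,
         language = language,
         fields = names(spec),      # taken from the captured specification
         content = elem$content)
  }
}

reader <- make_reader(spec = list(Heading = list("node", "/item/title")))

# `spec` is not passed here, yet the closure can use it.
result <- reader(elem = list(content = "<item><title>Hello</title></item>"),
                 language = "en",
                 id = "doc-1")
result$fields
```

The closure carries its specification with it, so every reader exposes the same three-argument signature regardless of how it was configured.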

See Also

Vignette 'Extensions: How to Handle Custom File Formats', XMLSource.

getReaders to list available reader functions.

Examples

library("tm")

# Build a reader for items from the Gmane mailing list archive.
# Each spec entry is a two-component list: a type string and its value.
readGmane <-
    readXML(spec = list(Author = list("node", "/item/creator"),
                        Content = list("node", "/item/description"),
                        # Parse the /item/date node into a date-time in GMT.
                        DateTimeStamp = list("function", function(node)
                            strptime(sapply(XML::getNodeSet(node, "/item/date"),
                                            XML::xmlValue),
                                     format = "%Y-%m-%dT%H:%M:%S",
                                     tz = "GMT")),
                        Description = list("unevaluated", ""),
                        Heading = list("node", "/item/title"),
                        ID = list("node", "/item/link"),
                        Origin = list("unevaluated", "Gmane Mailing List Archive")),
            doc = PlainTextDocument())
