xmlSource: Source the R code, examples, etc. from an XML document

Description

This is the equivalent of a smart source for extracting the R code elements from an XML document and evaluating them. This allows for a simple way to collect R functions definitions or a sequence of (annotated) R code segments in an XML document along with other material such as notes, documentation, data, FAQ entries, etc., and still be able to access the R code directly from within an R session. The approach enables one to use the XML document as a container for a heterogeneous collection of related material, some of which is R code. In the literate programming parlance, this function essentially dynamically "tangles" the document within R.

This particular version (as opposed to other implementations) uses XPath to conveniently find the nodes of interest.

This style of authoring code supports mixed language support in which we put, for example, C and R code together in the same document. Indeed, one can use the document to store arbitrary content and still retrieve the R code. The more structure there is, the easier it is to create tools to extract that information using XPath expressions.

We can identify individual r:code nodes in the document to process, i.e. evaluate. We do this using their id attribute and specifying which to process via the ids argument. Alternatively, if a document has a node r:codeIds as a child of the top-level node (or within an invisible node), we read its contents as a sequence of line separated id values as if they had been specified via the argument ids to this function.

We can also use XSL to extract the code. See getCode.xsl in the Omegahat XSL collection.

Usage

xmlSource(url, ...,
          envir =globalenv(),
          xpath = character(),
          ids = character(),
          omit = character(),
          ask = FALSE,
          example = NA,
          fatal = TRUE, verbose = FALSE,
          xnodes = c("//r:function", "//r:init[not(@eval='false')]", "//r:code[not(@eval='false')]", "//r:plot[not(@eval='false')]"),         
          namespaces = DefaultXPathNamespaces)

Arguments

url

the name of the file, URL containing the XML document, or an XML string. This is passed to xmlTreeParse which is called with useInternalNodes = TRUE.

...

additional arguments passed to xmlTreeParse

envir

the environment in which the code elements of the XML document are to be evaluated. By default, they are evaluated in the global environment so that assignments take place there.

xpath

a string giving an XPath expression which is used after parsing the document to filter the document to a particular subset of nodes. This allows one to restrict the evaluation to a subset of the original document. One can do this directly by

ids

a character vector. XML nodes containing R code (e.g. r:code, r:init, r:function, r:plot) can have an id attribute. This vector allows the caller to specify the subset of these nodes to

omit

a character vector. The values of the id attributes of the nodes that we want to skip or omit from the evaluation. This allows us to specify the set that we don't want evaluated, in contrast to the ids argument. The order is n

ask

logical

example

a character or numeric vector specifying the values of the id attributes of any r:example nodes in the document. A single document may contain numerous, separate examples and these can be marked uniquely using an id a

fatal

(currently unused) a logical value. The idea is to control how we handle errors when evaluating individual code segments. We could recover from errors and continue processing subsequent nodes.

verbose

a logical value. If TRUE, information about what code segments are being evaluated is displayed on the console.

xnodes

a character vector. This is a collection of xpath expressions given as individual strings which find the nodes whose contents we evaluate.

namespaces

a named character vector (i.e. name = value pairs of strings) giving the prefix - URI pairings for the namespaces used in the XPath expressions. The URIs must match those in the document, but the prefixes are local to the XPath expression.

Value

An R object (typically a list) that contains the results of evaluating the content of the different selected code segments in the XML document. We use sapply to iterate over the nodes and so If the results of all the nodes A list giving the pairs of expressions and evaluated objects for each of the different XML elements processed.

concept

Annotated code
Literate Programming
Mixed language

Details

This evaluates the code, function and example elements in the XML content that have the appropriate namespace (i.e. r, s, or no namespace) and discards all others. It also discards r:output nodes from the text, along with processing instructions and comments. And it resolves r:frag or r:code nodes with a ref attribute by identifying the corresponding r:code node with the same value for its id attribute and then evaluating that node in place of the r:frag reference.

Examples

Run this code

xmlSource(system.file("exampleData", "Rsource.xml", package="XML"))

  # This illustrates using r:frag nodes.
  # The r:frag nodes are not processed directly, but only
  # if referenced in the contents/body of a r:code node
 f = system.file("exampleData", "Rref.xml", package="XML")
 xmlSource(f)

Run the code above in your browser using DataLab