XML (version 3.98-1.19)

xmlFlatListTree: Constructors for trees stored as a flat list of nodes with information about parents and children.

Description

These (and related internal) functions allow us to represent trees as a simple, non-hierarchical collection of nodes along with corresponding tables that identify the parent and child relationships. This is different from representing a tree as a list of lists of lists ... in which each node has a list of its own children. With that nested representation in a functional language like R, the children have no way to identify their parents.
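The contrast between the two representations can be sketched in base R. This is only an illustration: the names nested, nodes, parents and children are hypothetical, and the strings stand in for real XML node objects.

```r
# Nested representation: each node holds its own children,
# but a child carries no link back to its parent.
nested <- list(name = "doc",
               children = list(
                 list(name = "a", children = list()),
                 list(name = "b", children = list())))

# Flat representation: all nodes live in one named collection and the
# relationships are kept in separate lookup tables.
nodes    <- list(doc = "doc node", a = "a node", b = "b node")
parents  <- c(a = "doc", b = "doc")      # child identifier -> parent identifier
children <- list(doc = c("a", "b"))      # parent identifier -> child identifiers

parent_of_a <- parents[["a"]]            # look up the parent of "a"
kids_of_doc <- children[["doc"]]         # look up the children of "doc"
```

With the flat form, finding a node's parent is a single table lookup, which the nested form cannot offer without searching from the root.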

We use an environment to represent these flat trees. Since environments are mutable, a change made to one part of the tree takes effect in place, and we do not have to reassign the top-level object.
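A minimal base R sketch of why mutability matters here. The helper functions add_to_list and add_to_env are hypothetical and are not part of the XML package; they only show how a list and an environment behave when modified inside a function.

```r
# Modifying a list inside a function changes a local copy only.
add_to_list <- function(tree, id, value) {
  tree[[id]] <- value
  invisible(NULL)
}

# Modifying an environment inside a function changes the caller's object.
add_to_env <- function(tree, id, value) {
  assign(id, value, envir = tree)
  invisible(NULL)
}

l <- list()
e <- new.env(hash = TRUE, parent = emptyenv())

add_to_list(l, "a", 1)
add_to_env(e, "a", 1)

n_in_list <- length(l)   # the list is unchanged because the result was not reassigned
ids_in_env <- ls(e)      # the environment now contains "a"
```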

We can use either a list (with names) to store the nodes or a hash table/associative array that uses names. There is a non-trivial performance difference.

Usage

xmlHashTree(nodes = list(), parents = character(), children = list(), 
             env = new.env(TRUE, parent = emptyenv()))
xmlFlatListTree(nodes = list(), parents = character(), children = list(),
                env = new.env(), n = 200)

Arguments

nodes

a collection of existing nodes that are to be added to the tree. These are used to initialize the tree. If this is specified, you must also specify children and parents.

parents

the parent relationships for the nodes given by nodes.

children

the children relationships for the nodes given by nodes.

env

an environment in which the information for the tree will be stored. This is essentially the tree object as it allows us to modify parts of the tree without having to reassign the top-level object. Unlike most R data types, environments are mutable.

n

for xmlFlatListTree, this is used as the default size to allocate for the list containing the nodes.

Value

An object of class XMLFlatTree which is specialized to XMLFlatListTree by the xmlFlatListTree function and XMLHashTree by the xmlHashTree function. Both objects are simply the environment which contains information about the tree elements and functions to access this information.

An xmlHashTree object has an accessor method via $ for accessing individual nodes within the tree. One can use the node name/identifier in an expression such as tt$myNode to obtain the element. The name of a node is either its XML node name or, if that name is already present in the tree, a machine-generated name.

One can find the names of all the nodes using the objects function since these trees are regular environments in R. Using the all.names = TRUE argument, one can also find the “hidden” elements that define the tree's structure. These are .children and .parents. The former is a (hashed) environment. Each element in it is named by the identifier of a node in the tree (corresponding to the name of the node in the tree's environment). The value of that element is simply a character vector giving the identifiers of all of the children of that node.

The .parents element is also an environment. Each element in it pairs a node identifier with a parent identifier: the node identifier is the name of the variable and the parent identifier is its value. In other words, we look up the parent of a node named 'kid' by retrieving the value of the variable 'kid' in the .parents environment of this hash tree.
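The lookups described above can be sketched with plain environments in base R. The object tree below is a hypothetical stand-in that only mimics the .children and .parents layout; it is not built by xmlHashTree, and the identifiers 'doc', 'kid' and 'sibling' are made up for illustration.

```r
# Hypothetical stand-in for a hash tree with one parent and two children.
tree <- new.env(hash = TRUE, parent = emptyenv())
tree$.children <- new.env(hash = TRUE, parent = emptyenv())
tree$.parents  <- new.env(hash = TRUE, parent = emptyenv())

# .children: variable name = node identifier, value = identifiers of its children.
assign("doc", c("kid", "sibling"), envir = tree$.children)

# .parents: variable name = node identifier, value = identifier of its parent.
assign("kid", "doc", envir = tree$.parents)
assign("sibling", "doc", envir = tree$.parents)

# Look up the parent of 'kid' and the children of 'doc'.
kid_parent <- get("kid", envir = tree$.parents)
doc_kids   <- get("doc", envir = tree$.children)

# A node with no entry in .children is a leaf.
kid_is_leaf <- !exists("kid", envir = tree$.children, inherits = FALSE)

# Names starting with a dot are hidden unless all.names = TRUE is given.
visible <- ls(tree)
hidden  <- ls(tree, all.names = TRUE)
```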

The function .addNode is used to insert a new node into the tree.

The structure of this tree allows one to easily traverse all nodes and to navigate up the tree from a node via its parent. Certain tasks are more complex as the hierarchy is not implicit within a node.
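As a rough illustration of this trade-off, the base R sketch below walks up and down a small hand-built tree. The functions ancestors and descendants are hypothetical helpers, not part of the XML package, and the parents and children environments only imitate the tables described above.

```r
# Hand-built chain a -> b -> c, stored as lookup tables in environments.
parents  <- new.env(hash = TRUE, parent = emptyenv())
children <- new.env(hash = TRUE, parent = emptyenv())
assign("b", "a", envir = parents)
assign("c", "b", envir = parents)
assign("a", "b", envir = children)
assign("b", "c", envir = children)

# Going up is a simple loop: follow the parent table until it runs out.
ancestors <- function(id, parents) {
  path <- character()
  while (exists(id, envir = parents, inherits = FALSE)) {
    id <- get(id, envir = parents, inherits = FALSE)
    path <- c(path, id)
  }
  path
}

# Going down needs a table lookup at every step, because a node
# does not itself contain its children.
descendants <- function(id, children) {
  if (!exists(id, envir = children, inherits = FALSE)) return(character())
  out <- character()
  for (k in get(id, envir = children, inherits = FALSE)) {
    out <- c(out, k, descendants(k, children))
  }
  out
}

up   <- ancestors("c", parents)
down <- descendants("a", children)
```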

References

http://www.w3.org/XML

See Also

xmlTreeParse, xmlTree, xmlOutputBuffer, xmlOutputDOM

Examples

# NOT RUN {
 f = system.file("exampleData", "dataframe.xml", package = "XML")
 tr  = xmlHashTree()
 xmlTreeParse(f, handlers = list(.startElement = tr[[".addNode"]]))

 tr # print the tree on the screen

  # Get the two child nodes of the dataframe node.
 xmlChildren(tr$dataframe)

  # Find the names of all the nodes.
 objects(tr)
  # Which nodes have children
 objects(tr$.children)

  # Which nodes are leaves, i.e. do not have children
 setdiff(objects(tr), objects(tr$.children))

  # find the class of each of these leaf nodes.
 sapply(setdiff(objects(tr), objects(tr$.children)),
         function(id) class(tr[[id]]))

  # distribution of number of children
 sapply(tr$.children, length)


  # Get the first A node
 tr$A

  # Get its parent node.
 xmlParent(tr$A)


 f = system.file("exampleData", "allNodeTypes.xml", package = "XML")

   # Convert the document
 r = xmlInternalTreeParse(f, xinclude = TRUE)
 ht = as(r, "XMLHashTree")
 ht
 
  # work on the root node, or any node actually
 as(xmlRoot(r), "XMLHashTree")

  # Example of making copies of an XMLHashTreeNode object to create a separate tree.
 f = system.file("exampleData", "simple.xml", package = "XML")
 tt = as(xmlParse(f), "XMLHashTree")

 xmlRoot(tt)[[1]]
 xmlRoot(tt)[[1, copy = TRUE]]

 table(unlist(eapply(tt, xmlName)))
 # if any of the nodes had any attributes
 # table(unlist(eapply(tt, xmlAttrs)))
# }
