xml2

The xml2 package is a binding to libxml2, making it easy to work with HTML and XML from R. The API is somewhat inspired by jQuery.

Installation

You can install xml2 from CRAN,

install.packages("xml2")

or you can install the development version from GitHub, using devtools:

# install.packages("devtools")
devtools::install_github("r-lib/xml2")

Usage

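library("xml2")
# Parse an XML string into a document
x <- read_xml("<foo> <bar> text <baz/> </bar> </foo>")
x

xml_name(x)                  # name of the root element
xml_children(x)              # child nodes of the root
xml_text(x)                  # text content of the document
xml_find_all(x, ".//baz")    # all <baz> nodes, found via XPath

# read_html() also accepts malformed HTML such as unclosed tags
h <- read_html("<html><p>Hi <b>!")
h
xml_name(h)
xml_text(h)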

There are three key classes:

  • xml_node: a single node in a document.

  • xml_document: the complete document. Acting on a document is usually the same as acting on the root node of the document.

  • xml_nodeset: a set of nodes within the document. Operations on xml_nodesets are vectorised: they apply the operation to each node in the set.
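
As a rough illustration of the three classes, the sketch below parses a small document, selects a set of nodes, and picks one node out of the set. The sample XML and the variable names are made up for this example.

```r
library(xml2)

# Hypothetical sample document, not taken from the xml2 documentation.
doc <- read_xml("<shelf><book>R</book><book>XML</book></shelf>")

# The parsed document has the document class and also behaves as a node.
class(doc)

# xml_find_all() returns an xml_nodeset with one entry per match.
books <- xml_find_all(doc, ".//book")
class(books)
length(books)

# Vectorised: one call yields one result per node in the set.
xml_text(books)
xml_name(books)

# Indexing with [[ gives back a single xml_node.
first <- books[[1]]
class(first)
xml_text(first)
```

Because the same functions accept a document, a node, or a nodeset, code like xml_text() above does not need to branch on the type of its input.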

Compared to the XML package

xml2 has similar goals to the XML package. The main differences are:

  • xml2 takes care of memory management for you. It will automatically free the memory used by an XML document as soon as the last reference to it goes away.

  • xml2 has a very simple class hierarchy, so you don't need to think about exactly what type of object you have; xml2 will just do the right thing.

  • More convenient handling of namespaces in XPath expressions - see xml_ns() and xml_ns_strip() to get started.
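
The namespace handling mentioned in the last point can be sketched as follows. The document and its namespace URI are invented for illustration; the "d1" prefix is the name xml2 assigns to the first default namespace.

```r
library(xml2)

# Hypothetical document with a default namespace (the URI is made up).
doc <- read_xml(
  '<root xmlns="http://example.com/ns"><item>a</item><item>b</item></root>'
)

# xml_ns() lists the namespaces in the document; the default one is named "d1".
ns <- xml_ns(doc)
ns

# An unprefixed XPath does not match the namespaced elements...
length(xml_find_all(doc, ".//item"))

# ...but using the prefix from xml_ns() does.
xml_text(xml_find_all(doc, ".//d1:item", ns))

# xml_ns_strip() removes the default namespace, modifying the document in place,
# after which the plain XPath matches.
xml_ns_strip(doc)
xml_text(xml_find_all(doc, ".//item"))
```

Stripping is convenient for quick exploration; passing the result of xml_ns() keeps the document unchanged, which may matter if it is written back out later.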

Code of Conduct

Please note that the xml2 project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 764,631

Version: 1.3.5

License: MIT + file LICENSE

Last Published: July 6th, 2023

Functions in xml2 (1.3.5)

  • xml_children: Navigate around the family tree.

  • xml_comment: Construct a comment node.

  • xml_document-class: Register S4 classes.

  • xml_url: The URL of an XML document.

  • xml_validate: Validate XML schema.

  • xml_attr: Retrieve an attribute.

  • xml_set_namespace: Set the node's namespace.

  • xml_structure: Show the structure of an HTML/XML document.

  • xml_cdata: Construct a CDATA node.

  • xml_text: Extract or modify the text.

  • xml_name: The (tag) name of an XML element.

  • xml_missing: Construct a missing XML object.

  • xml_new_document: Create a new document, possibly with a root node.

  • xml_ns: XML namespaces.

  • xml_replace: Modify a tree by inserting, replacing or removing nodes.

  • xml_type: Determine the type of a node.

  • xml_dtd: Construct a document type definition.

  • xml_path: Retrieve the XPath to a node.

  • xml_find_all: Find nodes that match an XPath expression.

  • xml_ns_strip: Strip the default namespaces from a document.

  • xml_serialize: Serialize XML objects to connections.

  • download_xml: Download an HTML or XML file.

  • xml2_example: Get the path to an xml2 example.

  • url_escape: Escape and unescape URLs.

  • url_parse: Parse a URL into its component pieces.

  • write_xml: Write XML or HTML to disk.

  • as_list: Coerce XML nodes to a list.

  • url_absolute: Convert between relative and absolute URLs.

  • as_xml_document: Coerce an R list to XML nodes.

  • read_xml: Read HTML or XML.
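
To give a feel for how several of these functions fit together, here is a minimal sketch. The input document, the base URL, and the variable names are assumptions made for this example; xml_find_first() is used as the single-match companion of xml_find_all().

```r
library(xml2)

# Hypothetical input; element and attribute names are made up for illustration.
doc <- read_xml('<links><a href="docs/intro.html">Intro</a></links>')
link <- xml_find_first(doc, ".//a")

# xml_attr() retrieves one attribute; xml_path() gives the node's XPath.
href <- xml_attr(link, "href")
node_path <- xml_path(link)

# url_absolute() resolves a relative URL against a base (the base is an assumption).
full_url <- url_absolute(href, "http://example.com/site/")

# url_parse() splits a URL into its components, returned as a data frame.
url_parse("http://example.com:8080/site/index.html?x=1")

# as_list() converts the nodes to a nested R list.
str(as_list(doc))

# write_xml() saves the document to disk; read_xml() reads it back.
tmp <- tempfile(fileext = ".xml")
write_xml(doc, tmp)
round_trip <- xml_text(read_xml(tmp))
```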