xml2 (version 1.0.0)

read_xml: Read HTML or XML.

Description

Read HTML or XML.

Usage

read_xml(x, encoding = "", ..., as_html = FALSE, options = "NOBLANKS")
read_html(x, encoding = "", ..., options = c("RECOVER", "NOERROR", "NOBLANKS"))
# S3 method for character
read_xml(x, encoding = "", ..., as_html = FALSE, options = "NOBLANKS")
# S3 method for raw
read_xml(x, encoding = "", base_url = "", ..., as_html = FALSE, options = "NOBLANKS")
# S3 method for connection
read_xml(x, encoding = "", n = 64 * 1024, verbose = FALSE, ..., base_url = "", as_html = FALSE, options = "NOBLANKS")

Arguments

x
A string, a connection, or a raw vector.

A string can be a path, a URL, or literal XML. URLs are converted into connections using either base::url or, if installed, curl::curl. Local paths ending in .gz, .bz2, .xz, or .zip are automatically uncompressed.

If a connection, the complete connection is read into a raw vector before being parsed.
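The accepted forms of x can be sketched as below. This is a minimal illustration, assuming the xml2 package is installed; the temporary files are hypothetical stand-ins for real documents, and the .gz case relies on the automatic decompression described above.

```r
library(xml2)

markup <- "<root><item>a</item></root>"

# 1. Literal XML passed as a string
doc_literal <- read_xml(markup)

# 2. A local path (a temporary file stands in for a real document)
path <- tempfile(fileext = ".xml")
writeLines(markup, path)
doc_path <- read_xml(path)

# 3. A gzip-compressed local path, uncompressed automatically on read
gz_path <- tempfile(fileext = ".xml.gz")
gz <- gzfile(gz_path, "w")
writeLines(markup, gz)
close(gz)
doc_gz <- read_xml(gz_path)

# 4. A raw vector of bytes
doc_raw <- read_xml(charToRaw(markup))

# 5. An unopened connection: read_xml() opens it, reads it completely
#    into memory, then parses it
doc_con <- read_xml(file(path))

# Every form yields a document with the same <item> text
item_text <- sapply(
  list(doc_literal, doc_path, doc_gz, doc_raw, doc_con),
  function(d) xml_text(xml_find_first(d, "//item"))
)
item_text
```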

encoding
Specify a default encoding for the document. Unless otherwise specified, XML documents are assumed to be in UTF-8 or UTF-16. If the document is neither UTF-8 nor UTF-16 and lacks an explicit encoding declaration, this argument lets you supply a default.
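As a sketch of when encoding matters, the bytes below are a hand-built Latin-1 document with no encoding declaration, so the parser would otherwise assume UTF-8 and could not decode the 0xE9 byte correctly. The choice of "ISO-8859-1" is an illustrative assumption; any encoding name libxml2 recognises would be passed the same way.

```r
library(xml2)

# Bytes for "<p>café</p>" in ISO-8859-1 (Latin-1), where 0xE9 is "é".
# There is no <?xml ... encoding="..."?> declaration in the input.
latin1_bytes <- c(charToRaw("<p>caf"), as.raw(0xe9), charToRaw("</p>"))

# Supply the default encoding explicitly
doc <- read_xml(latin1_bytes, encoding = "ISO-8859-1")

# The text comes back to R as UTF-8
txt <- xml_text(xml_find_first(doc, "/p"))
txt
```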
...
Additional arguments passed on to methods.
as_html
If TRUE, parse the input as if it were HTML rather than XML.
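The difference as_html makes can be seen on a snippet that is valid HTML but not well-formed XML. This is a minimal sketch; the snippet is a made-up example, and the tryCatch() wrapper is only there to show that the strict XML parse fails.

```r
library(xml2)

# "<br>" is a void element in HTML but an unclosed tag in XML
snippet <- "<p>Hi<br>there</p>"

# The strict XML parser raises an error on the tag mismatch
xml_attempt <- tryCatch(read_xml(snippet), error = function(e) NULL)
is.null(xml_attempt)

# The HTML parser accepts it and wraps the fragment in <html><body>
doc <- read_xml(snippet, as_html = TRUE)
xml_name(xml_find_first(doc, "//br"))
```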
options
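Set parsing options for the libxml2 parser, specified as a character vector of option names. Available values are libxml2 parser option names such as "RECOVER", "NOERROR", and "NOBLANKS", which appear in the defaults shown under Usage.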
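To illustrate the effect of one option, the sketch below parses the same indented markup twice: once with the default "NOBLANKS", which drops whitespace-only text nodes, and once with "RECOVER" alone. Using "RECOVER" here is an assumption made only so that some option other than "NOBLANKS" is set.

```r
library(xml2)

markup <- "<a>\n  <b/>\n</a>"

# Default: "NOBLANKS" removes the indentation between elements
compact <- read_xml(markup, options = "NOBLANKS")
n_compact <- length(xml_contents(xml_find_first(compact, "/a")))

# Without "NOBLANKS" the indentation survives as text nodes
kept <- read_xml(markup, options = "RECOVER")
n_kept <- length(xml_contents(xml_find_first(kept, "/a")))

c(n_compact, n_kept)
```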
base_url
When loading from a connection, raw vector or literal html/xml, this allows you to specify a base url for the document. Base urls are used to turn relative urls into absolute urls.
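A minimal sketch of base_url with literal HTML follows. The https://example.com/site/ address is hypothetical, and the example assumes an xml2 release that provides xml_url() and url_absolute(), which read the stored base URL back and resolve a relative link against it.

```r
library(xml2)

# A page with a relative link; the base URL is illustrative only
page <- '<html><body><a href="docs/page.html">Docs</a></body></html>'
doc <- read_html(page, base_url = "https://example.com/site/")

# The base URL is stored on the document
base <- xml_url(doc)
base

# Combine it with the relative href to form an absolute URL
href <- xml_attr(xml_find_first(doc, "//a"), "href")
abs_url <- url_absolute(href, base)
abs_url
```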
n
If x is a connection, the number of bytes to read per iteration. Defaults to 64 KB (64 * 1024 bytes).
verbose
When reading from a slow connection, this prints some output on every iteration so you know it's working.
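The connection-specific arguments n and verbose can be exercised as below. This is a sketch under the assumption that a local temporary file is an acceptable stand-in for a slow connection; the 4 KB chunk size is arbitrary, and the exact progress output printed by verbose is not reproduced here.

```r
library(xml2)

# Write a moderately large document (about 40 KB) to a temporary file
path <- tempfile(fileext = ".xml")
writeLines(c("<root>", sprintf("  <item>%d</item>", 1:2000), "</root>"), path)

# Read it through a connection in 4 KB chunks, reporting progress
doc <- read_xml(file(path), n = 4 * 1024, verbose = TRUE)

n_items <- length(xml_find_all(doc, "//item"))
n_items
```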

Value

An XML document. HTML is normalised to valid XML - this may not be exactly the same transformation performed by the browser, but it's a reasonable approximation.
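The normalisation described above can be observed on a bare, unclosed fragment. In this minimal sketch, the fragment is a made-up example; the parser supplies the missing html and body elements, so the text ends up at /html/body/p.

```r
library(xml2)

# A fragment with an unclosed <p> and no <html> or <body>
doc <- read_html("<p>Hello")

# The missing structure has been filled in
p <- xml_find_first(doc, "/html/body/p")
xml_text(p)

# Print the normalised document
cat(as.character(doc))
```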

Examples

# Literal xml/html is useful for small examples
read_xml("<foo><bar /></foo>")
# read_html() also recovers from malformed or truncated markup
read_html("<html><title>Hi<title></html>")
read_html("<html><title>Hi")

# From a local path
read_html(system.file("extdata", "r-project.html", package = "xml2"))

# From a URL (requires an internet connection)
cd <- read_xml("http://www.xmlfiles.com/examples/cd_catalog.xml")
me <- read_html("http://had.co.nz")
