rvest (version 0.2.0)

html_text: Extract attributes, text and tag name from HTML.

Description

Extract attributes, text and tag name from HTML.

Usage

html_text(x, ...)

html_tag(x)

html_children(x)

html_attrs(x)

html_attr(x, name, default = NA_character_)

Arguments

x
Either a complete document (XMLInternalDocument), a list of tags (XMLNodeSet) or a single tag (XMLInternalElementNode).
...
Other arguments passed on to xmlValue(). The most useful is trim = TRUE, which removes leading and trailing whitespace.
name
Name of attribute to extract.
default
A string returned in place of the attribute value for any node that does not have the attribute.
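
As a minimal sketch of how the trim and default arguments behave, the snippet below parses a small inline document. The markup and the variable names (doc, links, link_text, href_na, href_filled) are made up for illustration and are not part of the package; it assumes rvest 0.2.0, where html() accepts a literal HTML string.

```r
library(rvest)

# Hypothetical inline document: the second link has no href attribute.
doc <- html("<div><a href='/one'> One </a><a> Two </a></div>")
links <- html_nodes(doc, "a")

# trim = TRUE is passed on to xmlValue() and strips surrounding whitespace.
link_text <- html_text(links, trim = TRUE)

# Without an explicit default, nodes lacking the attribute give NA.
href_na <- html_attr(links, "href")

# With a default, those nodes give the supplied string instead.
href_filled <- html_attr(links, "href", default = "#")
```

Supplying a default can be convenient when the result feeds code that does not handle NA, at the cost of no longer being able to tell a missing attribute from one whose value happens to equal the default.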

Value

  • html_attr, html_tag and html_text return a character vector; html_attrs returns a list.

Examples

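library(rvest)

# Parse a page and select the cast nodes with a CSS selector
movie <- html("http://www.imdb.com/title/tt1490017/")
cast <- html_nodes(movie, "#titleCast span.itemprop")
# Text content, tag name and all attributes of each node
html_text(cast)
html_tag(cast)
html_attrs(cast)
# A single named attribute from each node
html_attr(cast, "class")
html_attr(cast, "itemprop")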

basic <- html("<p class='a'><b>Bold text</b></p>")
p <- html_node(basic, "p")
p
# Can subset with numbers to extract children
p[[1]]
# Use html_attr to get attributes
html_attr(p, "class")
