rdom (version 0.0.3.9001)

rdom: Return DOM of a website as HTML

Description

Return the DOM of a website as HTML.

Usage

rdom(url, css, all, timeout, filename)

Arguments

url

A URL or a local file path.

css

a string containing one or more CSS selectors separated by commas.

all

logical. This controls whether querySelector or querySelectorAll is used to extract elements from the page. When FALSE, output is similar to rvest::html_node. When TRUE, output is similar to rvest::html_nodes. If css is missing, then this argument is ignored.

timeout

maximum time to wait for page to load and render, in seconds.

filename

A character string specifying the file in which to store the result.

Examples

# NOT RUN {
# write the returned HTML to a temporary file, then read it back
filename <- tempfile(tmpdir = "~", fileext = ".html")
rdom("https://en.wikipedia.org/wiki/Main_Page", filename = filename)
rdom(filename)
unlink(filename)

library("rvest")
stars <- "http://www.techstars.com/companies/stats/"
# doesn't work
read_html(stars) %>% html_node(".table75") %>% html_table()
# should work
rdom(stars) %>%
  html_node(".table75") %>%
  html_table()
# more efficient
stars %>% rdom(".table75") %>% html_table()
# }
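
The all argument is described in terms of querySelector versus querySelectorAll, with output similar to rvest::html_node and rvest::html_nodes. The sketch below illustrates that difference with rvest alone on a small inline HTML snippet, so it runs without fetching or rendering a page. The snippet and the li.item selector are made up for illustration, and the commented rdom() calls at the end are assumed equivalents based on the argument description above; they are not run here.

```r
library(rvest)

# Hypothetical static markup, used only to show first-match vs. all-matches
page <- read_html(
  '<ul><li class="item">first</li><li class="item">second</li></ul>'
)

# Like querySelector (all = FALSE): only the first matching element
one <- html_node(page, "li.item")

# Like querySelectorAll (all = TRUE): every matching element
many <- html_nodes(page, "li.item")

html_text(one)
html_text(many)
length(many)

# Assumed rdom counterparts for a rendered page at some url (not run):
# rdom(url, css = "li.item", all = FALSE)
# rdom(url, css = "li.item", all = TRUE)
```

If only one element is expected, all = FALSE keeps downstream code simpler because the result is a single node rather than a set of nodes.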
