Note: a newer version (0.8.2) of this package is available.

htmltab (version 0.5.0)

Assemble Data Frames from HTML Tables

Description

htmltab is a package for extracting structured information from HTML tables. It is similar to readHTMLTable() of the XML package but provides two major advantages. First, the package automatically expands row and column spans in the header and body cells. Second, users are given more control over the identification of the header and body rows that end up in the R table. Additionally, the function preprocesses the table code and removes unneeded parts, which reduces the need for tedious post-processing.
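The span expansion described above can be illustrated with a small in-memory table. This is a minimal sketch, not an excerpt from the package documentation: it assumes that the XML and htmltab packages are installed, that htmltab() accepts an already parsed document through its doc argument, and that which = 1 selects the first table. The table contents are invented for illustration.

```r
# Minimal sketch: parse a small in-memory HTML table and pass it to htmltab().
# Assumes the XML and htmltab packages are installed; the table data is made up.
library(XML)
library(htmltab)

html <- '
<table>
  <tr><th>Region</th><th>City</th><th>Population</th></tr>
  <tr><td rowspan="2">North</td><td>Aville</td><td>100</td></tr>
  <tr><td>Bville</td><td>200</td></tr>
  <tr><td>South</td><td>Cville</td><td>300</td></tr>
</table>'

# Parse the string explicitly so it is not mistaken for a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# which = 1 asks for the first table found in the document.
tab <- htmltab(doc = doc, which = 1)

print(tab)
```

If the row span is expanded as the description states, the cell "North" should be repeated for both Aville and Bville, so the result has one complete row per city without manual filling.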

Install

install.packages('htmltab')

Monthly Downloads

12

Version

0.5.0

License

MIT + file LICENSE

Maintainer

Christian Rubba

Last Published

January 14th, 2015

Functions in htmltab (0.5.0)

get_body_xpath: Return body xpath
htmltab: Robust methods for extracting structured information out of HTML tables
get_header_elements: Extracts header elements
num_xpath: Produce XPath for numeric index
rm_nuisance: Remove nuisance elements
check_type: Produce the table node
equal_length: Check if all list vectors have equal length
get_head_xpath: Return header xpath
is_empty: Check if list is empty
get_rowspans: Extracts rowspan information
get_head: Retrieve table head rows
add_tr: Assert that all table rows are nested in tr tags
get_xpath: Assemble XPath expressions for header and body
span_header: Creates header using span information
get_colspans: Extracts colspan information
rm_empty_cols: Remove no data columns
has_tag: Assert a specific tag in an XML node
span_body: Creates body using span information
clean_colnames: Produces correct column names
get_cell_element: Extracts cell elements
get_cells: Extract body cell values