MazamaCoreUtils (version 0.4.4)

html_getTables: Find all tables in an html page

Description

Parses an html page, extracts all <table> elements, and returns them as a list of dataframes, one per table. The rows and columns of each dataframe match those of the table it represents. To extract a single table as a dataframe, pass the index of the table along with the url to html_getTable().

Usage

html_getTables(url = NULL)

html_getTable(url = NULL, index = 1)

Arguments

url

URL or file path of an html page.

index

Index identifying which table to return.

Value

html_getTables() returns a list of dataframes, one for each table on an html page. html_getTable() returns the single dataframe identified by index.
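To make the "list of dataframes" return value concrete, here is a minimal sketch of the same kind of table extraction using the xml2 and rvest packages directly. It is an assumption that html_getTables() works along these lines; the inline HTML string and its contents are invented for illustration so the example runs without network access.

```r
library(xml2)
library(rvest)

# Hypothetical page content with two small tables (made up for this sketch)
html <- "<html><body>
<table>
  <tr><th>zone</th><th>offset</th></tr>
  <tr><td>UTC</td><td>0</td></tr>
  <tr><td>PST</td><td>-8</td></tr>
</table>
<table>
  <tr><th>id</th></tr>
  <tr><td>1</td></tr>
</table>
</body></html>"

# Parse the document; read_html() also accepts a URL or a file path
page <- xml2::read_html(html)

# Select every <table> element, then convert each one to a dataframe
tableNodes <- rvest::html_nodes(page, "table")
tables <- rvest::html_table(tableNodes)

# The result is a list with one dataframe per table;
# picking a single table is ordinary list indexing
firstTable <- tables[[1]]
print(length(tables))
print(firstTable)
```

Indexing the list with `tables[[index]]` mirrors what the index argument of html_getTable() selects.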

Examples

# NOT RUN {
library(MazamaCoreUtils)

# Wikipedia's list of timezones
url <- "http://en.wikipedia.org/wiki/List_of_tz_database_time_zones"

# Extract tables
tables <- html_getTables(url)

# Extract the first table from the list
# NOTE: Equivalent to firstTable <- html_getTable(url, index = 1)
firstTable <- tables[[1]]

head(firstTable)
nrow(firstTable)

# }