
Rcrawler (version 0.1.1)

ContentScraper: ContentScraper

Description

Given a web page as a character string and a set of named XPath patterns, this function extracts the selected parts of the HTML document and returns them as a named list.
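The idea of named XPath extraction can be illustrated with a minimal sketch. It assumes the xml2 package (not part of this page) and is not Rcrawler's implementation: the hypothetical helper scrape_named() parses the HTML text, evaluates each XPath pattern, keeps the text of the first matching node only, and names the results after patnames.

```r
# Sketch only: NOT Rcrawler's code. Assumes the xml2 package is installed.
library(xml2)

# Hypothetical helper illustrating named XPath extraction from HTML text
scrape_named <- function(html_text, patterns, patnames) {
  doc <- read_html(html_text)
  # For each pattern, take the text of the first matching node (NA if none)
  out <- lapply(patterns, function(p) xml_text(xml_find_first(doc, p)))
  names(out) <- patnames
  out
}

page <- "<html><head><title>Demo</title></head><body><article><p>Hello</p></article></body></html>"
res <- scrape_named(page, c("//head/title", "//article"), c("title", "article"))
str(res)
```

Here res$title holds the page title and res$article holds the article text, mirroring the named-list shape described for ContentScraper.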

Usage

ContentScraper(webpage, patterns, patnames, excludepat, astext = TRUE, encod)

Arguments

webpage

character, a web page as text.

patterns

character vector, one or more XPath patterns to extract from the web page.

patnames

character vector, one name for each XPath pattern in patterns.

excludepat

character vector, one or more XPath patterns whose matching nodes are excluded from the extracted content.

astext

logical, default is TRUE; if TRUE, HTML and PHP tags are stripped from each extracted piece.

encod

character, the character encoding of the web page.
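How excludepat and astext might interact can be sketched as follows. This is an assumption-based illustration using the xml2 package, not Rcrawler's code: the hypothetical helper scrape_excluding() removes every node matched by an exclusion pattern before extracting, then returns either plain text (astext = TRUE) or the node's HTML markup (astext = FALSE).

```r
# Sketch only: NOT Rcrawler's code. Assumes the xml2 package is installed.
library(xml2)

# Hypothetical helper: remove excluded nodes, then return text or raw HTML
scrape_excluding <- function(html_text, pattern, excludepat = character(0), astext = TRUE) {
  doc <- read_html(html_text)
  # Drop every node matched by any exclusion XPath before extracting
  for (xp in excludepat) {
    xml_remove(xml_find_all(doc, xp))
  }
  node <- xml_find_first(doc, pattern)
  # astext = TRUE strips tags; FALSE keeps the serialized HTML of the node
  if (astext) xml_text(node) else as.character(node)
}

page <- "<html><body><article><p>Keep</p><div class='ad'>Ad</div></article></body></html>"
txt  <- scrape_excluding(page, "//article", "//div[@class='ad']")
html <- scrape_excluding(page, "//article", "//div[@class='ad']", astext = FALSE)
```

With these inputs, txt contains only the article text without the excluded block, while html still carries the surrounding tags.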

Value

A named list of the extracted content, with element names taken from patnames.

Examples

# NOT RUN {
library(Rcrawler)

# Retrieve the web page header and data
pageinfo <- LinkExtractor("http://glofile.com/index.php/2017/06/08/athletisme-m-a-rome/")

# Extract the title and the article from the web page content using XPath patterns
Data <- ContentScraper(pageinfo[[1]][[10]], c("//head/title", "//*/article"), c("title", "article"))
# }
