ContentScraper

webpage

character, the web page to parse, given as text.

patterns

character vector, one or more XPath patterns to extract from the web page.

patnames

character vector, names given to each XPath pattern to extract.

excludepat

character vector, one or more XPath patterns to exclude from the extracted content.

astext

logical, default is TRUE; if TRUE, HTML and PHP tags are stripped from the extracted content.

encod

character, sets the web page character encoding.

Given a web page as text (a character string) and a set of named XPath patterns, this function extracts the selected parts of the HTML document and returns them as a list of extracted contents.
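The behaviour described above can be illustrated with a minimal sketch built on the xml2 package. This is not the package's own implementation: `scrape_patterns` is a hypothetical helper written for this example, the sample HTML is made up, and returning only the first match per pattern (or NA when nothing matches) is an assumption. Its arguments simply mirror the names documented on this page.

```r
library(xml2)

# Hypothetical helper (not part of the package): extract the first match of
# each XPath pattern from an HTML string and return a named list.
scrape_patterns <- function(webpage, patterns, patnames = patterns,
                            excludepat = character(0), astext = TRUE) {
  doc <- read_html(webpage)

  # Remove excluded nodes before extracting anything.
  for (xp in excludepat) {
    xml_remove(xml_find_all(doc, xp))
  }

  out <- lapply(patterns, function(xp) {
    node <- xml_find_first(doc, xp)
    # Assumption: a pattern with no match yields NA.
    if (inherits(node, "xml_missing")) return(NA_character_)
    # astext = TRUE keeps only the text; otherwise keep the markup.
    if (astext) xml_text(node, trim = TRUE) else as.character(node)
  })
  names(out) <- patnames
  out
}

# Made-up page content used only for illustration.
html <- "<html><head><title>Demo page</title></head>
<body><article><p>Main text.</p><aside>Advert</aside></article></body></html>"

res <- scrape_patterns(
  html,
  patterns   = c("//head/title", "//article"),
  patnames   = c("title", "article"),
  excludepat = "//aside"
)
str(res)
```

The result is a list with one element per pattern, named after `patnames`; the content of the excluded `//aside` node does not appear in the `article` element. With the package loaded, the analogous call under the arguments listed on this page would presumably take the form `ContentScraper(html, patterns, patnames, excludepat)`, though argument names may differ between package versions.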

Performs parallel web crawling and web scraping. It is designed to crawl, parse and store web pages to produce data that can be used directly in analysis applications. For details see Khalil and Fakir (2017) <DOI:10.1016/j.softx.2017.04.004>.
