Rcrawler (version 0.1.9-1)

Web Crawler and Scraper

Description

Performs parallel web crawling and web scraping. It is designed to crawl, parse and store web pages, producing data that can be used directly in analysis applications. For details, see Khalil and Fakir (2017).

Install

install.packages('Rcrawler')

Monthly Downloads

105

Version

0.1.9-1

License

GPL (>= 2)

Stars

356

Repository

https://github.com/salimk/Rcrawler/

Maintainer

Salim Khalil

Last Published

November 11th, 2018

Functions in Rcrawler (0.1.9-1)

Getencoding

LinkExtractor

LoginSession

Open a logged-in session

install_browser

Install the PhantomJS web driver

RobotParser

Fetch and parse robots.txt

run_browser

Start a web driver process on localhost, using a random port

Rcrawler

stop_browser

Stop the web driver process and remove its object

browser_path

Return the location path of the browser (web driver)

Get the list of parameters and values from a URL

Linkparamsfilter

Link parameters filter

LoadHTMLFiles

ListProjects

Drv_fetchpage

Fetch a page using a web driver or session
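The index above includes helpers for reading and filtering URL query parameters and a robots.txt parser. As an offline illustration of what those two ideas involve, here is a minimal base-R sketch. The helpers `url_params()`, `filter_params()` and `disallowed_paths()` are hypothetical and are not Rcrawler's implementation or API. In practice, use the package's own `Linkparamsfilter` and `RobotParser` and check their arguments in the package reference. The robots.txt helper is deliberately simplified: it combines the `*` group with any group that names the agent and ignores `Allow` lines, whereas RFC 9309 falls back to `*` only when no specific group matches.

```r
# Base-R sketch only: url_params(), filter_params() and disallowed_paths()
# are hypothetical helpers, not functions exported by Rcrawler.

# Split a URL's query string into a named character vector (key -> value).
url_params <- function(url) {
  query <- sub("^[^?]*\\??", "", url)   # keep only the text after the first "?"
  query <- sub("#.*$", "", query)       # drop a trailing #fragment
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]
  pairs <- pairs[nzchar(pairs)]
  if (length(pairs) == 0) return(setNames(character(0), character(0)))
  keys <- sub("=.*$", "", pairs)
  vals <- ifelse(grepl("=", pairs, fixed = TRUE), sub("^[^=]*=", "", pairs), "")
  setNames(vals, keys)
}

# Rebuild a URL without the listed parameters (or without any parameters).
# Any #fragment is dropped, and a valueless key "k" comes back as "k=".
filter_params <- function(url, drop = character(0), drop_all = FALSE) {
  base <- sub("[?#].*$", "", url)       # scheme, host and path only
  if (drop_all) return(base)
  keep <- url_params(url)
  keep <- keep[!names(keep) %in% drop]
  if (length(keep) == 0) return(base)
  paste0(base, "?", paste(names(keep), keep, sep = "=", collapse = "&"))
}

# Collect the Disallow paths that apply to `useragent` in robots.txt text.
# Simplification: rules of the "*" group and of any group naming the agent
# are combined, and Allow lines are ignored.
disallowed_paths <- function(robots_txt, useragent) {
  rows <- trimws(sub("#.*$", "", strsplit(robots_txt, "\n", fixed = TRUE)[[1]]))
  rows <- rows[nzchar(rows)]
  applies <- FALSE    # does the current group apply to this agent?
  in_agents <- FALSE  # TRUE while reading consecutive User-agent lines
  out <- character(0)
  for (ln in rows) {
    field <- tolower(trimws(sub(":.*$", "", ln)))
    value <- trimws(sub("^[^:]*:", "", ln))
    if (field == "user-agent") {
      if (!in_agents) applies <- FALSE  # a new group starts here
      in_agents <- TRUE
      if (value == "*" || tolower(value) == tolower(useragent)) applies <- TRUE
    } else {
      in_agents <- FALSE
      if (field == "disallow" && applies && nzchar(value)) out <- c(out, value)
    }
  }
  unique(out)
}

# Example usage
u <- "http://www.example.com/page.php?id=7&sessionid=abc&lang=en"
filter_params(u, drop = "sessionid")  # "http://www.example.com/page.php?id=7&lang=en"
robots <- "User-agent: *\nDisallow: /private/\n\nUser-agent: Rcrawler\nDisallow: /tmp/"
disallowed_paths(robots, "Rcrawler")  # "/private/" "/tmp/"
```

Removing session-style parameters before queuing a link is a common way to avoid fetching the same page under many URLs. Checking a candidate path against the disallowed prefixes before each request is what lets a crawler honour robots.txt.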