spiderbar (version 0.2.5)

Parse and Test Robots Exclusion Protocol Files and Rules

Description

The 'Robots Exclusion Protocol' documents a set of standards for allowing or excluding robot/spider crawling of different areas of site content. Tools are provided that wrap the 'rep-cpp' C++ library for processing these 'robots.txt' files.
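A minimal sketch of the basic workflow, assuming robxp() accepts the character vector returned by readLines() and that can_fetch() takes a user-agent string as its third argument (the URL below is only an example):

library(spiderbar)

# Read a live robots.txt file and parse it into a robxp object
rt_lines <- readLines("https://cran.r-project.org/robots.txt")
rt <- robxp(rt_lines)

# Ask whether a path may be crawled by the default ("*") user agent
can_fetch(rt, "/web/packages/", "*")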

Install

install.packages('spiderbar')

Monthly Downloads

2,608

Version

0.2.5

License

MIT + file LICENSE

Maintainer

Bob Rudis

Last Published

February 11th, 2023

Functions in spiderbar (0.2.5)

spiderbar: Parse and Test Robots Exclusion Protocol Files and Rules

robxp: Parse a `robots.txt` file and create a `robxp` object

can_fetch: Test URL paths against a `robxp` robots.txt object

crawl_delays: Retrieve all agent crawl delay values in a `robxp` robots.txt object

sitemaps: Retrieve a character vector of sitemaps from a parsed robots.txt object

print.robxp: Custom printer for `robxp` objects
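These functions compose into a single pipeline: parse once with robxp(), then query the resulting object. A sketch under the assumption that robxp() also accepts a character vector of robots.txt lines; the rules below are made up for illustration:

library(spiderbar)

# A made-up robots.txt, supplied as a character vector of lines
rt <- robxp(c(
  "User-agent: *",
  "Crawl-delay: 5",
  "Disallow: /private/",
  "Sitemap: https://example.com/sitemap.xml"
))

# print.robxp() is dispatched automatically and summarizes the parsed rules
print(rt)

# Test URL paths against the parsed rules for a given user agent
can_fetch(rt, "/private/", "*")   # disallowed above, so expect FALSE
can_fetch(rt, "/public/", "*")    # not matched by any rule, so expect TRUE

# Per-agent crawl-delay values
crawl_delays(rt)

# Sitemap URLs declared in the file
sitemaps(rt)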