⚠️ A newer version (0.7.15) of this package is available.

robotstxt (version 0.1.2)

A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

Description

Provides a 'Robotstxt' ('R6') class and accompanying methods to parse and check 'robots.txt' files. Data fields are provided as data frames and vectors. Permissions can be checked by providing path character vectors and optional bot names.
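
For orientation, a minimal usage sketch of the class-based interface, assembled from the description above and the function index below. The 'domain' argument and the 'check()' method shown here are assumptions about this version's API, not confirmed signatures from the 0.1.2 documentation.

# load the package (see Install below)
library(robotstxt)

# build an object representation of a site's robots.txt
# ('domain' as the argument name is an assumption)
rt <- robotstxt(domain = "wikipedia.org")

# parsed data fields, provided as data frames and vectors
rt$permissions
rt$comments

# check whether a bot may access the given paths
# (a 'check' method is an assumption; see also paths_allowed below)
rt$check(paths = c("/", "/images/"), bot = "*")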

Install

install.packages('robotstxt')

Monthly Downloads

2,706

Version

0.1.2

License

MIT + file LICENSE

Maintainer

Peter Meissner

Last Published

February 8th, 2016

Functions in robotstxt (0.1.2)

path_allowed: check whether a bot has permission to access a single path
rt_get_comments: extract comments from a robots.txt file
named_list: make an automatically named list
rt_get_fields: extract permission fields from a robots.txt file
sanitize_permission_values: transform permission values into regular expressions
rt_get_useragent: extract user agent names from a robots.txt file
sanitize_permissions: transform whole permission records into regular expressions
print.robotstxt_text: print method for robotstxt_text objects
sanitize_path: make paths uniform for matching
rt_get_rtxt: load the robots.txt example files saved along with the package
get_robotstxt: download a robots.txt file
rt_list_rtxt: list the robots.txt example files saved along with the package
rt_get_fields_worker: worker function for extracting robotstxt fields
paths_allowed: check whether a bot has permission to access one or more paths
robotstxt: an object representation of robots.txt files
parse_robotstxt: parse a robots.txt file into its data fields
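
The lower-level helpers compose into a download/parse/check pipeline. Below is a hedged sketch in R; the argument names ('permissions', 'path', 'bot') and the names of the parsed fields are assumptions based on the function index above, not confirmed signatures from the 0.1.2 documentation.

library(robotstxt)

# download the raw robots.txt text for a site
txt <- get_robotstxt("wikipedia.org")

# parse the text into its data fields
parsed <- parse_robotstxt(txt)
parsed$useragents   # assumed field name
parsed$permissions  # assumed field name

# check a single path against the parsed permissions
# (argument names are assumptions)
path_allowed(permissions = parsed$permissions, path = "/images/", bot = "*")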