robotstxt: Generate a representation of a robots.txt file
Description
The function generates a list that holds the data resulting from parsing a robots.txt
file, as well as a function called check() that lets you ask the representation whether
a bot (or particular bots) is allowed to access a resource on the domain.
Usage
robotstxt(domain = NULL, text = NULL, user_agent = NULL)
Arguments
domain
Domain for which to generate a representation. If text is NULL (the
default), the function downloads the robots.txt file from the server.
text
If automatic download of the robots.txt is not preferred, the text can be
supplied directly.
user_agent
HTTP user-agent string to be used to retrieve the robots.txt file
from domain.
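A minimal sketch of the two ways to initialize the object, assuming the robotstxt package is installed. The rules in rules_text, the domain example.com, and the user-agent string "my-crawler" are made up for illustration; the download variant is left commented out because it needs network access.

```r
library(robotstxt)

# Hypothetical robots.txt content, supplied directly so nothing is downloaded
rules_text <- "User-agent: *\nDisallow: /private/\n"

# Initialize from text only; domain is not supplied here
rt <- robotstxt(text = rules_text)

# Alternatively, let the function download the file (needs network access):
# rt_remote <- robotstxt(domain = "example.com", user_agent = "my-crawler")
```

Supplying text directly is convenient for tests and offline work, since the result does not depend on a server response.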
Value
Object (list) of class robotstxt with parsed data from a
robots.txt (domain, text, bots, permissions, host, sitemap, other) and one
function (check()) to check resource permissions.
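To make the shape of the return value concrete, the following sketch inspects an object built from a made-up robots.txt text. It assumes the robotstxt package is installed; depending on the package version, the list may hold additional elements beyond those named above.

```r
library(robotstxt)

# Hypothetical robots.txt content used only for illustration
rt <- robotstxt(text = "User-agent: *\nDisallow: /private/\n")

class(rt)                      # class of the returned object
names(rt)                      # element names of the list
is.data.frame(rt$permissions)  # permissions are stored as a data.frame
is.function(rt$check)          # check() is stored in the list as a function
```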
Fields
domain
character vector holding domain name for which the robots.txt
file is valid; will be set to NA if not supplied on initialization
text
character vector of text of robots.txt file; either supplied on
initialization or automatically downloaded from domain supplied on
initialization
bots
character vector of bot names mentioned in robots.txt
permissions
data.frame of bot permissions found in robots.txt file
host
data.frame of host fields found in robots.txt file
sitemap
data.frame of sitemap fields found in robots.txt file
other
data.frame of all other fields (none of the above) found in the
robots.txt file
check()
Method to check bot permissions. Defaults to the domain's root and
no bot in particular. check() has two arguments: paths and bot.
The first supplies the paths for which to check permissions; the
second names the bot.
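The check() method described above can be exercised as follows. This is a sketch under stated assumptions: the robotstxt package is installed, and the robots.txt text, the paths, and the bot name "my-crawler" are hypothetical.

```r
library(robotstxt)

# Hypothetical rules: every bot is barred from /private/
rt <- robotstxt(text = "User-agent: *\nDisallow: /private/\n")

# With no arguments, check() uses its defaults (domain root, no particular bot)
root_ok <- rt$check()

# Query one path at a time for a named (hypothetical) bot
paths <- c("/index.html", "/private/secret.html")
allowed <- vapply(
  paths,
  function(p) rt$check(paths = p, bot = "my-crawler"),
  logical(1)
)
allowed
```

Paths are checked one at a time here as a conservative choice; whether a vector of paths is evaluated element-wise may depend on the package version.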