robotstxt (version 0.4.1)

robotstxt: Generate a representation of a robots.txt file

Description

The function generates a list that holds the data obtained from parsing a robots.txt file, together with a function called check() that lets you ask whether a bot (or particular bots) is allowed to access a resource on the domain.

Usage

robotstxt(domain = NULL, text = NULL, user_agent = NULL)

Arguments

domain

Domain for which to generate a representation. If text is NULL (the default), the function downloads the robots.txt file from the server.

text

If automatic download of the robots.txt is not preferred, the text can be supplied directly.
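
As a minimal sketch of supplying the text directly, the snippet below builds a small robots.txt in memory and passes it through the text argument. The rules and the bot name examplebot are made up for illustration, and the robotstxt package is assumed to be installed.

```r
library(robotstxt)

# Hypothetical robots.txt content, written inline for illustration only
rules <- paste(
  "User-agent: *",
  "Disallow: /private/",
  "",
  "User-agent: examplebot",
  "Disallow: /",
  sep = "\n"
)

# Supplying text means nothing is downloaded; domain is left unset
rt <- robotstxt(text = rules)

rt$text         # the text that was supplied
rt$bots         # bot names found in the text
rt$permissions  # data.frame of parsed permissions
```

Because no domain is given here, the domain field of the result is expected to be NA, as described under Fields.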

user_agent

HTTP user-agent string to be used to retrieve the robots.txt file from the domain.

Value

Object (list) of class robotstxt with parsed data from a robots.txt file (domain, text, bots, permissions, host, sitemap, other) and one function, check(), to check resource permissions.

Fields

domain

character vector holding domain name for which the robots.txt file is valid; will be set to NA if not supplied on initialization

text

character vector of text of robots.txt file; either supplied on initialization or automatically downloaded from domain supplied on initialization

bots

character vector of bot names mentioned in robots.txt

permissions

data.frame of bot permissions found in robots.txt file

host

data.frame of host fields found in robots.txt file

sitemap

data.frame of sitemap fields found in robots.txt file

other

data.frame of other - none of the above - fields found in robots.txt file

check()

Method to check bot permissions. Defaults to the domain's root and no bot in particular. check() has two arguments: paths and bot. The first supplies the paths for which to check permissions; the second names the bot.
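
The two arguments of check() can be sketched as follows. The robots.txt rules and the path /private/report.html are hypothetical, and the object is built from text so that no network access is needed.

```r
library(robotstxt)

# Hypothetical rules: every bot is kept out of /private/
rt <- robotstxt(text = "User-agent: *\nDisallow: /private/")

# Default call: checks the domain root for no bot in particular
rt$check()

# Check several paths at once for bots matched by "*"
allowed <- rt$check(paths = c("/", "/private/report.html"), bot = "*")
allowed
```

In this sketch the result is expected to hold one logical value per path, so it can be used to filter a vector of paths before requesting them.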

Examples

# NOT RUN {
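# Download and parse the robots.txt file of google.com (requires network access)
rt <- robotstxt(domain = "google.com")
# Bots named in the file and the permissions parsed for them
rt$bots
rt$permissions
# Check whether any bot ("*") may access the given paths
rt$check(paths = c("/", "forbidden"), bot = "*")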
# }