
robotstxt (version 0.5.2)

get_robotstxts: function to get multiple robotstxt files

Description

Function to get multiple robots.txt files, one per domain supplied.

Usage

get_robotstxts(domain, warn = TRUE, force = FALSE,
  user_agent = utils::sessionInfo()$R.version$version.string,
  ssl_verifypeer = c(1, 0), use_futures = FALSE)
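
For illustration, a minimal call might look like the following sketch; the domains are assumed examples and are not taken from the package documentation.

# retrieve robots.txt files for several domains in one call
library(robotstxt)
rtxts <- get_robotstxts(domain = c("www.r-project.org", "cran.r-project.org"))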

Arguments

domain

domain, or vector of domains, from which to download the robots.txt file(s)

warn

warn about being unable to download domain/robots.txt, e.g. because of an HTTP response status 404

force

if TRUE, the function will re-download the robots.txt file instead of using possibly cached results

user_agent

HTTP user-agent string to be used to retrieve the robots.txt file from the domain

ssl_verifypeer

analogous to the libcurl option CURLOPT_SSL_VERIFYPEER (https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html); changing it might help with robots.txt file retrieval in some cases

use_futures

Should future::future_lapply be used for possible parallel/asynchronous retrieval? Note: check the help pages and vignettes of the future package on how to set up plans for future execution, because the robotstxt package does not do this on its own (see the example below).
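
Examples

The following is a minimal sketch of parallel/asynchronous retrieval, not taken from the package documentation; the multisession plan and the domains are illustrative assumptions.

# a future plan has to be set up by the user beforehand (e.g. a
# multisession plan), because the robotstxt package does not do this
# on its own
library(robotstxt)
library(future)
plan(multisession)

domains <- c("www.r-project.org", "cran.r-project.org")
rtxts <- get_robotstxts(domain = domains, use_futures = TRUE)

# inspect the result: one list element per domain
str(rtxts, max.level = 1)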