base (version 3.5.3)

curlGetHeaders: Retrieve Headers from URLs

Description

Retrieve the headers for a URL using a supported protocol such as http://, ftp://, https:// and ftps://. This is an optional function that is not supported on all platforms.

Usage

curlGetHeaders(url, redirect = TRUE, verify = TRUE)

Arguments

url

character string specifying the URL.

redirect

logical: should redirections be followed?

verify

logical: should certificates be verified as valid and applying to that host?

Value

A character vector with integer attribute "status" (the last-received ‘status’ code). If redirection occurs, the vector includes the headers for all the URLs visited.

For the interpretation of ‘status’ codes see https://en.wikipedia.org/wiki/List_of_HTTP_status_codes and https://en.wikipedia.org/wiki/List_of_FTP_server_return_codes. A successful FTP connection will usually have status 250 or 350.
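
As a minimal sketch of working with the return value, the code below reads the "status" attribute and splits the headers into one group per response. The vector h is a hand-written stand-in for a real result (actual headers vary by server), and the sketch assumes that each response starts with a line beginning "HTTP/" and that lines keep their trailing "\r\n".

```r
# Hypothetical stand-in for the result of curlGetHeaders() after one
# redirection; this is illustrative data, not captured output.
h <- c("HTTP/1.1 301 Moved Permanently\r\n",
       "Location: https://example.org/\r\n",
       "\r\n",
       "HTTP/1.1 200 OK\r\n",
       "Content-Type: text/html\r\n",
       "\r\n")
attr(h, "status") <- 200L

# The last-received status code is stored as an integer attribute.
status <- attr(h, "status")

# Start a new group at each status line, so each group is one response.
is_status_line <- grepl("^HTTP/", h)
hops <- split(h, cumsum(is_status_line))

length(hops)     # number of responses seen, including redirections
hops[[1]][1]     # status line of the first response
```

For an ftp:// result the lines are server replies rather than HTTP headers, so the grouping on "HTTP/" would not apply.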

Details

This reports what curl -I -L (when redirect = TRUE) or curl -I (when redirect = FALSE) would report. For an ftp:// URL the ‘headers’ are a record of the conversation between client and server before data transfer.

Only 500 header lines will be reported: there is a limit of 20 redirections so this should suffice (and even 20 would indicate problems).

It uses getOption("timeout") for the connection timeout: that defaults to 60 seconds. As the call cannot be interrupted, you may want to set a shorter value.
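
One way to apply a shorter timeout for a single call only, then restore the previous setting, is sketched below. The wrapper name headers_with_timeout and its fetch argument are hypothetical: fetch defaults to curlGetHeaders and exists only so the wrapper can be tried without network access.

```r
# Hypothetical wrapper: run one header request under a temporary timeout.
headers_with_timeout <- function(url, seconds = 10, fetch = curlGetHeaders) {
  old <- options(timeout = seconds)   # options() returns the old values
  on.exit(options(old), add = TRUE)   # restore even if the request fails
  fetch(url)
}

# Offline check using a stand-in for 'fetch' that reports the active timeout.
before <- getOption("timeout")
seen <- headers_with_timeout("http://example.invalid", seconds = 5,
                             fetch = function(u) getOption("timeout"))
```

After the call, getOption("timeout") is back to its earlier value, so other code in the session is unaffected.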

To see all the details of the interaction with the server(s) set options(internet.info = 1).

HTTP[S] servers are allowed to refuse requests to read the headers and some do: this will result in a status of 405.
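
A refusal can be detected from the "status" attribute. The helper below is a rough sketch with a hypothetical name (describe_status); it covers HTTP status codes only, not FTP reply codes, and the example input is hand-written rather than captured from a server.

```r
# Hypothetical helper: summarise the HTTP "status" attribute of a
# curlGetHeaders() result. FTP reply codes are not handled here.
describe_status <- function(h) {
  status <- attr(h, "status")
  if (is.null(status)) return(NA_character_)
  if (status == 405L) {
    "header requests refused (405)"
  } else if (status >= 200L && status < 300L) {
    "success"
  } else if (status >= 300L && status < 400L) {
    "redirection"
  } else if (status >= 400L) {
    "client or server error"
  } else {
    "informational"
  }
}

# Illustrative input standing in for a server that refuses header requests.
refused <- structure("HTTP/1.1 405 Method Not Allowed\r\n", status = 405L)
describe_status(refused)
```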

For possible issues with secure URLs (especially on Windows) see download.file.

There is a security risk in not verifying certificates, but as only the headers are captured it is slight. Usually looking at the URL in a browser will reveal what the problem is (and it may well be machine-specific).

See Also

capabilities("libcurl") to see if this is supported.
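
Because the function is optional, portable code may want to check for support before calling it. The sketch below uses a hypothetical wrapper, safe_headers, which returns NULL when libcurl support is missing or the request fails; returning NULL rather than signalling an error is a design choice made here for illustration.

```r
# Hypothetical wrapper: return the headers, or NULL if libcurl support is
# unavailable or the request fails (e.g. the host cannot be resolved).
safe_headers <- function(url, ...) {
  if (!capabilities("libcurl")[["libcurl"]]) {
    return(NULL)
  }
  tryCatch(curlGetHeaders(url, ...), error = function(e) NULL)
}

## needs Internet access, results vary
res <- safe_headers("http://example.invalid")
if (is.null(res)) message("headers not available")
```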

The options HTTPUserAgent and timeout are used: see options.

Examples

# NOT RUN {
## needs Internet access, results vary
curlGetHeaders("http://bugs.r-project.org")   ## this redirects to https://
curlGetHeaders("https://httpbin.org/status/404")  ## returns status
curlGetHeaders("ftp://cran.r-project.org")
# }
# NOT RUN {
## a not-always-available site:
curlGetHeaders("ftps://test.rebex.net/readme.txt")
# }
