base (version 3.2.4)

curlGetHeaders: Retrieve Headers from URLs

Description

Retrieve the headers for a URL for a supported protocol such as http://, ftp://, https:// and ftps://. This is an optional function that is not supported on all platforms.

Usage

curlGetHeaders(url, redirect = TRUE, verify = TRUE)

Arguments

url
character string specifying the URL.
redirect
logical: should redirections be followed?
verify
logical: should certificates be verified as valid and applying to that host?

Value

A character vector with integer attribute "status" (the last-received ‘status’ code). If redirection occurs this will include the headers for all the URLs visited.

For the interpretation of ‘status’ codes see https://en.wikipedia.org/wiki/List_of_HTTP_status_codes and https://en.wikipedia.org/wiki/List_of_FTP_server_return_codes. A successful FTP connection will usually have status 250 or 350.
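As a minimal sketch of working with a value of this shape, the code below builds a hand-written stand-in for a result (not real server output) and reads the "status" attribute from it. The sample header lines, the trailing "\r\n" line endings and the variable names are assumptions made for illustration only.

```r
# Illustrative stand-in for a curlGetHeaders() result (not real server output).
h <- structure(
  c("HTTP/1.1 301 Moved Permanently\r\n",
    "Location: https://example.org/\r\n",
    "\r\n",
    "HTTP/1.1 200 OK\r\n",
    "Content-Type: text/html\r\n",
    "\r\n"),
  status = 200L
)

# The last-received status code is stored as an integer attribute.
status <- attr(h, "status")

# Each HTTP response starts with a status line; more than one such line
# suggests that a redirection was followed.
status_lines <- grep("^HTTP/", h, value = TRUE)
n_responses <- length(status_lines)

# Strip trailing line endings for display.
clean <- sub("[\r\n]+$", "", h)
```

With a real result, the same attr(h, "status") call applies; the line-ending handling may need adjusting to what the server actually sends.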

Details

This reports what curl -I -L (when redirect = TRUE) or curl -I (when redirect = FALSE) would report. For an ftp:// URL the ‘headers’ are a record of the conversation between client and server before data transfer.

Only 500 header lines will be reported: there is a limit of 20 redirections so this should suffice (and even 20 would indicate problems).

It uses getOption("timeout") for the connection timeout: that defaults to 60 seconds. As the call cannot be interrupted, you may want to set a shorter value.
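One way to use a shorter timeout without changing the session-wide setting permanently is to set the option inside a small wrapper and restore it on exit. The following is a sketch only: headers_with_timeout, its seconds argument and its fetch argument are hypothetical names introduced here, not part of base R.

```r
# Hypothetical helper: call a header-fetching function with a temporary timeout.
# 'fetch' defaults to curlGetHeaders; it is a parameter only so that the
# wrapper can be exercised without network access.
headers_with_timeout <- function(url, seconds = 10, fetch = curlGetHeaders, ...) {
  old <- options(timeout = seconds)   # returns the previous value(s)
  on.exit(options(old), add = TRUE)   # restore even if fetch() fails
  fetch(url, ...)
}

# Example call (requires network access and libcurl support):
# if (capabilities("libcurl"))
#   headers_with_timeout("https://www.r-project.org", seconds = 5)
```

Because the previous value is restored through on.exit(), the option is reset even when the request fails with an error.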

To see all the details of the interaction with the server(s) set options(internet.info = 1). HTTP[S] servers are allowed to refuse requests to read the headers and some do: this will result in a status of 405. For possible issues with secure URLs (especially on Windows) see download.file.
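A caller may want to distinguish a refused header request from other outcomes. The sketch below shows one possible way to do that from the "status" attribute; describe_status and its return strings are hypothetical, and the treatment of every status from 200 to 399 as acceptable is an assumption chosen for illustration rather than a rule taken from this page.

```r
# Hypothetical helper: give a rough description of the "status" attribute.
describe_status <- function(h) {
  status <- attr(h, "status")
  if (is.null(status)) return("no status attribute")
  # 405: the server refused the request to read the headers.
  if (status == 405L) return("header request refused")
  # Assumption: treat 2xx and 3xx codes (including FTP 250 and 350) as success.
  if (status >= 200L && status < 400L) return("ok")
  "error or unexpected status"
}

# Stand-in values, not real server output:
refused <- structure("HTTP/1.1 405 Method Not Allowed\r\n", status = 405L)
describe_status(refused)
```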

There is a security risk in not verifying certificates, but as only the headers are captured it is slight. Usually looking at the URL in a browser will reveal what the problem is (and it may well be machine-specific).

See Also

capabilities("libcurl") to see if this is supported.

options HTTPUserAgent and timeout are used.

Examples

## Not run: 
## a not-always-available site:
curlGetHeaders("ftps://test.rebex.net/readme.txt")
## End(Not run)
