Download File from the Internet
This function can be used to download a file from the Internet.
download.file(url, destfile, method, quiet = FALSE, mode = "w", cacheOK = TRUE, extra = getOption("download.file.extra"))
- url: A character string naming the URL of a resource to be downloaded.
- destfile: A character string with the name where the downloaded file is saved. Tilde-expansion is performed.
- method: Method to be used for downloading files. Current download methods are "internal", "wininet" (Windows only), "libcurl", "wget" and "curl", and there is a value "auto": see ‘Details’ and ‘Note’. The method can also be set through the option "download.file.method": see options().
- quiet: If TRUE, suppress status messages (if any), and the progress bar.
- mode: character. The mode with which to write the file. Useful values are "w", "wb" (binary), "a" (append) and "ab". Only used for the "internal" method. (See also ‘Details’.)
- cacheOK: logical. Is a server-side cached value acceptable?
- extra: character vector of additional command-line arguments for the "wget" and "curl" methods.
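As a minimal sketch of the arguments above, the following copies a small local file through a file:// URL, so it runs without network access. The helper to_file_url and the temporary file contents are illustrative assumptions, not part of the original page.

```r
# Hypothetical helper: turn a local path into a file:/// URL.
to_file_url <- function(path) {
  paste0("file:///", sub("^/", "", normalizePath(path, winslash = "/")))
}

# A small local file stands in for the remote resource.
src <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta"), src)

dest <- tempfile(fileext = ".txt")

# url and destfile are the only required arguments; quiet = TRUE
# suppresses status messages and the progress bar.
status <- download.file(url = to_file_url(src), destfile = dest, quiet = TRUE)

contents <- readLines(dest)
```

The return value is invisible, so it is assigned to status here in order to inspect it afterwards.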
The function download.file can be used to download a single file as described by url from the internet and store it in destfile. The url must start with a scheme such as http://, https://, ftp:// or file://.
If method = "auto" is chosen (the default), on a Unix-alike method "libcurl" is chosen for https:// and ftps:// URLs and the "internal" method is chosen for other schemes. On Windows, method = "auto" selects the "wininet" method except for ftps:// URLs, where "libcurl" is tried. The "wininet" method uses the WinINet functions (part of the OS). Method "libcurl" uses the library of that name.
Support for method "libcurl" is optional on Windows: use capabilities("libcurl") to see if it is supported on your build. It uses an external library of that name (http://curl.haxx.se/libcurl/) against which R can be compiled. If supported it will provide (non-blocking) access to https:// and (usually) ftps:// URLs. There is support for simultaneous downloads, so url and destfile can be character vectors of the same length greater than one. For a single URL and quiet = FALSE a progress bar is shown in interactive use.
For methods "wget" and "curl" a system call is made to the tool given by method, and the respective program must be installed on your system and be in the search path for executables. They will block all other activity on the R process until they complete: this may make a GUI unresponsive.
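The simultaneous-download support described above might be used as in the sketch below. It assumes a build where capabilities("libcurl") is TRUE and uses two local files as stand-ins for remote resources; the helper to_file_url is hypothetical. The URLs are percent-encoded because, as this page notes elsewhere, the "libcurl" method percent-decodes file:// URLs.

```r
# Hypothetical helper: turn local paths into file:/// URLs.
to_file_url <- function(path) {
  paste0("file:///", sub("^/", "", normalizePath(path, winslash = "/")))
}

# Only attempt this when the build reports libcurl support.
if (capabilities("libcurl")) {
  # Two small local files stand in for remote resources (illustrative only).
  srcs <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
  writeLines("first", srcs[1])
  writeLines("second", srcs[2])

  # Percent-encode characters such as spaces before handing over the URLs.
  urls <- vapply(to_file_url(srcs), URLencode, character(1), USE.NAMES = FALSE)
  dests <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))

  # url and destfile are character vectors of the same length.
  status <- download.file(urls, dests, method = "libcurl", quiet = TRUE)
  results <- vapply(dests, readLines, character(1), USE.NAMES = FALSE)
}
```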
cacheOK = FALSE is useful for http:// and https:// URLs: it will attempt to get a copy directly from the site rather than from an intermediate cache. It is used by available.packages.
The "libcurl" and "wget" methods follow http:// and https:// redirections: the "internal" method does not. (For method "curl" use argument extra = "-L". To disable redirection in wget, use extra = "--max-redirect=0".) (The
"wininet" method supports some redirections but not all.) Note that https:// URLs are not supported by the
"internal" method but are supported by the
"libcurl" method and the "wininet" method on Windows.
See url for how file:// URLs are interpreted, especially on Windows. The "internal" and "wininet" methods do not percent-decode file:// URLs, but the "libcurl" and "curl" methods do: method "wget" does not support them.
Most methods do not percent-encode special characters such as spaces in URLs (see URLencode), but it seems the "wininet" method does.
The remaining details apply to the "internal", "wininet" and
"libcurl" methods only. The timeout for many parts of the transfer can be set by the option
timeout which defaults to 60 seconds.
The level of detail provided during transfer can be set by the quiet argument and the internet.info option: the details depend on the platform and scheme. For the "internal" method setting option internet.info to 0 gives all available details, including all server responses. Using 2 (the default) gives only serious messages, and 3 or more suppresses all messages. For the "libcurl" method values of the option less than 2 give verbose output.
A progress bar tracks the transfer. If the file length is known, the full width of the bar is the known length. Otherwise the initial width represents 100 Kbytes and is doubled whenever the current width is exceeded. (In non-interactive use this uses a text version. If the file length is known, an equals sign represents 2% of the transfer completed: otherwise a dot represents 10Kb.)
If
mode is not supplied and url ends in one of several recognised binary-file extensions, such as .RData, a binary transfer is done. Since Windows (unlike Unix-alikes) does distinguish between text and binary files, care is needed that other binary file types are transferred with mode = "wb".
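The timeout option mentioned above is a session-wide setting, so one way to raise it for a single slow transfer is to set it temporarily and restore it afterwards. The helper name with_timeout and the value 300 are assumptions made for this sketch.

```r
# Hypothetical helper: evaluate `code` with a temporary timeout (in seconds),
# restoring the previous value on exit, even if `code` fails.
with_timeout <- function(seconds, code) {
  old <- options(timeout = seconds)
  on.exit(options(old), add = TRUE)
  force(code)  # `code` is evaluated lazily, i.e. after the option is set
}

before <- getOption("timeout")

# Here the "transfer" is replaced by a lookup of the option itself, to show
# that the new value is in effect while the code runs.
seen <- with_timeout(300, getOption("timeout"))
```

In real use the second argument would be a call to download.file; the internet.info option could be handled in the same way.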
A progress bar tracks the transfer. If the file length is known, an
equals sign represents 2% of the transfer completed: otherwise a dot
represents 10Kb. Code written to download binary files must use
mode = "wb", but
the problems incurred by a text transfer will only be seen on Windows.
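To illustrate the advice on binary transfers, the sketch below copies a few bytes that a text-mode transfer could alter on Windows (carriage return, line feed, Ctrl-Z) and checks that they arrive unchanged. The payload and the helper to_file_url are illustrative assumptions.

```r
# Hypothetical helper: turn a local path into a file:/// URL.
to_file_url <- function(path) {
  paste0("file:///", sub("^/", "", normalizePath(path, winslash = "/")))
}

# Bytes that are sensitive to text-mode translation on Windows.
payload <- as.raw(c(0x00, 0x0d, 0x0a, 0x1a, 0xff))
src <- tempfile(fileext = ".bin")
writeBin(payload, src)

dest <- tempfile(fileext = ".bin")

# mode = "wb" requests a binary transfer regardless of the file extension.
download.file(to_file_url(src), dest, mode = "wb", quiet = TRUE)

copied <- readBin(dest, what = "raw", n = file.size(dest))
unchanged <- identical(copied, payload)
```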
An (invisible) integer code, 0 for success and non-zero for failure. For the "wget" and "curl" methods this is the status code returned by the external program. The "internal" method can return 1, but will in most cases throw an error.
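Because a failure may surface either as a non-zero code or as an error, calling code may want to handle both. The wrapper below is one possible sketch; the names try_download and to_file_url, and the non-existent path, are hypothetical.

```r
# Hypothetical helper: turn a local path into a file:/// URL.
to_file_url <- function(path) {
  paste0("file:///", sub("^/", "", normalizePath(path, winslash = "/")))
}

# Hypothetical wrapper: TRUE on success, FALSE on a non-zero status or error.
try_download <- function(url, destfile, ...) {
  tryCatch(
    suppressWarnings(download.file(url, destfile, quiet = TRUE, ...)) == 0L,
    error = function(e) FALSE
  )
}

src <- tempfile(fileext = ".txt")
writeLines("payload", src)

ok  <- try_download(to_file_url(src), tempfile())
bad <- try_download("file:///no/such/dir/no-such-file.txt", tempfile())
```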
Files of more than 2GB are supported on 64-bit builds of R; they may be truncated on some 32-bit builds.
Methods "wget" and "curl" are mainly for historical compatibility but may provide capabilities not supported by the "libcurl" or "wininet" methods.
Method "wget" can be used with proxy firewalls which require user/password authentication if proper values are stored in the configuration file for wget.
wget (http://www.gnu.org/software/wget/) is commonly installed on Unix-alikes (but not macOS). Windows binaries are available from Cygwin, gnuwin32 and elsewhere.
curl (http://curl.haxx.se/) is installed on macOS and commonly on Unix-alikes. Windows binaries are available at that URL.
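For the external methods, redirection behaviour is controlled through the extra argument, using the flags quoted earlier on this page ("-L" for curl, "--max-redirect=0" for wget). The helper redirect_extra below is a hypothetical convenience for choosing them, and the URL in the commented call is a placeholder.

```r
# Hypothetical helper: choose the `extra` value for a method, given whether
# redirections should be followed.
redirect_extra <- function(method, follow = TRUE) {
  switch(method,
    curl = if (follow) "-L" else character(),                # curl needs -L
    wget = if (follow) character() else "--max-redirect=0",  # wget follows by default
    character()                                              # other methods ignore `extra`
  )
}

curl_args <- redirect_extra("curl")
wget_args <- redirect_extra("wget", follow = FALSE)

# Not run: needs the curl program on the search path and network access.
# download.file("http://example.org/file.txt", "file.txt",
#               method = "curl", extra = redirect_extra("curl"))
```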
For the Windows-only method "wininet", the ‘Internet Options’ of the system are used to choose proxies and so on; these are set in the Control Panel and are those used for Internet Explorer.
The next two paragraphs apply to the internal code only.
Proxies can be specified via environment variables. Setting no_proxy to * stops any proxy being tried. Otherwise the setting of http_proxy or ftp_proxy (or failing that, the all upper-case version) is consulted and if non-empty used as a proxy site. For FTP transfers, the username and password on the proxy can be specified by ftp_proxy_user and ftp_proxy_password. The form of http_proxy should be http://proxy.dom.com/ or http://proxy.dom.com:8080/ where the port defaults to 80 and the trailing slash may be omitted. For ftp_proxy use the analogous ftp:// form, where the default port is 21. These environment variables must be set before the download code is first used: they cannot be altered later by calling Sys.setenv.
Usernames and passwords can be set for HTTP proxy transfers via
environment variable http_proxy_user in the form user:passwd. Alternatively, http_proxy can be of the form http://user:pass@proxy.dom.com:8080/ for compatibility with wget. Only the HTTP/1.0 basic authentication scheme is supported.
Under Windows, if http_proxy_user is set to ask then a dialog box will come up for the user to enter the username and password. NB: you will be given only one opportunity to enter this, but if proxy authentication is required and fails there will be one further prompt per download.
Much the same scheme is supported by method = "libcurl", including no_proxy, http_proxy and ftp_proxy, and the last two may take the form [user:password@]machine[:port] where the parts in brackets are optional. See http://curl.haxx.se/libcurl/c/libcurl-tutorial.html for details.
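A proxy could be configured from within R as sketched below, using the placeholder host from the text above. For the internal code this has to happen before the first download in the session, so in practice the setting may belong in a startup file instead; the restore step is only there to keep the example side-effect free.

```r
# Remember the current setting (NA if the variable is unset).
old_proxy <- Sys.getenv("http_proxy", unset = NA)

# Placeholder proxy taken from the text; the port defaults to 80 if omitted.
Sys.setenv(http_proxy = "http://proxy.dom.com:8080/")
current <- Sys.getenv("http_proxy")

# ... downloads made from here on may use the proxy ...

# Restore the previous state.
if (is.na(old_proxy)) {
  Sys.unsetenv("http_proxy")
} else {
  Sys.setenv(http_proxy = old_proxy)
}
```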
Methods which access https:// and ftps:// URLs should
try to verify their certificates. This is usually done using the CA
root certificates installed by the OS (although we have seen instances
in which these got removed rather than updated). For further information
see http://curl.haxx.se/docs/sslcerts.html. This is an issue for
method = "libcurl" on Windows, where the
OS does not provide a suitable CA certificate bundle, so by default on
Windows certificates are not verified. To turn verification on, set environment variable CURL_CA_BUNDLE to the path to a certificate bundle file, usually named curl-ca-bundle.crt. (This is normally done for a binary installation of R, which installs R_HOME/etc/curl-ca-bundle.crt and sets CURL_CA_BUNDLE to point to it if that environment variable is not already set.) To use an updated certificate bundle, download a current copy and set CURL_CA_BUNDLE to the full path to the downloaded file.
Note that the root certificates used by R may or may not be the same as used in a browser, and indeed different browsers may use different certificate bundles (there is typically a build option to choose either their own or the system ones).
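Setting CURL_CA_BUNDLE by hand might look like the sketch below. It assumes the bundle lives at the location mentioned above for binary installations; on other systems the file may not exist, in which case nothing is changed.

```r
# Location used by binary installations of R, as described above.
bundle <- file.path(R.home("etc"), "curl-ca-bundle.crt")

# Respect an existing setting, and only point at the bundle if it is present.
if (!nzchar(Sys.getenv("CURL_CA_BUNDLE")) && file.exists(bundle)) {
  Sys.setenv(CURL_CA_BUNDLE = bundle)
}

active_bundle <- Sys.getenv("CURL_CA_BUNDLE")
```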
ftp: URLs are accessed using the FTP protocol which has a number of variants. One distinction is between ‘active’ and ‘(extended) passive’ modes: which is used is chosen by the client. The "internal" and "libcurl" methods use passive mode, and that is almost universally used by browsers. Earlier versions of R used active mode for the "wininet" method; it now first tries passive and then active.
options to set the timeout and internet.info options (among others) used by some of the methods.
url for a finer-grained way to read data from URLs.
download.packages for applications.
Contributed package RCurl (https://CRAN.R-project.org/package=RCurl) provides more comprehensive facilities to download from URLs.