rdwd (version 1.8.0)

indexFTP: Create a recursive index of an FTP Server

Description

Create a list of all the files (in all subfolders) of an FTP server. Defaults to the German Weather Service (DWD, Deutscher WetterDienst) OpenData server at https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/.
The R package RCurl must be available to do this.

Running this for all folders is not recommended: it can take quite some time and the FTP server may block you. This package contains an index of the climatic observations at weather stations (fileIndex) and of the gridded datasets (gridIndex). If they are out of date, please let me know!

Getting banned from the FTP Server
Normally, this shouldn't happen anymore: since version 0.10.10 (2018-11-26), a single RCurl handle is used for all FTP requests, and since version 1.0.17 (2019-05-14), the file tree provided by the DWD is used to obtain all folders first, eliminating the recursive calls.
There is a fallback in case the FTP server detects bot requests and denies access: if RCurl::getURL() fails, indexFTP still returns output, which you can pass via folder in a second run to index the remaining directories. In that case, you might need to wait a bit and set sleep to a higher value. Here's an example:

gridindex <- indexFTP("", gridbase)
gridindex <- indexFTP(gridindex, gridbase, sleep=15)

Of course, with a higher sleep value, the execution will take longer!
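
To get a feeling for that trade-off, here is a rough sketch of the extra waiting time. It assumes the pause is a uniform draw between 0 and sleep, as the description of sleep suggests. The helper random_pause and the folder count n_folders are hypothetical and only serve as an illustration; they are not part of rdwd.

```r
# Hypothetical helper, not part of rdwd: one pause as described for 'sleep'
# (assumption: uniformly distributed between 0 and sleep seconds)
random_pause <- function(sleep) {
  if (sleep != 0) runif(1, min = 0, max = sleep) else 0
}

n_folders <- 200  # assumed number of folders, for illustration only
sleep <- 15

# The mean of a uniform draw on [0, sleep] is sleep/2,
# so the expected extra waiting time over all folders is:
expected_extra_secs <- n_folders * sleep / 2
expected_extra_secs / 60  # in minutes
```

With these assumed numbers, the pauses alone add about 25 minutes on average.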

Usage

indexFTP(
  folder = "currentfindex",
  base = dwdbase,
  is.file.if.has.dot = TRUE,
  exclude.latest.bin = TRUE,
  fast = TRUE,
  sleep = 0,
  dir = "DWDdata",
  filename = folder[1],
  overwrite = FALSE,
  quiet = rdwdquiet(),
  progbar = !quiet,
  verbose = FALSE
)

Value

a vector with file paths

Arguments

folder

Folder(s) to be indexed recursively, e.g. "/hourly/wind/". Leading slashes will be removed. Use folder="" to search at the location of base itself. If folder is "currentfindex" (the default) and base is the default, folder is changed to all observational folders listed in the current tree file at https://opendata.dwd.de/weather/tree.html. With "currentgindex" and gridbase, the grid folders in the tree are used. DEFAULT: "currentfindex"

base

Main directory of FTP server. Trailing slashes will be removed. DEFAULT: dwdbase

is.file.if.has.dot

Logical: if some of the input paths contain a dot, treat those as files, i.e. do not try to read those as if they were a folder. Only set this to FALSE if you know what you're doing. DEFAULT: TRUE

exclude.latest.bin

Exclude latest file at opendata.dwd.de/weather/radar/radolan? RCurl::getURL indicates this is a pointer to the last regularly named file. DEFAULT: TRUE

fast

Read tree file with data.table::fread() (1 sec) instead of readLines() (10 secs)? DEFAULT: TRUE

sleep

If not 0, a random number of seconds between 0 and sleep is passed to Sys.sleep() after each read folder to avoid getting kicked off the FTP-Server, see note above. DEFAULT: 0

dir

Name of a writeable directory in which the output file is saved. Created if it does not exist. DEFAULT: "DWDdata" at current getwd()

filename

Character: Part of output filename. "INDEX_of_DWD_" is prepended, "/" replaced with "_", ".txt" appended. DEFAULT: folder[1]

overwrite

Logical: Overwrite existing file? If not, "_n" is added to the filename, see berryFunctions::newFilename(). DEFAULT: FALSE

quiet

Suppress progbars and message about directory/files? DEFAULT: FALSE through rdwdquiet()

progbar

Logical: show a progress bar at each level? DEFAULT: !quiet

verbose

Logical: write a lot of messages from RCurl::getURL()? DEFAULT: FALSE (usually, you don't need all the curl information)
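
The path conventions described for folder, base, is.file.if.has.dot and filename can be sketched in base R. This is an illustration under assumptions, not the package's internal code: all helper names are hypothetical and the base URL is a placeholder. Whether leading slashes are stripped before the output filename is built is not stated above, so the filename example uses a value without one.

```r
# Hypothetical helpers illustrating the documented conventions; not rdwd internals.

# folder: leading slashes are removed
strip_leading_slash <- function(folder) {
  sub("^/+", "", folder)
}

# base: trailing slashes are removed
strip_trailing_slash <- function(base) {
  sub("/+$", "", base)
}

# is.file.if.has.dot: a path containing a dot is treated as a file
looks_like_file <- function(path) {
  grepl(".", path, fixed = TRUE)
}

# filename: prepend "INDEX_of_DWD_", replace "/" with "_", append ".txt"
index_filename <- function(filename) {
  paste0("INDEX_of_DWD_", gsub("/", "_", filename, fixed = TRUE), ".txt")
}

folder <- strip_leading_slash("/hourly/wind/")
base <- strip_trailing_slash("https://example.org/data/")  # placeholder URL
full_url <- paste0(base, "/", folder)

is_file <- looks_like_file(c("hourly/wind/", "hourly/wind/readme.pdf"))
out_name <- index_filename("daily/solar")
```

Here full_url has exactly one slash between base and folder, only the second path in is_file is flagged as a file, and out_name is "INDEX_of_DWD_daily_solar.txt".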

Author

Berry Boessenkool, berry-b@gmx.de, Oct 2016

See Also

createIndex(), updateIndexes(), website index chapter

Examples

if (FALSE) {  ## Needs internet connection
  sol <- indexFTP(folder="/daily/solar", dir=tempdir())
  head(sol)
}

# mon <- indexFTP(folder="/monthly/kl", dir=tempdir(), verbose=TRUE)

