selectDWD: Select data from the DWD CDC FTP Server

Description

Select files for downloading with dataDWD(). The available folders with datasets are listed at https://bookdown.org/brry/rdwd/available-datasets.html. To use an updated index (if necessary), see https://bookdown.org/brry/rdwd/fileindex.html.

All arguments (except for mindex, findex and base) can be vectors and will be recycled to the maximum length of all arguments. If that length is > 1, the output is a list of filenames (or a vector if outvec=TRUE).

If a station name is given but id is empty (""), id is inferred via findID() using mindex. If res/var/per are given and valid (existing in findex), they are pasted together to form a path. Here is an overview of the behavior for each combination of id and path being available:

case | id   | path | output
1    | ""   | ""   | `base` (and some warnings)
2    | "xx" | ""   | All file names (across paths) for station id
3    | ""   | "xx" | The zip file names at path
4    | "xx" | "xx" | Regular single data file name
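
The four cases can be summarized as a small decision function. This is only an illustrative sketch in base R: describe_case() is a hypothetical helper that is not part of rdwd, and it ignores the interactive selection that happens when res/var/per are NA.

```r
# Hypothetical helper (not part of rdwd): classify a request by which of
# id and path are given, mirroring the overview table.
describe_case <- function(id = "", path = "") {
  has_id   <- !identical(as.character(id), "")
  has_path <- !identical(as.character(path), "")
  if (!has_id && !has_path) return(1L)  # neither given: base (with warnings)
  if ( has_id && !has_path) return(2L)  # all file names for the station id
  if (!has_id &&  has_path) return(3L)  # zip file names at path
  4L                                    # single data file name
}

describe_case()                                    # 1
describe_case(id = 386)                            # 2
describe_case(path = "daily/kl/historical")        # 3
describe_case(id = 386, path = "daily/kl/recent")  # 4
```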

For case 2, you can explicitly set res="", var="", per="" to avoid the default interactive selection. For cases 3 and 4 (path given), you can set meta=TRUE; selectDWD will then return the name of the station description txt file at path. This is why case 3 with the default meta=FALSE only returns the data file names (ending in .zip) and not the description and Beschreibung txt/pdf files. Open those in a browser with

pdfpath <- grep("daily/kl/h.*DESCRIPTION", fileIndex$path, value=TRUE)
browseURL(paste0(dwdbase, "/", pdfpath))

Let me know if besides meta, pdf is needed for automated opening.

Usage

selectDWD(
  name = "",
  res = NA,
  var = NA,
  per = NA,
  base = dwdbase,
  outvec = any(per %in% c("rh", "hr")),
  findex = fileIndex,
  remove_dupli = TRUE,
  current = FALSE,
  id = findID(name, exactmatch = exactmatch, mindex = mindex, quiet = quiet),
  mindex = metaIndex,
  exactmatch = TRUE,
  meta = FALSE,
  meta_txt_only = TRUE,
  quiet = rdwdquiet(),
  ...
)

Arguments

name

Char: station name(s) passed to findID(), along with exactmatch and mindex. All 3 arguments are ignored if id is given. DEFAULT: ""

res

Char: temporal resolution available at base, usually one of c("hourly","daily","monthly"), see section 'Description' above. res/var/per together form the path. DEFAULT: NA for interactive selection

var

Char: weather variable of interest, e.g. "air_temperature", "cloudiness", "precipitation", "soil_temperature", "solar", "kl", "more_precip". See above and in fileIndex. DEFAULT: NA for interactive selection

per

Char: desired time period. One of "recent" (data from the last year, up to date usually within a few days) or "historical" (long time series). Can be abbreviated (if the first letter is "r" or "h", full names are used). To get both datasets, use per="hr" or per="rh" (and outvec=TRUE). per is set to "" if var=="solar". DEFAULT: NA for interactive selection

base

Single char: main directory of DWD ftp server. Must be the same base used to create findex. DEFAULT: dwdbase

outvec

Single logical: if path or ID length > 1, return a vector (via unlist()) instead of a list? DEFAULT: any(per %in% c("rh","hr"))
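
The difference between the list and vector output can be illustrated with a rough base R sketch. It assumes recycling behaves like rep(length.out=) and uses placeholder file names; it is not the actual rdwd implementation.

```r
# Sketch only: recycle vector arguments to a common length and paste them
# into paths, element by element (not as an outer product).
res <- c("daily", "monthly")
var <- "kl"
per <- "recent"

len   <- max(length(res), length(var), length(per))
paths <- paste(rep(res, length.out = len),
               rep(var, length.out = len),
               rep(per, length.out = len), sep = "/")
paths  # "daily/kl/recent" "monthly/kl/recent"

# One list element per path (file names here are placeholders):
as_list <- lapply(paths, function(p) paste0(p, "/", c("fileA.zip", "fileB.zip")))
length(as_list)          # 2: a list with one element per path
length(unlist(as_list))  # 4: a flat character vector, as with outvec=TRUE
```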

findex

Single object: Index used to select filename, as returned by createIndex(). To use a current / custom index, see https://bookdown.org/brry/rdwd/fileindex.html. DEFAULT: fileIndex

remove_dupli

Logical: Remove duplicate entries in the fileIndex? If duplicates are found, a warning will be issued, unless quiet=TRUE. The DWD updates files on the server quite often and sometimes misses removing the old files, leading to duplicates, usually with differences only in the date range. A semi-current (manually updated) list of duplicates is on GitHub. Before reporting, run updateRdwd() to see if fileIndex has been updated. I email the DWD about duplicates when I find them; they usually fix them soon. If remove_dupli=TRUE, only the file with the longer timespan will be kept. This is selected according to the filename, which is not very reliable, hence manual checking is recommended. DEFAULT: TRUE
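
To make the idea of keeping the file with the longer timespan concrete, here is a minimal base R sketch. The file names are made up, and the "_id_start_end_" date pattern is an assumption about the naming scheme; the actual selection logic in rdwd may differ.

```r
# Illustrative file names (made up); the "_<id>_<start>_<end>_" pattern
# is an assumption about the naming scheme.
files <- c("tageswerte_KL_00386_19370101_20171231_hist.zip",
           "tageswerte_KL_00386_19370101_20181231_hist.zip",
           "tageswerte_KL_01050_19470101_20181231_hist.zip")

# Extract the station id and the two 8-digit dates from each name.
m     <- regmatches(files, regexec("_([0-9]{5})_([0-9]{8})_([0-9]{8})_", files))
id    <- vapply(m, `[`, "", 2)
start <- as.Date(vapply(m, `[`, "", 3), format = "%Y%m%d")
end   <- as.Date(vapply(m, `[`, "", 4), format = "%Y%m%d")
span  <- as.numeric(end - start)

# Within each id, keep only the file with the longest timespan.
keep <- unlist(lapply(split(seq_along(files), id),
                      function(i) i[which.max(span[i])]))
kept <- files[sort(keep)]
kept
```

Because the dates are parsed from the filename alone, a misnamed file would be judged by its name and not by its content, which is why manual checking remains advisable.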

current

Single logical for case 3/4 with given path: instead of findex, use a list of the currently available files at base/res/var/per? This will call indexFTP(), thus requires availability of the RCurl package. DEFAULT: FALSE

id

Char/Number: station ID with or without leading zeros, e.g. "00614" or 614. Is internally converted to an integer, because some DWD meta data files also contain no leading zeros. DEFAULT: findID(name, exactmatch, mindex)
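
The integer conversion means both spellings identify the same station, as this small base R sketch shows. The zero-padding step assumes the five-digit form seen in "00614"; it is not taken from the rdwd source.

```r
# Both spellings of the same station id collapse to one integer.
id_chr <- as.integer("00614")
id_num <- as.integer(614)
identical(id_chr, id_num)  # TRUE

# Assumption: ids in file names are five digits wide, zero-padded.
padded <- sprintf("%05d", id_num)
padded  # "00614"
```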

mindex

Single object: Index with metadata passed to findID(). DEFAULT: metaIndex

exactmatch

Logical passed to findID(): match name with ==? Else with grepl(). DEFAULT: TRUE

Value

Character string with file path and name(s) in the format "base/res/var/per/filename.zip"

Examples

# NOT RUN {
# Give weather station name (must be existing in metaIndex):
selectDWD("Potsdam", res="daily", var="kl", per="historical")

# all files for all stations matching "Koeln":
selectDWD("Koeln", res="", var="", per="", exactmatch=FALSE)
findID("Koeln", FALSE)

# }
# NOT RUN {
 # Excluded from CRAN checks to save time

# selectDWD("Potsdam") # interactive selection of res/var/per

# directly give station ID, can also be id="00386" :
selectDWD(id=386, res="daily", var="kl", per="historical")

# period can be abbreviated:
selectDWD(id="00386", res="daily", var="kl", per="h")
selectDWD(id="00386", res="daily", var="kl", per="h", meta=TRUE)

# vectorizable:
selectDWD(id="01050", res="daily", var="kl", per="rh") # list if outvec=F
selectDWD(id="01050", res=c("daily","monthly"), var="kl", per="r")
# vectorization is elementwise (arguments are recycled), not an outer product:
selectDWD(id="01050", res=c("daily","monthly"), var="kl", per="hr")

# all zip files in all paths matching id:
selectDWD(id=c(1050, 386), res="",var="",per="")
# all zip files in a given path (if ID is empty):
head(  selectDWD(id="", res="daily", var="kl", per="recent")   )

# }