Download datasets from the Eurostat database and put them in a standardized format.
get_eurostat_bulk(id, cache = TRUE, update_cache = FALSE,
cache_dir = NULL, compress_file = TRUE,
stringsAsFactors = default.stringsAsFactors(), select_freq = NULL,
keep_flags = FALSE, cflags = FALSE, check_toc = FALSE,
verbose = FALSE, ...)
id: a code name for the dataset of interest. See search_eurostat_toc for details on how to get an id.
cache: a logical, whether to use caching. Default is TRUE.
update_cache: a logical, whether to update the cache. Default is FALSE. Can also be set with options(restatapi_update=TRUE).
cache_dir: a path to a cache directory. The default NULL uses memory as the cache. If the cache_dir directory does not exist, the data is saved in the 'restatapi' directory under the temporary directory returned by tempdir(). The directory can also be set with options(restatapi_cache_dir=...).
compress_file: a logical, whether to compress the RDS file used for caching. Default is TRUE.
stringsAsFactors: if TRUE (the default), variables that are not numeric are converted to factors. If FALSE, they are returned as characters.
select_freq: a character symbol for the time frequency to keep when a dataset has multiple time frequencies. Possible values are:
A = annual, S = semi-annual, H = half-year, Q = quarterly, M = monthly, W = weekly, D = daily.
The default is NULL, as most datasets have just one time frequency. If there are multiple frequencies and select_freq=NULL, only the most common frequency is kept. If all frequencies are needed, get_eurostat_raw can be used.
keep_flags: a logical, whether the observation status (flags), e.g. "confidential" or "provisional", should be kept in a separate column or can be removed. Default is FALSE. For flag values see:
http://ec.europa.eu/eurostat/data/database/information.
cflags: a logical, whether missing observations with flag 'c' ("confidential") should be kept. Default is FALSE, in which case these observations are dropped from the dataset. If TRUE, the flags are kept and the value provided in keep_flags is not taken into account.
check_toc: a boolean, whether to check the provided id in the Table of Contents (TOC). Default is FALSE, in which case the base URL for the download link is retrieved from the configuration file. If TRUE, the TOC is downloaded and the id is checked in it; if the id is found, the download link is retrieved from the TOC.
verbose: a boolean, whether to print detailed messages for debugging. Default is FALSE. Can also be set with options(restatapi_verbose=TRUE).
...: other parameter(s) to pass on to the load_cfg function.
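The interplay of cache_dir and compress_file can be pictured with a minimal base R sketch. The helper names, the `<id>.rds` file naming and reading the default from getOption("restatapi_cache_dir") are assumptions made for illustration; this is not restatapi's internal code.

```r
# Hypothetical helpers, not restatapi internals: resolve the cache folder and
# write a dataset to it as an RDS file.
resolve_cache_dir_sketch <- function(cache_dir = getOption("restatapi_cache_dir")) {
  if (is.null(cache_dir)) return(NULL)  # NULL: cache in memory instead of on disk
  if (!dir.exists(cache_dir)) {
    # fall back to <tempdir()>/restatapi when the requested folder is missing
    cache_dir <- file.path(tempdir(), "restatapi")
    dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  }
  cache_dir
}

save_to_cache_sketch <- function(dt, id, cache_dir, compress_file = TRUE) {
  # assumed naming scheme: one <id>.rds file per dataset
  path <- file.path(cache_dir, paste0(id, ".rds"))
  saveRDS(dt, path, compress = compress_file)
  path
}

cdir <- resolve_cache_dir_sketch(file.path(tempdir(), "no_such_folder"))
path <- save_to_cache_sketch(data.frame(values = 1:3), "agr_r_milkpr", cdir,
                             compress_file = FALSE)
readRDS(path)
```

Writing uncompressed files trades disk space for faster reads, which is the choice compress_file = FALSE exposes.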
a data.table with the following columns:
| dimension names | one column for each dimension in the data |
| time | a column for the time dimension |
| values | a column for the numerical values |
| flags | a column for the observation status, only if keep_flags=TRUE or cflags=TRUE |
The data.table does not include all missing values: an observation is dropped when both its value and its flag are missing for a given time period.
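The rules for missing and confidential observations, together with keep_flags and cflags, can be summarised in a small base R sketch. The helper name is hypothetical, and so is the assumption that an absent flag is stored as NA or an empty string; restatapi's actual filtering code may differ.

```r
# Hypothetical sketch, not restatapi's code: filter a long table that has
# `values` and `flags` columns. Assumption: a missing flag is NA or "".
filter_obs_sketch <- function(df, keep_flags = FALSE, cflags = FALSE) {
  no_flag <- is.na(df$flags) | df$flags == ""
  # rows where both the value and the flag are missing are always dropped
  df <- df[!(is.na(df$values) & no_flag), , drop = FALSE]
  if (!cflags) {
    # missing observations flagged "c" (confidential) are dropped by default
    df <- df[!(is.na(df$values) & df$flags %in% "c"), , drop = FALSE]
  }
  # the flags column survives only if keep_flags or cflags is TRUE
  if (!(keep_flags || cflags)) df$flags <- NULL
  df
}

obs <- data.frame(geo = c("AT", "AT", "BE", "BE"),
                  time = c("2019", "2020", "2019", "2020"),
                  values = c(5.1, NA, NA, 6.3),
                  flags = c("", "", "c", "p"),
                  stringsAsFactors = FALSE)
filter_obs_sketch(obs)                # 2 rows, no flags column
filter_obs_sketch(obs, cflags = TRUE) # keeps the confidential BE 2019 row
```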
Datasets are downloaded from the Eurostat bulk download facility in TSV format, because this requires a smaller file to be downloaded and processed. If there is more than one frequency, the dataset is filtered to a single time frequency. If no frequency is selected and the dataset contains multiple frequencies, the most common one is used.
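The frequency rule can be illustrated with a base R sketch. It assumes a FREQ column as in the get_eurostat_raw output; the helper name is hypothetical and the code only mirrors the described behaviour, it is not the package's implementation. Ties are resolved by which.max, which picks the first code in alphabetical order.

```r
# Hypothetical sketch: keep one time frequency, as described for select_freq.
filter_freq_sketch <- function(df, select_freq = NULL) {
  if (is.null(select_freq)) {
    # count rows per frequency code and pick the most common one
    counts <- table(df$FREQ)
    select_freq <- names(counts)[which.max(counts)]
  }
  out <- df[df$FREQ == select_freq, , drop = FALSE]
  out$FREQ <- NULL  # FREQ is not part of the standardised output
  out
}

raw_dt <- data.frame(FREQ = c("A", "A", "A", "Q", "Q"),
                     time = c("2018", "2019", "2020", "2020Q1", "2020Q2"),
                     values = c(10, 11, 12, 3, 4),
                     stringsAsFactors = FALSE)
filter_freq_sketch(raw_dt)                    # the three annual rows
filter_freq_sketch(raw_dt, select_freq = "Q") # the two quarterly rows
```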
Compared to get_eurostat_raw, the frequency (FREQ) and time format (TIME_FORMAT) columns are not included, and the columns for the time period, observation values and status have the standardised names "time", "values" and "flags", regardless of whether the data was previously downloaded in SDMX or TSV format.
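A rough base R sketch of turning a wide TSV into the standardised long layout with "time", "values" and "flags" columns. The input layout is an assumption made for illustration: the first column holds comma-separated dimension codes under a header such as `unit,geo\time`, every other column is a time period, and each cell holds a value optionally followed by a space and a flag, with ":" for a missing value. The helper name is hypothetical and this is not how restatapi parses the files.

```r
# Hypothetical sketch: wide TSV (assumed layout) -> long table with
# standardised "time", "values" and "flags" columns.
tsv_to_long_sketch <- function(tsv_text) {
  wide <- read.delim(text = tsv_text, check.names = FALSE,
                     colClasses = "character", strip.white = TRUE)
  # "unit,geo\time" -> dimension names "unit", "geo"
  dim_names <- strsplit(names(wide)[1], ",", fixed = TRUE)[[1]]
  dim_names <- sub("\\\\.*$", "", dim_names)
  dims <- do.call(rbind, strsplit(wide[[1]], ",", fixed = TRUE))
  colnames(dims) <- dim_names
  times <- trimws(names(wide)[-1])
  rows <- list()
  for (j in seq_along(times)) {
    cell <- trimws(wide[[j + 1]])
    value <- sub(" .*$", "", cell)  # text before the first space
    flag <- ifelse(grepl(" ", cell), sub("^[^ ]* ", "", cell), "")
    rows[[j]] <- data.frame(dims, time = times[j],
                            values = suppressWarnings(as.numeric(value)),  # ":" -> NA
                            flags = flag, stringsAsFactors = FALSE)
  }
  do.call(rbind, rows)
}

tsv_text <- paste0("unit,geo\\time\t2020 \t2019 \n",
                   "THS_T,AT\t12.5 p\t11.9 \n",
                   "THS_T,BE\t: c\t8.4 \n")
long <- tsv_to_long_sketch(tsv_text)
long
```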
By default all datasets are cached, as they are often rather large. Datasets are cached in memory (the default) or, if cache_dir or options(restatapi_cache_dir=...) is set, stored as RDS files in a directory on disk. The cache can be emptied with clean_restatapi_cache.
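In-memory caching with an update switch and a clean-up function can be sketched in base R with an environment keyed by dataset id. All names here are hypothetical stand-ins (the downloader is a fake that only counts calls); the sketch mirrors the described behaviour of cache, update_cache and clean_restatapi_cache, not their implementation.

```r
# Hypothetical in-memory cache, not restatapi internals.
sketch_cache <- new.env()

cached_get_sketch <- function(id, download, update_cache = FALSE) {
  if (!update_cache && exists(id, envir = sketch_cache, inherits = FALSE)) {
    return(get(id, envir = sketch_cache, inherits = FALSE))  # cache hit
  }
  dt <- download(id)                   # cache miss or forced update
  assign(id, dt, envir = sketch_cache)
  dt
}

clean_cache_sketch <- function() {
  rm(list = ls(envir = sketch_cache, all.names = TRUE), envir = sketch_cache)
}

calls <- 0
fake_download <- function(id) {
  calls <<- calls + 1                  # count "network" downloads
  data.frame(id = id, values = 1)
}
invisible(cached_get_sketch("agr_r_milkpr", fake_download))
invisible(cached_get_sketch("agr_r_milkpr", fake_download))  # served from cache
invisible(cached_get_sketch("agr_r_milkpr", fake_download, update_cache = TRUE))
calls                                  # two downloads so far
```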
The id is a value from the code column of the table of contents (get_eurostat_toc) and can be searched for with the search_eurostat_toc function. The id can also be retrieved from the Eurostat database, which shows the code in parentheses after each dataset in the Data Navigation Tree.
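# NOT RUN {
library(restatapi)

# keep the observation status flags in a separate column
dt <- get_eurostat_bulk("agr_r_milkpr", keep_flags = TRUE)

# update the cache instead of reusing cached copies
options(restatapi_update = TRUE)
dt <- get_eurostat_bulk("avia_par_ee", check_toc = TRUE)
dt <- get_eurostat_bulk("avia_par_ee", select_freq = "A", verbose = TRUE)
options(restatapi_update = FALSE)

# cache an uncompressed RDS file in the session's temporary directory
dt <- get_eurostat_bulk("agr_r_milkpr", cache_dir = tempdir(), compress_file = FALSE, verbose = TRUE)
# }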