Download data sets from the Eurostat database.
get_eurostat_raw(id, mode = "txt", cache = TRUE,
update_cache = FALSE, cache_dir = NULL, compress_file = TRUE,
stringsAsFactors = default.stringsAsFactors(), keep_flags = FALSE,
check_toc = FALSE, verbose = FALSE, ...)
id | A code name for the dataset of interest. See search_eurostat_toc for details on how to get an id. |
mode | Defines the format of the downloaded dataset. It can be txt (the default value) for Tab Separated Values (TSV), or xml for the SDMX version. |
cache | A logical whether to do caching. Default is TRUE. |
update_cache | A logical, with a default value of FALSE, whether to update the cache. Can also be set with options(restatapi_update=TRUE). |
cache_dir | A path to a cache directory. NULL (the default) uses memory as the cache. If the cache_dir directory does not exist, the data is saved in the 'restatapi' directory under the temporary directory from tempdir(). The directory can also be set with options(restatapi_cache_dir=...). |
compress_file | A logical whether to compress the RDS file when caching. Default is TRUE. |
stringsAsFactors | If TRUE (the default), variables that are not numeric are converted to factors. If FALSE, they are returned as characters. |
keep_flags | A logical whether the observation status (flags), e.g. "confidential", "provisional", etc., should be kept in a separate column or can be removed. Default is FALSE. For flag values see: http://ec.europa.eu/eurostat/data/database/information. |
check_toc | A boolean whether to check the provided id in the Table of Contents (TOC) or not. The default value is FALSE; in this case the base URL for the download link is retrieved from the configuration file. If the value is TRUE, the TOC is downloaded and the id is checked in it. If it is found, the download link is retrieved from the TOC. |
verbose | A boolean with default FALSE, so detailed messages (for debugging) will not be printed. Can also be set with options(restatapi_verbose=TRUE). |
... | Further arguments for the load_cfg function. |
A data.table with the following columns:
FREQ | The frequency of the data (Annual, Semi-annual, Half-year, Quarterly, Monthly, Weekly, Daily) |
dimension names | One column for each dimension in the data |
TIME_FORMAT | A column for the time format, present if the source file is SDMX and the data was not loaded from a previously cached TSV download (this column is missing if the source file is TSV) |
time/TIME_PERIOD | A column for the time dimension, where the name of the column depends on the source file (TSV/SDMX) |
values/OBS_VALUE | A column for numerical values, where the name of the column depends on the source file (TSV/SDMX) |
The data does not include all missing values. An observation is dropped if both its value and its flags are missing for a particular time period.
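Because the time and value columns are named differently for TSV and SDMX sources, downstream code may want a single naming scheme. The following is a minimal sketch of that idea, not part of the package: the helper harmonise_names, the dimension column geo and the toy values are all hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of restatapi): give the time and value
# columns the same names whichever source format produced the table.
harmonise_names <- function(dt) {
  nm <- names(dt)
  nm[nm %in% c("time", "TIME_PERIOD")] <- "time"
  nm[nm %in% c("values", "OBS_VALUE")] <- "values"
  names(dt) <- nm
  dt
}

# Toy stand-in for an SDMX-style result; "geo" and the numbers are made up
sdmx_like <- data.frame(
  FREQ        = "A",
  geo         = c("AT", "BE"),
  TIME_PERIOD = c("2019", "2019"),
  OBS_VALUE   = c(1.5, 2.5)
)

harmonised <- harmonise_names(sdmx_like)
names(harmonised)
```

The same function leaves a TSV-style table unchanged, since its columns are already called time and values.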
Data sets are downloaded from the Eurostat bulk download facility in TSV or SDMX format.
The id should be a value from the code column of the table of contents (get_eurostat_toc), and can be searched for with the search_eurostat_toc function. The id value can be retrieved from the Eurostat database as well. The Eurostat database gives codes in the Data Navigation Tree after every dataset in parentheses.
By default all datasets are downloaded in TSV format and cached, as they are often rather large. The datasets are cached in memory (default) or can be stored in a temporary directory if cache_dir or options(restatapi_cache_dir) is defined. The cache can be emptied with clean_restatapi_cache.
If the id is checked in the TOC, the data will be saved in the cache with the date from the "lastUpdate" column of the TOC; otherwise it is saved with the current date.
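The caching options mentioned on this page can be set once per session instead of being passed to every call. The sketch below sets them with base R's options() and restores the previous values at the end. The download itself is guarded so that it only runs in an interactive session with the package installed, and calling clean_restatapi_cache without arguments is an assumption about its signature.

```r
# Remember the previous option values so they can be restored afterwards
old <- options(
  restatapi_cache_dir = file.path(tempdir(), "restatapi"),  # on-disk cache location
  restatapi_update    = FALSE,                               # reuse cached data
  restatapi_verbose   = TRUE                                 # print debugging messages
)

cache_dir_opt <- getOption("restatapi_cache_dir")

# Only attempt a download in an interactive session with the package installed
if (interactive() && requireNamespace("restatapi", quietly = TRUE)) {
  dt <- restatapi::get_eurostat_raw("agr_r_milkpr")  # downloaded, then cached
  dt <- restatapi::get_eurostat_raw("agr_r_milkpr")  # a repeated call can use the cache
  restatapi::clean_restatapi_cache()                 # assumed to work without arguments
}

options(old)  # restore the previous option values
```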
# NOT RUN {
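library(restatapi)
# TSV download (default mode), keeping the flags in a separate column
dt <- get_eurostat_raw("agr_r_milkpr", keep_flags = TRUE)
# SDMX download, checking the id in the TOC and refreshing the cache
dt <- get_eurostat_raw("avia_par_ee", mode = "xml", check_toc = TRUE, update_cache = TRUE)
options(restatapi_update = FALSE)
# TSV download cached uncompressed under tempdir(), with debugging messages
dt <- get_eurostat_raw("avia_par_me", mode = "txt", cache_dir = tempdir(), compress_file = FALSE, verbose = TRUE)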
# }