get_eurostat: Read Eurostat Data

Description

Download data sets from Eurostat (ec.europa.eu/eurostat).

Usage

get_eurostat(id, time_format = "date", filters = "none", type = "code",
  select_time = NULL, cache = TRUE, update_cache = FALSE,
  cache_dir = NULL, compress_file = TRUE,
  stringsAsFactors = default.stringsAsFactors(), keepFlags = FALSE, ...)

Arguments

id

A code name for the dataset of interest. See search_eurostat or Details for how to get the code.

time_format

a string specifying how the time column is converted from the Eurostat format. "date" (default) converts to a Date holding the first date of the period. "date_last" converts to a Date holding the last date of the period. "num" converts to a numeric and "raw" does no conversion. See eurotime2date and eurotime2num.

filters

"none" (default) to get the whole dataset, or a named list of filters to get only part of the table. The names of the list elements are Eurostat variable codes and the values are vectors of observation codes. If NULL, the whole dataset is returned via the API. See Details for more. For filters and the per-query limits of the API, see get_eurostat_json.

type

the type of variables returned, "code" (default) or "label".

select_time

a character symbol for a time frequency, or NULL (the default), which suits most datasets because they have just one time frequency. For datasets with multiple time frequencies, select the desired frequency with: Y = annual, S = semi-annual, Q = quarterly, M = monthly. To get all frequencies in the same data frame, use time_format = "raw".

cache

a logical indicating whether to cache the data. Default is TRUE. Affects only queries from the bulk download facility.

update_cache

a logical indicating whether to update the cache. Can also be set with options(eurostat_update = TRUE).

cache_dir

a path to a cache directory. The directory must exist. NULL (default) creates and uses a 'eurostat' directory in the temporary directory given by tempdir. The directory can also be set with the option eurostat_cache_dir.

compress_file

a logical whether to compress the RDS-file in caching. Default is TRUE.

stringsAsFactors

if TRUE (the default), variables are converted to factors in the original Eurostat order. If FALSE, they are returned as character vectors.

keepFlags

a logical indicating whether the flags (e.g. "confidential", "provisional") should be kept in a separate column or removed. Default is FALSE. For flag values see: http://ec.europa.eu/eurostat/data/database/information. A possible non-real zero "0n" is also indicated in the flags column. Flags are not available from the Eurostat API, so keepFlags cannot be used together with filters.

...

further arguments passed to get_eurostat_json.
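
To illustrate what the "date" and "date_last" settings of time_format mean, the sketch below reproduces both conversions in base R for a single quarterly period. It assumes the raw label looks like "2015Q2"; the helper names quarter_first_date and quarter_last_date are hypothetical and are not part of the package, whose own conversion is done by eurotime2date.

```r
# Illustrative only: base-R equivalents of the "date" and "date_last"
# conversions for ONE quarterly label such as "2015Q2".
# Both helper names are hypothetical, not functions of the eurostat package.
quarter_first_date <- function(x) {
  year <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 6, 6))
  month <- (quarter - 1L) * 3L + 1L      # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
  as.Date(sprintf("%04d-%02d-01", year, month))
}

quarter_last_date <- function(x) {
  first <- quarter_first_date(x)
  # Step to the first day of the next quarter, then go back one day.
  next_first <- seq(first, by = "3 months", length.out = 2)[2]
  next_first - 1
}

first_day <- quarter_first_date("2015Q2")
last_day <- quarter_last_date("2015Q2")
```

Both helpers accept a single label only, because seq needs a scalar start date.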

Value

a tibble with one column for each dimension in the data, a time column for the time dimension and a values column for the numerical values. Eurostat data does not include all missing values, and the treatment of missing values depends on the source. In the bulk download facility, missing values are dropped if all dimensions are missing for a particular time. In the JSON API, missing values are dropped only if all dimensions are missing for all times. Data from the bulk download facility can be completed, for example, with complete.
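
As a minimal sketch of completing dropped observations, the example below assumes that "complete" refers to tidyr::complete and uses a small made-up data frame shaped like a get_eurostat result (the values are invented, not Eurostat data).

```r
library(tidyr)

# Toy data shaped like a get_eurostat() result; values are made up.
# The FI observation for 2016 is absent from the data rather than NA.
dat <- data.frame(
  geo = c("FI", "FI", "SE", "SE", "SE"),
  time = as.Date(c("2014-01-01", "2015-01-01",
                   "2014-01-01", "2015-01-01", "2016-01-01")),
  values = c(1.0, 1.1, 2.0, 2.1, 2.2),
  stringsAsFactors = FALSE
)

# complete() adds a row for every geo/time combination and
# fills values with NA where no observation existed.
dat_full <- complete(dat, geo, time)
```

With real data, every dimension column of the table would be listed in the complete call alongside time.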

Details

Data sets are downloaded from the Eurostat bulk download facility or from the Eurostat Web Services JSON API. If only the table id is given, the whole table is downloaded from the bulk download facility. If filters are also defined, the JSON API is used.

The bulk download facility is the fastest way to download whole datasets. It is also often the only way, as the JSON API is limited to a maximum of 50 sub-indicators at a time and whole datasets usually exceed that. Also, it seems that multi-frequency datasets can only be retrieved via the bulk download facility, and select_time is not available for the JSON API method.

By default, datasets from the bulk download facility are cached, as they are often rather large. Caching is not (currently) possible for datasets from the JSON API. Cache files are stored in a temporary directory by default, or in a named directory if cache_dir or the option eurostat_cache_dir is defined. The cache can be emptied with clean_eurostat_cache.
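
A minimal sketch of pointing the cache at a named directory for one session and then restoring the previous setting. It uses only base R and the eurostat_cache_dir option described above; the directory name "eurostat_cache" is an arbitrary choice for illustration.

```r
# Create a dedicated cache directory; the name "eurostat_cache" is arbitrary.
cache_path <- file.path(tempdir(), "eurostat_cache")
dir.create(cache_path, showWarnings = FALSE)

# Set the option and remember the previous value so it can be restored.
old_opts <- options(eurostat_cache_dir = cache_path)
active_dir <- getOption("eurostat_cache_dir")

# ... calls such as get_eurostat("nama_10_lp_ulc") would cache here ...

# Restore the previous setting when done.
options(old_opts)
```

Keeping the return value of options makes the change easy to undo, which is useful inside functions or scripts that should not alter the session permanently.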

The id (a code) of a dataset can be found with search_eurostat or in the Eurostat database at http://ec.europa.eu/eurostat/data/database. The Eurostat database shows the code in parentheses after each dataset in the Data Navigation Tree.

Examples

# NOT RUN {
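# Download the whole table from the bulk download facility
k <- get_eurostat("nama_10_lp_ulc")
# Convert the time column to numeric instead of Date
k <- get_eurostat("nama_10_lp_ulc", time_format = "num")
# Update the cached copy
k <- get_eurostat("nama_10_lp_ulc", update_cache = TRUE)
# Use a custom cache directory (it must exist beforehand)
dir.create(file.path(tempdir(), "r_cache"))
k <- get_eurostat("nama_10_lp_ulc",
                  cache_dir = file.path(tempdir(), "r_cache"))
# Control the cache through options instead of arguments
options(eurostat_update = TRUE)
k <- get_eurostat("nama_10_lp_ulc")
options(eurostat_update = FALSE)
options(eurostat_cache_dir = file.path(tempdir(), "r_cache"))
k <- get_eurostat("nama_10_lp_ulc")
# Skip caching entirely
k <- get_eurostat("nama_10_lp_ulc", cache = FALSE)
# Select the annual frequency
k <- get_eurostat("avia_gonc", select_time = "Y", cache = FALSE)

# Defining filters retrieves part of the table via the JSON API
dd <- get_eurostat("nama_10_gdp",
                   filters = list(geo = "FI",
                                  na_item = "B1GQ",
                                  unit = "CLV_I10"))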
# }