Subnational data for the Covid-19 outbreak

An interface to subnational and national level Covid-19 data. For all countries supported, this includes a daily time-series of cases. Wherever available we also provide data on deaths, hospitalisations, and tests. National level data are also available from a range of data sources, along with linelist data and links to intervention data sets.

Installation

Install from CRAN:

install.packages("covidregionaldata")

Install the stable development version of the package with:

install.packages("drat")
drat:::add("epiforecasts")
install.packages("covidregionaldata")

Install the unstable development version of the package with:

remotes::install_github("epiforecasts/covidregionaldata")

Quick start

To get worldwide time-series data by country (sourced from the ECDC), use get_national_data():

covidregionaldata::get_national_data()

To get time-series data for subnational regions of a specific country, for example by local authority in the UK, use get_regional_data():

covidregionaldata::get_regional_data(country = "UK", include_level_2_regions = TRUE)

Subnational regions are only available in some countries. See "Sub-national coverage" below.

Usage

Worldwide data

Accessing national data

Both the World Health Organisation (WHO) and the European Centre for Disease Prevention and Control (ECDC) provide worldwide national data. Access national level data for any country using:

covidregionaldata::get_national_data()

This returns daily new and cumulative (total) cases, and where available, deaths, hospitalisations, and tests. For a complete list of variables returned, see "Data glossary" below.

This function takes 3 optional arguments:

  • country (optional) - a country name (in any language) for which to return national level data. Any country in the United Nations that is reported by the specified data source (ECDC or WHO) is accepted. If not specified, all countries are returned.

  • source (optional, default is "ECDC") - the data source for national data. Either "ECDC" or "WHO".

  • totals (optional, default is FALSE) - a Boolean (TRUE/FALSE), denoting whether the data returned should be a table of total counts (one row per country) or time series data (one row per country/date combination).

This returns data in the same structure as get_regional_data(). This means there are no gaps in the structure of the data by country over time, and NAs fill in where data are not available.
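As a minimal sketch of how the three arguments above fit together, the call below requests a WHO time series for one country. The argument values are illustrative only, and the download is guarded because it needs the package (version 0.8.2 as documented here) and internet access.

```r
# Illustrative argument values; the argument names come from the list above.
national_args <- list(
  country = "Canada", # any country reported by the chosen source
  source  = "WHO",    # "ECDC" (the default) or "WHO"
  totals  = FALSE     # FALSE: one row per country/date; TRUE: one row per country
)

# Only attempt the download when the package is installed.
if (requireNamespace("covidregionaldata", quietly = TRUE)) {
  canada <- tryCatch(
    do.call(covidregionaldata::get_national_data, national_args),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NULL
    }
  )
  if (!is.null(canada)) print(head(canada))
}
```

Setting totals = TRUE in the same list should instead give one row per country.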

Accessing national government interventions

A further function for worldwide data extracts non-pharmaceutical interventions by country:

  • covidregionaldata::get_interventions_data()

Accessing a patient linelist

Patient linelist data is useful for exploring delays and lags in reporting. An anonymised international patient linelist can be imported and cleaned with:

  • covidregionaldata::get_linelist()

Sub-national time-series data

Accessing sub-national data

Access sub-national level data for a specific country over time by using covidregionaldata::get_regional_data().

This returns daily new and cumulative (total) cases, and where available, deaths, hospitalisations, and tests. For a complete list of variables returned, see "Data glossary" below.

The function takes 3 arguments:

  • country - the English name of the country of interest. Not case sensitive.
  • totals (optional, default is FALSE) - a Boolean (TRUE/FALSE), denoting whether the data returned should be a table of total counts (one row per region) or time series data (one row per region/date combination).
  • include_level_2_regions (optional, default is FALSE) - a Boolean (TRUE/FALSE), denoting whether the data returned should be stratified by admin level 1 region (FALSE; usually the largest subregion available) or admin level 2 region (TRUE; usually the second largest).

This returns a dataset with one row for each region for each date. For all regions, dates span from the first date until the last date that data are available for any region in the country. This means there are no gaps in the structure of the data, although NAs fill in where data are not available.
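The gap-free structure described above can be illustrated with a small base R sketch. This is not the package's own implementation; the data frame and its values are made up for illustration.

```r
# Toy input: region "B" has no row for 2020-05-25 (values are made up).
raw <- data.frame(
  date      = as.Date(c("2020-05-24", "2020-05-25", "2020-05-24")),
  region    = c("A", "A", "B"),
  cases_new = c(3, 5, 2),
  stringsAsFactors = FALSE
)

# Every date between the first and last date seen for any region.
all_dates <- seq(min(raw$date), max(raw$date), by = "day")

# One row per region/date combination.
grid <- expand.grid(
  date = all_dates,
  region = unique(raw$region),
  stringsAsFactors = FALSE
)

# Left join: combinations with no source row get NA in the count column.
complete <- merge(grid, raw, by = c("date", "region"), all.x = TRUE)
complete <- complete[order(complete$region, complete$date), ]
print(complete)
```

Region "B" now has a row for 2020-05-25 with cases_new set to NA, so every region covers the same date range.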

For example, data for Belgium Level 1 regions over time can be accessed using:

get_regional_data(country = "Belgium")

This returns a dataset in this format:

date        region    iso_code  cases_new  cases_total  deaths_new  deaths_total  recovered_new  recovered_total  hosp_new  hosp_total  tested_new  tested_total
2020-05-24  Wallonia  BE-WAL    24         18196        16          3251          NA             NA               8         5126        NA          NA
2020-05-25  Brussels  BE-BRU    26         5838         2           1421          NA             NA               6         2533        NA          NA
2020-05-25  Flanders  BE-VLG    183        32381        14          4681          NA             NA               29        9334        NA          NA

Level 1 and Level 2 regions

All countries included in the package (see "Sub-national coverage" below) have data for regions at the admin-1 level, the largest administrative unit of the country (e.g. state in the USA). Some countries also have data for smaller areas at the admin-2 level (e.g. county in the USA).

Data for Level 2 units can be returned by using the include_level_2_regions = TRUE argument. The dataset will still show the corresponding Level 1 region.

An example of a country with Level 2 units is Belgium, where Level 2 units are Belgian provinces:

covidregionaldata::get_regional_data("Belgium", include_level_2_regions = TRUE)

This returns a dataset with the format:

date        province   level_2_region_code  region    iso_code  cases_new  cases_total  deaths_new  deaths_total  recovered_new  recovered_total  hosp_new  hosp_total  tested_new  tested_total
2020-05-24  Brussels   BE-BRU               Brussels  BE-BRU    7          5812         NA          NA            NA             NA               4         2527        NA          NA
2020-05-24  Antwerpen  BE-VAN               Flanders  BE-VLG    16         7905         NA          NA            NA             NA               5         2510        NA          NA
2020-05-24  Limburg    BE-VLI               Flanders  BE-VLG    14         6126         NA          NA            NA             NA               2         1848        NA          NA
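Because Level 2 output keeps the corresponding Level 1 region column, Level 2 rows can be rolled up to Level 1 after download. The sketch below shows this in base R on a hand-built data frame with illustrative values. It covers only the provinces listed and is not a substitute for requesting Level 1 data directly.

```r
# Hand-built Level 2 rows for a single date (illustrative values only).
level_2 <- data.frame(
  province  = c("Brussels", "Antwerpen", "Limburg"),
  region    = c("Brussels", "Flanders", "Flanders"),
  cases_new = c(7, 16, 14),
  stringsAsFactors = FALSE
)

# Sum the Level 2 counts within each Level 1 region.
level_1 <- aggregate(cases_new ~ region, data = level_2, FUN = sum)
print(level_1)
```

With real data the grouping would also need to include date, for example cases_new ~ region + date.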

Totals

For totalled data up to the most recent date available, use the totals argument.

covidregionaldata::get_regional_data("Belgium", totals = TRUE)

This returns a dataset with one row for each region, in the same format:

region    iso_code  cases_total  deaths_total  recovered_total  hosp_total  tested_total
Flanders  BE-VLG    34195        4878          0                9694        0
Wallonia  BE-WAL    19093        3362          0                5321        0
Brussels  BE-BRU    6229         1482          0                2657        0
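To show how a totals table relates to the time series, here is a rough base R sketch that sums daily counts per region. It is an assumption about the general idea, not the package's own code, and the input values are made up. Counts that are entirely missing sum to 0, which matches the zeros in the table above.

```r
# Toy daily data: hospitalisations are not reported at all (values are made up).
daily <- data.frame(
  date      = as.Date(c("2020-05-24", "2020-05-25", "2020-05-24", "2020-05-25")),
  region    = c("A", "A", "B", "B"),
  cases_new = c(3, 5, 2, NA),
  hosp_new  = rep(NA_real_, 4),
  stringsAsFactors = FALSE
)

count_cols <- c("cases_new", "hosp_new")

# Sum each count column within region, ignoring NAs (all-NA columns sum to 0).
totals <- aggregate(
  daily[count_cols],
  by = list(region = daily$region),
  FUN = sum,
  na.rm = TRUE
)

# Rename *_new columns to *_total to mirror the totals output.
names(totals) <- sub("_new$", "_total", names(totals))
print(totals)
```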

Sub-national coverage

We include sub-national data in the following countries. These are the accepted country names when using get_regional_data(country = "").

Continent  Country      Level 1     Level 2
Europe     Belgium      Region      Province
Europe     Germany      Bundesland  Landkreis
Europe     UK           NHS region  Local authority
Europe     Italy        Region      NA
Europe     Russia       Region      NA
Americas   Brazil       State       City
Americas   USA          State       County
Americas   Canada       Province    NA
Americas   Colombia     Department  NA
Asia       Afghanistan  Province    NA
Asia       India        States      NA

We hope to expand coverage over time (see "Development" below).

Data glossary

Subnational data

The data columns that will be returned by get_regional_data() are listed below.

To standardise across countries and regions, the columns returned for each country will always be the same. If the corresponding data was missing from the original source then that data field is filled with NA values (or 0 if accessing totals data).

Note that date is not included if the totals argument is set to TRUE. The level 2 region and level 2 region code columns are not included if the include_level_2_regions argument is set to FALSE.

  • date: the date that the counts were reported (YYYY-MM-DD).
  • level 1 region: the level 1 region name. This column will be named differently for different countries (e.g. state, province).
  • level 1 region code: a standard code for the level 1 region. The column name reflects the specific administrative code used. Typically data returns the iso_3166_2 standard, although where not available the column will be named differently to reflect its source.
  • level 2 region: the level 2 region name. This column will be named differently for different countries (e.g. city, county).
  • level 2 region code: a standard code for the level 2 region. The column will be named differently for different countries (e.g. fips in the USA).
  • cases_new: new reported cases for that day
  • cases_total: total reported cases up to and including that day
  • deaths_new: new reported deaths for that day
  • deaths_total: total reported deaths up to and including that day
  • recovered_new: new reported recoveries for that day
  • recovered_total: total reported recoveries up to and including that day
  • hosp_new: new reported hospitalisations for that day
  • hosp_total: total reported hospitalisations up to and including that day (note this is cumulative total of new reported, not total currently in hospital)
  • tested_new: new reported tests for that day
  • tested_total: total tests completed up to and including that day

The exception to this is data for the UK. This is in its raw state, as regions have separate and sometimes incompatible data reporting.
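The relationship between the *_new and *_total columns in the glossary can be sketched in base R. This is a simplified illustration with made-up counts, assuming the series starts from zero before the first date. It is not the package's own helper code.

```r
# Made-up daily counts for one region.
cases_new <- c(2, 0, 5, 3)

# Cumulative total up to and including each day.
cases_total <- cumsum(cases_new)

# Going the other way: daily counts recovered from the cumulative series.
# The first day's count equals the first total (assumes zero before day one).
cases_new_again <- c(cases_total[1], diff(cases_total))

print(cases_total)
print(cases_new_again)
```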

National data

In addition to the above, the following columns are included when using get_national_data().

  • un_region: country geographical region defined by the United Nations.
  • who_region: only included when source = "WHO". Country geographical region defined by WHO.
  • population_2019: only included when source = "ECDC" (the default). Total country population estimate in 2019.

Development

We welcome contributions and new contributors! We particularly appreciate help adding new data sources for countries at sub-national level, or work on priority problems in the issues. Please check and add to the issues, and/or add a pull request. For more detail, please read the System Maintenance Guide.

Version: 0.8.2

License: MIT + file LICENSE

Maintainer: Sam Abbott

Last published: December 12th, 2020

Functions in covidregionaldata (0.8.2)

  • fill_empty_dates_with_na: Add rows of NAs for dates where a region does not have any data
  • get_afghan_region_codes: Afghan region codes
  • get_brazil_regional_cases_only_level_1: Brazilian Regional Daily COVID-19 Count Data - States
  • get_brazil_regional_cases_with_level_2: Brazilian Regional Daily COVID-19 Count Data - Cities
  • get_cumulative_from_daily: Cumulative counts from daily counts.
  • get_afghan_regional_cases: Afghan Regional Daily COVID-19 Count Data
  • get_belgium_regional_cases_only_level_1: Belgian Regional Daily COVID-19 Count Data - Regions Only
  • get_belgium_regional_cases_with_level_2: Belgian Provincial Daily COVID-19 Count Data - Regions and Provinces
  • get_daily_from_cumulative: Daily counts from data that is in cumulative form.
  • get_italy_regional_cases: Regional Daily COVID-19 Count Data
  • get_level_2_region_codes: Get a table of level 2 region codes (FIPS, ONS, region) for a specified country
  • get_germany_level_2_codes: German level 2 codes (not available currently)
  • get_region_codes: Get a table of region codes for a specified country
  • get_ecdc_cases: ECDC International Case Counts: works within get_national_data
  • add_extra_na_cols: Add extra columns filled with NA values to a dataset.
  • calculate_columns_from_existing_data: Cumulative counts from daily counts or daily counts from cumulative, dependent on which columns already exist
  • get_uk_regional_cases_with_level_2: UK Regional Daily COVID-19 Count Data - UTLA
  • get_belgium_level_2_codes: Belgian Provincial region codes
  • get_regional_data: The main calculation function for covidregionaldata. The majority of the work is done in this function.
  • get_canada_region_codes: Canadian region codes
  • get_authority_lookup_table: Lookup table for local authority structure for the UK
  • get_canada_regional_cases: Canadian Regional Daily COVID-19 Count Data - Provinces
  • get_us_level_2_codes: US level 2 codes (FIPS) (Included in original function)
  • get_italy_region_codes: Italian region codes
  • get_interventions_data: Import ACAPS Government Interventions dataset
  • get_belgium_region_codes: Belgian region codes
  • get_germany_regional_cases_with_level_2: German Regional Daily COVID-19 Count Data - Landkreis
  • get_uk_data: Get UK data - helper function to get data for a single valid area type
  • get_uk_region_codes: UK region codes (NULL - they're in the raw data already)
  • refresh_covidregionaldata_canada: Get daily Canada COVID-19 count data by Province/Territory
  • get_india_regional_cases: Indian Regional Daily COVID-19 Count Data - State
  • get_india_region_codes: Indian region codes
  • get_uk_level_2_codes: UK level 2 codes (ONS) (Included in original function)
  • csv_reader: Custom CSV reading function
  • convert_to_covid19R_format: Convert data to Covid19R package data standard
  • get_colombia_region_codes: Colombia region codes
  • complete_cumulative_columns: Completes cumulative columns if rows were added with NAs.
  • get_brazil_level_2_codes: Brazilian level 2 codes (not available currently)
  • check_data_sources: Check data sources
  • get_colombia_regional_cases: Colombian Regional Daily COVID-19 Count Data - Department
  • refresh_covidregionaldata_germany: Get daily German COVID-19 count data by State (Bundesland)
  • refresh_covidregionaldata_india: Get daily Indian COVID-19 count data by State/Unified Territory
  • get_linelist: Get Linelist Data
  • get_uk_regional_cases_only_level_1: UK Regional Daily COVID-19 Count Data - Region
  • refresh_covidregionaldata_belgium: Get daily Belgian COVID-19 count data by Region
  • rename_region_code_column: Helper to rename the region code column in each dataset to the correct code type for each country (e.g. ISO-3166-2).
  • get_russia_regional_cases: Russian Regional Daily COVID-19 Count Data - Region
  • get_germany_regional_cases_only_level_1: German Regional Daily COVID-19 Count Data - Bundesland
  • get_us_regional_cases_with_level_2: US Regional Daily COVID-19 Count Data - Counties
  • get_russia_region_codes: Russian region codes
  • get_germany_region_codes: German region codes
  • get_brazil_region_codes: Brazilian region codes
  • refresh_covidregionaldata_uk: Get daily UK COVID-19 count data by EU-defined region
  • refresh_covidregionaldata_colombia: Get daily Colombian COVID-19 count data by Department (Departamento).
  • refresh_covidregionaldata_usa: Get daily USA COVID-19 count data by state.
  • refresh_covidregionaldata_italy: Get daily Italian COVID-19 count data by Region (Regioni).
  • refresh_covidregionaldata_russia: Get daily Russian COVID-19 count data by Russian region.
  • refresh_covidregionaldata_brazil: Get daily Brazilian COVID-19 count data by State (Estado)
  • get_info_covidregionaldata: Get meta information about the covidregionaldata refresh_* data getters.
  • get_us_regional_cases_only_level_1: US Regional Daily COVID-19 Count Data - States
  • set_negative_values_to_zero: Set negative data to 0
  • get_us_region_codes: US region codes
  • reset_cache: Reset Cache and Update all Local Data
  • get_who_cases: Download the most recent WHO case data
  • start_using_memoise: Add useMemoise to options
  • get_national_data: Get national-level data for countries globally, sourced from the ECDC or WHO.
  • left_join_region_codes: Custom left_join function
  • rename_region_column: Helper to rename the region column in each dataset to the correct name for each country.
  • totalise_data: Get totals data given the time series data.
  • stop_using_memoise: Stop using useMemoise
  • refresh_covidregionaldata_afghanistan: Get daily Afghan COVID-19 count data by Province (Wilayat)