
Subnational data for the COVID-19 outbreak

Interface to subnational and national level COVID-19 data sourced both from official sources, such as Public Health England in the UK, and from other COVID-19 data collections, including the World Health Organisation (WHO), European Centre for Disease Prevention and Control (ECDC), Johns Hopkins University (JHU), Google Open Data and others. This package is designed to streamline COVID-19 data extraction, cleaning, and processing from a range of data sources in an open and transparent way, so that users can inspect and scrutinise the data, and the tools used to process it, at every step. For all supported countries, the data include a daily time series of cases and, wherever available, deaths, hospitalisations, and tests. National level data are also available from a range of sources, as are line list data and links to intervention data sets.

Installation

Install from CRAN:

install.packages("covidregionaldata")

Install the stable development version of the package with:

install.packages("covidregionaldata",
  repos = "https://epiforecasts.r-universe.dev"
)

Install the unstable development version of the package with:

remotes::install_github("epiforecasts/covidregionaldata")

Quick start

Load covidregionaldata, dplyr, scales, and ggplot2 (all used in this quick start),

library(covidregionaldata)
library(dplyr)
library(ggplot2)
library(scales)

Setup data caching

This package can optionally use memoise to cache downloads locally. Caching is enabled with the following (by default the cache is stored in the temporary directory),

start_using_memoise()
#> Using a cache at: /tmp/RtmpPgZXiv

To stop using memoise use,

stop_using_memoise()

and to reset the cache (required to download new data),

reset_cache()
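Why a reset is needed before new data can be downloaded is easiest to see with a toy cache. The sketch below is a minimal base-R illustration of memoisation, not how covidregionaldata caches downloads (the package relies on the memoise package). The names slow_download, make_cached and fetch, and the example.org URL, are hypothetical placeholders.

```r
# Hypothetical stand-in for a slow download; counts how often it really runs.
download_count <- 0
slow_download <- function(url) {
  download_count <<- download_count + 1
  paste("data from", url)
}

# Wrap a function so repeated calls with the same key reuse the stored result.
make_cached <- function(f) {
  cache <- new.env(parent = emptyenv())
  cached <- function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
  # Forget every stored result so the next call runs f() again.
  reset <- function() rm(list = ls(envir = cache, all.names = TRUE), envir = cache)
  list(call = cached, reset = reset)
}

fetch <- make_cached(slow_download)
fetch$call("https://example.org/a.csv")
fetch$call("https://example.org/a.csv")  # served from the cache, no new download
count_before_reset <- download_count
fetch$reset()
fetch$call("https://example.org/a.csv")  # downloads again after the reset
```

While the cache holds a result, later calls return it even if the upstream file has changed, which is why clearing the cache is the step that lets fresh data through.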

National data

To get worldwide time-series data by country (sourced from the World Health Organisation (WHO) by default but also optionally from the European Centre for Disease Prevention and Control (ECDC), Johns Hopkins University, or the Google COVID-19 open data project), use:

nots <- get_national_data()
#> Downloading data from https://covid19.who.int/WHO-COVID-19-global-data.csv
#> Rows: 142911 Columns: 8
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): Country_code, Country, WHO_region
#> dbl  (4): New_cases, Cumulative_cases, New_deaths, Cumulative_deaths
#> date (1): Date_reported
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Cleaning data
#> Processing data
nots
#> # A tibble: 142,911 × 15
#>    date       un_region who_region country        iso_code cases_new cases_total
#>    <date>     <chr>     <chr>      <chr>          <chr>        <dbl>       <dbl>
#>  1 2020-01-03 Asia      EMRO       Afghanistan    AF               0           0
#>  2 2020-01-03 Europe    EURO       Albania        AL               0           0
#>  3 2020-01-03 Africa    AFRO       Algeria        DZ               0           0
#>  4 2020-01-03 Oceania   WPRO       American Samoa AS               0           0
#>  5 2020-01-03 Europe    EURO       Andorra        AD               0           0
#>  6 2020-01-03 Africa    AFRO       Angola         AO               0           0
#>  7 2020-01-03 Americas  AMRO       Anguilla       AI               0           0
#>  8 2020-01-03 Americas  AMRO       Antigua & Barbuda AG               0           0
#>  9 2020-01-03 Americas  AMRO       Argentina      AR               0           0
#> 10 2020-01-03 Asia      EURO       Armenia        AM               0           0
#> # … with 142,901 more rows, and 8 more variables: deaths_new <dbl>,
#> #   deaths_total <dbl>, recovered_new <dbl>, recovered_total <dbl>,
#> #   hosp_new <dbl>, hosp_total <dbl>, tested_new <dbl>, tested_total <dbl>
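Each measure in the output comes as a `_new` (daily) and `_total` (cumulative) pair. Assuming the totals are simple running sums of the daily counts within each country, the relationship can be sketched in base R on a small made-up data frame that only borrows column names from `nots`. The package documents a helper, calculate_columns_from_existing_data, for deriving one column from the other; this is an independent illustration, not its code.

```r
# Made-up daily counts that borrow column names from the nots tibble above.
toy <- data.frame(
  date = as.Date("2020-03-01") + c(0, 1, 2, 0, 1, 2),
  country = rep(c("France", "Germany"), each = 3),
  cases_new = c(5, 3, 4, 10, 0, 7),
  stringsAsFactors = FALSE
)

# Sort by country and date so each running sum accumulates in time order.
toy <- toy[order(toy$country, toy$date), ]

# Running total within each country: ave() applies cumsum() group by group.
toy$cases_total <- ave(toy$cases_new, toy$country, FUN = cumsum)
toy
```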

The data can also be filtered to countries of interest, for example the G7,

g7 <- c(
  "United States", "United Kingdom", "France", "Germany",
  "Italy", "Canada", "Japan"
)
g7_nots <- get_national_data(countries = g7, verbose = FALSE)

Using these data we can compare countries; for example, here are the daily reported deaths over time for each G7 country:

g7_nots %>%
  ggplot() +
  aes(x = date, y = deaths_new, col = country) +
  geom_line(alpha = 0.4) +
  labs(x = "Date", y = "Reported Covid-19 deaths") +
  scale_y_continuous(labels = comma) +
  theme_minimal() +
  theme(legend.position = "top") +
  guides(col = guide_legend(title = "Country"))

Subnational data

To get time-series data for subnational regions of a specific country, for example by level 1 region in the UK, use:

uk_nots <- get_regional_data(country = "UK", verbose = FALSE)
uk_nots
#> # A tibble: 7,501 × 26
#>    date       region   region_code cases_new cases_total deaths_new deaths_total
#>    <date>     <chr>    <chr>           <dbl>       <dbl>      <dbl>        <dbl>
#>  1 2020-01-30 East Mi… E12000004          NA          NA         NA           NA
#>  2 2020-01-30 East of… E12000006          NA          NA         NA           NA
#>  3 2020-01-30 England  E92000001           2           2         NA           NA
#>  4 2020-01-30 London   E12000007          NA          NA         NA           NA
#>  5 2020-01-30 North E… E12000001          NA          NA         NA           NA
#>  6 2020-01-30 North W… E12000002          NA          NA         NA           NA
#>  7 2020-01-30 Norther… N92000002          NA          NA         NA           NA
#>  8 2020-01-30 Scotland S92000003          NA          NA         NA           NA
#>  9 2020-01-30 South E… E12000008          NA          NA         NA           NA
#> 10 2020-01-30 South W… E12000009          NA          NA         NA           NA
#> # … with 7,491 more rows, and 19 more variables: recovered_new <dbl>,
#> #   recovered_total <dbl>, hosp_new <dbl>, hosp_total <dbl>, tested_new <dbl>,
#> #   tested_total <dbl>, areaType <chr>, cumCasesByPublishDate <dbl>,
#> #   cumCasesBySpecimenDate <dbl>, newCasesByPublishDate <dbl>,
#> #   newCasesBySpecimenDate <dbl>, cumDeaths28DaysByDeathDate <dbl>,
#> #   cumDeaths28DaysByPublishDate <dbl>, newDeaths28DaysByDeathDate <dbl>,
#> #   newDeaths28DaysByPublishDate <dbl>, …

Now that we have the data we can create plots, for example the time series of daily reported cases for each region:

uk_nots %>%
  # Drop England, the national total that overlaps the English regions
  filter(!(region %in% "England")) %>%
  ggplot() +
  aes(x = date, y = cases_new, col = region) +
  geom_line(alpha = 0.4) +
  labs(x = "Date", y = "Reported Covid-19 cases") +
  scale_y_continuous(labels = comma) +
  theme_minimal() +
  theme(legend.position = "top") +
  guides(col = guide_legend(title = "Region"))
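The filter in the plot removes England because, at level 1, it appears alongside the English regions it aggregates, so its line would dwarf theirs. A toy base-R check of that idea, using made-up numbers in which England is exactly the sum of the two English regions listed (in real data the match may only be approximate):

```r
# Made-up daily cases for one date: two English regions, England, and Scotland.
toy_uk <- data.frame(
  date = as.Date("2020-10-01"),
  region = c("London", "North West", "England", "Scotland"),
  cases_new = c(100, 250, 350, 80),
  stringsAsFactors = FALSE
)

# In this toy example England equals the sum of the English regions listed.
english_regions <- c("London", "North West")
regional_sum <- sum(toy_uk$cases_new[toy_uk$region %in% english_regions])

# Same condition as the plot: keep every row whose region is not "England".
regions_only <- toy_uk[!(toy_uk$region %in% "England"), ]
regions_only$region
```

A note on the condition itself: `%in%` never returns `NA`, so rows with a missing region are kept, whereas `filter(region != "England")` would drop them because `NA != "England"` is `NA`.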

See get_available_datasets() for supported regions and subregional levels. To see which countries we currently have subnational data for, along with their current status, check the supported countries page or build the supported countries vignette.

For further examples see the quick start vignette. Additional subnational data are supported via the JHU() and Google() classes. Once these data have been downloaded and cleaned (see the class examples), use their available_regions() method to list the subnational regions they support.

Citation

If you use covidregionaldata in your work, please consider citing it as follows,

citation("covidregionaldata")

#> 
#> To cite covidregionaldata in publications use:
#> 
#>   Joseph Palmer, Katharine Sherratt, Richard Martin-Nielsen, Jonnie
#>   Bevan, Hamish Gibbs, Sebastian Funk and Sam Abbott (2021).
#>   covidregionaldata: Subnational data for COVID-19 epidemiology, DOI:
#>   10.21105/joss.03290
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {covidregionaldata: Subnational data for COVID-19 epidemiology},
#>     author = {Joseph Palmer and Katharine Sherratt and Richard Martin-Nielsen and Jonnie Bevan and Hamish Gibbs and Sebastian Funk and Sam Abbott},
#>     journal = {Journal of Open Source Software},
#>     year = {2021},
#>     volume = {6},
#>     number = {63},
#>     pages = {3290},
#>     doi = {10.21105/joss.03290},
#>   }

Development

This package is the result of work from a number of contributors (see the contributors list). We would like to thank the CMMID COVID-19 working group for insightful comments and feedback.

We welcome contributions and new contributors! We particularly appreciate help adding new data sources for countries at subnational level, or work on priority problems in the issues. Please check and add to the issues, or open a pull request. For more details, start with the contributing guide. For the steps required to add support for a dataset, see the adding data guide.


Version: 0.9.3
License: MIT + file LICENSE
Maintainer: Sam Abbott
Last published: February 7th, 2022

Functions in covidregionaldata (0.9.3)

Belgium: Belgium Class for downloading, cleaning and processing notification data
CountryDataClass: R6 Class containing national level methods
Cuba: Cuba Class for downloading, cleaning and processing notification data
DataClass: R6 Class containing non-dataset specific methods
Colombia: Colombia Class for downloading, cleaning and processing notification data
Canada: Canada Class containing origin specific attributes and methods
Brazil: Brazil Class for downloading, cleaning and processing notification data
Covid19DataHub: R6 Class containing specific attributes and methods for Covid19 Data Hub
ECDC: R6 Class containing specific attributes and methods for the European Centre for Disease Prevention and Control dataset
Italy: Italy Class for downloading, cleaning and processing notification data
Switzerland: Switzerland Class for downloading, cleaning and processing notification data
UK: United Kingdom Class for downloading, cleaning and processing notification data
Estonia: Estonia Class for downloading, cleaning and processing notification data
Mexico: Mexico Class for downloading, cleaning and processing notification data
Lithuania: Lithuania Class for downloading, cleaning and processing notification data
JHU: R6 Class containing specific attributes and methods for Johns Hopkins University data
France: France Class containing origin specific attributes and methods
JHU_codes: Region Codes for JHU Dataset. Taken from the region codes provided as part of the WHO dataset.
csv_reader: Custom CSV reading function
Google: R6 Class containing specific attributes and methods for Google data
India: India Class for downloading, cleaning and processing notification data
download_excel: Download Excel Documents
get_national_data: Get national-level data for countries globally from a range of sources
get_linelist: Get patient line list data
USA: USA Class for downloading, cleaning and processing notification data
WHO: R6 Class containing specific attributes and methods for World Health Organisation data
complete_cumulative_columns: Completes cumulative columns if rows were added with NAs
initialise_dataclass: Initialise a child class of DataClass if it exists
colombia_codes: Region Codes for Colombia Dataset
JRC: R6 Class containing specific attributes and methods for European Commission's Joint Research Centre data
mexico_codes: Region Codes for Mexico Dataset
process_internal: Internal Shared Regional Dataset Processing
Germany: Germany Class for downloading, cleaning and processing notification data
get_available_datasets: Get supported data sets
france_codes: Region Codes for France Dataset
check_level: Checks a given level is supported
calculate_columns_from_existing_data: Cumulative counts from daily counts or daily counts from cumulative, dependent on which columns already exist
test_processing: Test process method works correctly
make_new_data_source: Create new country class for a given source
test_return: Test return method works correctly
get_regional_data: Get regional-level data
json_reader: Custom JSON reading function
glue_level: Glue the spatial level into a variable name
run_default_processing_fns: Default processing steps to run
expect_clean_cols: Test clean columns contain the correct data and types
SouthAfrica: SouthAfrica Class for downloading, cleaning and processing notification data
Netherlands: Netherlands Class for downloading, cleaning and processing notification data
expect_columns_contain_data: Test that cleaned columns contain data
message_verbose: Wrapper for message
run_optional_processing_fns: Optional processing steps to run
reset_cache: Reset Cache and Update all Local Data
totalise_data: Get totals data given the time series data
return_data: Control data return
test_download_JSON: Test download method for JSON files works correctly
test_download: Test download method works correctly
start_using_memoise: Add useMemoise to options
set_negative_values_to_zero: Set negative data to 0
vietnam_codes: Region Codes for Vietnam Dataset
uk_codes: Region Codes for UK Dataset
expect_processed_cols: Test that processed columns contain the correct data and types
add_extra_na_cols: Add extra columns filled with NA values to a dataset
all_country_data: Table of available datasets along with level and other information. Rendered from the individual R6 class objects included in this package.
fill_empty_dates_with_na: Add rows of NAs for dates where a region does not have any data
lithuania_codes: Region Codes for Lithuania Dataset
make_github_workflow: Create github action for a given source
reexports: Objects exported from other packages
region_dispatch: Control Grouping Variables used in process_internal
stop_using_memoise: Stop using useMemoise
test_cleaning: Test clean method works correctly
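Two of the processing helpers listed above, fill_empty_dates_with_na and set_negative_values_to_zero, are described only in one line each. The following base-R sketch illustrates those two ideas on a made-up single-region series; it does not reproduce the package's implementations or their signatures, and the data frame and column choices are assumptions for illustration.

```r
# Made-up daily series with a missing date (2020-05-02) and a negative
# value (-2), as can happen when a source revises earlier counts.
toy <- data.frame(
  date = as.Date(c("2020-05-01", "2020-05-03", "2020-05-04")),
  region = "Region A",
  cases_new = c(4, -2, 6),
  stringsAsFactors = FALSE
)

# Idea 1 (cf. fill_empty_dates_with_na): build the full date range and
# left-join the data onto it, so dates without a row get NA counts.
all_dates <- data.frame(
  date = seq(min(toy$date), max(toy$date), by = "day"),
  region = "Region A",
  stringsAsFactors = FALSE
)
filled <- merge(all_dates, toy, by = c("date", "region"), all.x = TRUE)

# Idea 2 (cf. set_negative_values_to_zero): clamp negative counts to 0,
# leaving NA values untouched.
negative <- !is.na(filled$cases_new) & filled$cases_new < 0
filled$cases_new[negative] <- 0
filled
```

Filling the gaps first keeps one row per region and day, which makes later steps such as recomputing cumulative columns simpler to reason about.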