DataClass: R6 Class containing non-dataset specific methods

Description

A parent class containing non-dataset specific methods.

Arguments

Public fields

origin: the origin of the data source. For regional data sources this will usually be the name of the country.
data: Once initialised, a list of named data frames: raw (a list of named raw data frames), clean (cleaned data) and processed (processed data). Data is accessed using $data.
supported_levels: A list of supported levels.
supported_region_names: A list of region names in order of level.
supported_region_codes: A list of region codes in order of level.
region_name: String. Name for the region column, e.g. 'region'. This field is filled at initialisation with the region name for the specified level (supported_region_names$level).
code_name: String. Name for the codes column, e.g. 'iso_3166_2'. Filled at initialisation with the code name associated with the requested level (supported_region_codes$level).
codes_lookup: String or tibble. Region codes for the target origin, filled by origin-specific codes in set_region_codes().
data_urls: List of named common and shared url links to raw data. Prefers shared if there is a name conflict.
common_data_urls: List of named links to raw data that are common across levels. The first entry should be named main.
level_data_urls: List of named links to raw data that are level specific. Any urls that share a name with a url from common_data_urls will be selected preferentially. Each top level list should be named after a supported level.
source_data_cols: Existing columns within the raw data.
level: target region level. This field is filled at initialisation using user inputs or defaults in $new()
data_name: string. The country name followed by the level. E.g. "Italy at level 1"
totals: Boolean. If TRUE, returns totalled data per region up to today's date. This field is filled at initialisation using user inputs or defaults in $new()
localise: Boolean. Should region names be localised. This field is filled at initialisation using user inputs or defaults in $new()
verbose: Boolean. Display information at various stages. This field is filled at initialisation using user inputs or defaults in $new()
steps: Boolean. Keep data from each processing step. This field is filled at initialisation using user inputs or defaults in $new()
target_regions: A character vector of regions to filter for. Used by the filter method.
process_fns: Array. Additional user-supplied functions to process the data.
filter_level: Character. The level of the data to filter at. Defaults to the target level.

Methods

Public methods

Method `set_region_codes()`

Placeholder for a custom, country-specific function that loads region codes.

Usage

DataClass$set_region_codes()
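
As a rough illustration of how a child class might fill this placeholder, the sketch below uses the R6 package directly. DemoParent and DemoCountry are hypothetical stand-ins, not classes from this package, and the region names and codes are made up.

```r
library(R6)

# Hypothetical stand-in for the parent class; only the placeholder is modelled.
DemoParent <- R6Class("DemoParent",
  public = list(
    codes_lookup = NULL,
    # Placeholder: does nothing until a child class overrides it.
    set_region_codes = function() {
      invisible(NULL)
    }
  )
)

# Hypothetical child class that supplies its own region codes.
DemoCountry <- R6Class("DemoCountry",
  inherit = DemoParent,
  public = list(
    set_region_codes = function() {
      # Made-up lookup table, for illustration only.
      self$codes_lookup <- data.frame(
        region = c("North", "South"),
        code = c("XX-N", "XX-S")
      )
      invisible(self)
    }
  )
)

demo <- DemoCountry$new()
demo$set_region_codes()
demo$codes_lookup
```

Because the child defines a method with the same name, its version replaces the parent's empty one.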

Method `new()`

Initialisation function used by all DataClass objects. Sets up the class with its fields set to the input parameters. Should only be called by a DataClass class object.

Usage

DataClass$new(
  level = "1",
  filter_level,
  regions,
  totals = FALSE,
  localise = TRUE,
  verbose = TRUE,
  steps = FALSE,
  get = FALSE,
  process_fns
)

Arguments

level: A character string indicating the target administrative level of the data with the default being "1". Currently supported options are level 1 ("1") and level 2 ("2").

filter_level: A character string indicating the level to filter at. Defaults to the level of the data if not specified and if not otherwise defined in the class. Use get_available_datasets() for supported options by dataset.

regions: A character vector of target regions to be assigned to the target_regions field if present.

totals: Logical, defaults to FALSE. If TRUE, returns totalled data per region up to today's date. If FALSE, returns the full dataset stratified by date and region.

localise: Logical, defaults to TRUE. Should region names be localised.

verbose: Logical, defaults to TRUE. Should verbose processing messages be displayed.

steps: Logical, defaults to FALSE. Should all processing and cleaning steps be kept and output in a list.

get: Logical, defaults to FALSE. Should the class get method be called (this will download, clean, and process data at initialisation).

process_fns: Array, additional functions to process the data. Users can supply their own functions here, which act on clean data and are called alongside the default processing functions. If process_fns is not set, the default optional function added is set_negative_values_to_zero (see the process_fns field for all defaults). If you want to keep this when supplying your own processing functions, remember to add it to your list as well. If you have created a processing function that others could benefit from, please submit a Pull Request to our GitHub repository and we will consider adding it to the package.
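
A user-supplied processing function is, in essence, a function that takes a data frame of clean data and returns a data frame. The base R sketch below shows one possible shape. The function add_case_fatality, the stand-in negatives_to_zero, and the column names are all hypothetical and only illustrate the idea of keeping the default alongside a custom function.

```r
# Hypothetical user-supplied function: adds a ratio column to clean data.
add_case_fatality <- function(data) {
  data$case_fatality <- ifelse(
    data$cases_total > 0,
    data$deaths_total / data$cases_total,
    NA_real_
  )
  data
}

# Hypothetical stand-in for the default optional function; the package's
# own set_negative_values_to_zero is not reproduced here.
negatives_to_zero <- function(data) {
  is_num <- vapply(data, is.numeric, logical(1))
  data[is_num] <- lapply(
    data[is_num],
    function(x) replace(x, !is.na(x) & x < 0, 0)
  )
  data
}

# Made-up clean data.
clean <- data.frame(
  region = c("A", "B"),
  cases_total = c(100, 0),
  deaths_total = c(5, -1)
)

# Apply the functions in order, keeping the default-like step first.
fns <- list(negatives_to_zero, add_case_fatality)
processed <- Reduce(function(d, f) f(d), fns, clean)
processed
```

With the package itself, the equivalent would be to include set_negative_values_to_zero in the list passed as process_fns, as the argument description advises.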

Method `download()`

Downloads raw data from data_urls and stores a named list of the data_url name and the corresponding raw data table in data$raw.

Usage

DataClass$download()

Method `download_JSON()`

Downloads raw JSON data from data_urls and stores a named list of the data_url name and the corresponding raw data table in data$raw. Designed as a drop-in replacement for download() so it can be used in sub-classes.

Usage

DataClass$download_JSON()

Method `clean()`

Cleans raw data (corrects format, converts column types, etc). Works on raw data and so should be called after download(). Calls the class-specific cleaning method (clean_common) followed by the level-specific cleaning method (clean_level_[1/2]). Cleaned data is stored in data$clean.

Usage

DataClass$clean()
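
The call order described above (common cleaning first, then the method for the current level) can be sketched with a small R6 class. DemoClean is a hypothetical stand-in, and looking the level method up by name is an assumption made for illustration, not a statement about how the package dispatches internally.

```r
library(R6)

# Hypothetical stand-in class showing the order of cleaning calls.
DemoClean <- R6Class("DemoClean",
  public = list(
    level = "1",
    data = list(raw = NULL, clean = NULL),
    calls = character(),
    # Cleaning shared by all levels.
    clean_common = function() {
      self$data$clean <- self$data$raw[["main"]]
      self$calls <- c(self$calls, "clean_common")
    },
    # Cleaning specific to level 1.
    clean_level_1 = function() {
      self$data$clean$level_1_region <- toupper(self$data$clean$region)
      self$calls <- c(self$calls, "clean_level_1")
    },
    clean = function() {
      self$clean_common()
      # Assumed dispatch: call clean_level_<level> if such a method exists.
      specific <- paste0("clean_level_", self$level)
      if (is.function(self[[specific]])) {
        self[[specific]]()
      }
      invisible(self)
    }
  )
)

demo <- DemoClean$new()
demo$data$raw <- list(main = data.frame(region = c("north", "south")))
demo$clean()
demo$calls
demo$data$clean
```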

Method `clean_common()`

Cleaning methods that are common across a class. By default this method is empty; any code that is required should be defined in a child-class-specific clean_common method.

Usage

DataClass$clean_common()

Method `available_regions()`

Show regions that are available to be used for filtering operations. Can only be called once clean() has been called. Filtering level is determined by checking the filter_level field.

Usage

DataClass$available_regions(level)

Arguments

level: A character string indicating the level to filter at. Defaults to using the filter_level field if not specified.

Method `filter()`

Filters cleaned data for specific regions. To be called after clean().

Usage

DataClass$filter(regions, level)

Arguments

regions: A character vector of target regions. Overrides the current class setting for target_regions.

level: Character. The level of the data to filter at. Defaults to the lowest level in the data.
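
As a minimal sketch of the idea behind filtering, the base R snippet below keeps only the rows whose region is among the targets. The helper filter_regions, the column name level_1_region, and the choice to stop on unknown regions are assumptions for illustration.

```r
# Made-up clean data with one region column.
clean <- data.frame(
  level_1_region = c("North", "North", "South", "East"),
  cases_new = c(1, 2, 3, 4)
)

# Hypothetical helper: keep rows whose region is one of the targets.
filter_regions <- function(data, regions, region_col) {
  unknown <- setdiff(regions, data[[region_col]])
  if (length(unknown) > 0) {
    # Assumed behaviour: fail loudly rather than return an empty result.
    stop("Regions not found in data: ", paste(unknown, collapse = ", "))
  }
  data[data[[region_col]] %in% regions, , drop = FALSE]
}

# Regions that could be filtered for, analogous to available_regions().
unique(clean$level_1_region)

filtered <- filter_regions(clean, c("North", "East"), "level_1_region")
filtered
```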

Method `process()`

Processes data by adding and calculating absent columns. Called on clean data (after clean()). Some countries may have data as new events (e.g. number of new cases for that day) whilst others have a running total up to that date. Processing calculates these based on what the data comes with via the functions region_dispatch() and process_internal(), which do the following:

Adds columns not present in the data: add_extra_na_cols()
Ensures there are no negative values: set_negative_values_to_zero()
Fills empty dates with NA: fill_empty_dates_with_na()
Calculates cumulative data: complete_cumulative_columns()
Calculates missing columns from existing ones: calculate_columns_from_existing_data()

Usage

DataClass$process(process_fns)

Arguments

process_fns: Array, additional functions to process the data. Users can supply their own functions here, which act on clean data and are called alongside the default processing functions. If process_fns is not set, the default optional function added is set_negative_values_to_zero (see the process_fns field for all defaults).
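
The conversion between new events and running totals that process() describes can be illustrated in base R. This is only a sketch of the arithmetic with made-up data and illustrative column names. It does not reproduce the package's processing functions.

```r
# One source reports daily new cases, the other a running total.
daily <- data.frame(
  date = as.Date("2021-01-01") + 0:3,
  cases_new = c(2, 0, 5, 1)
)
running <- data.frame(
  date = as.Date("2021-01-01") + 0:3,
  cases_total = c(10, 12, 12, 20)
)

# New events to running total: cumulative sum over time.
daily$cases_total <- cumsum(daily$cases_new)

# Running total to new events: day-on-day difference. The first day's
# new count is taken to be the first total (an assumption).
running$cases_new <- c(running$cases_total[1], diff(running$cases_total))

daily
running
```

After both steps, each data frame holds the same pair of columns, which is the point of calculating the absent ones.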

Method `get()`

Get data related to the data class. This runs each distinct step in the workflow in order. Internally calls the download(), clean(), filter() and process() methods.

Usage

DataClass$get()

Method `return()`

Return data. Designed to be called after process(), this uses the steps argument to return either a list of all the data preserved at each step or just the processed data. For most datasets a custom method should not be needed.

Usage

DataClass$return()
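
The effect of the steps setting can be sketched as a simple switch between the full list and its processed element. The helper return_data is hypothetical, and the list below only mirrors the raw, clean and processed entries named in the data field description.

```r
# Hypothetical helper mirroring the described behaviour of return().
return_data <- function(data, steps = FALSE) {
  if (steps) {
    data            # everything kept from each step
  } else {
    data$processed  # only the final processed data
  }
}

# Made-up data list with one entry per step.
data <- list(
  raw = list(main = data.frame(x = c(-1, 2))),
  clean = data.frame(x = c(-1, 2)),
  processed = data.frame(x = c(0, 2))
)

return_data(data)
names(return_data(data, steps = TRUE))
```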

Method `summary()`

Create a table of summary information for the data set being processed.

Usage

DataClass$summary()

Returns

Returns a single row summary tibble containing the origin of the data source, class, level 1 and 2 region names, the type of data, the urls of the raw data and the columns present in the raw data.

Method `test()`

Run tests on a country class instance. Calling test() on a class instance runs tests with the settings in use. For example, if you set level = "1" and localise = FALSE the tests will be run on level 1 data which is not localised. Rather than downloading data for a test, users can provide a path to a snapshot file of data to test instead. Tests are run on a clone of the class. This method calls generic tests for all country class objects. It also calls country-specific tests, which can be defined in an individual country class method called specific_tests(). The snapshots contain the first 1000 rows of data. For more details see the 'testing' vignette: vignette("testing").

Usage

DataClass$test(
  download = FALSE,
  snapshot_dir = paste0(tempdir(), "/snapshots"),
  all = FALSE,
  ...
)

Arguments

download: Logical. To download the data (TRUE) or use a snapshot (FALSE). Defaults to FALSE.

snapshot_dir: Character. The name of a directory to save the downloaded data to or read from. If not defined, a directory called 'snapshots' will be created in the temp directory. Snapshots are saved as rds files named with the class name and level, e.g. Italy_level_1.rds.

all: Logical. Run tests with all settings (TRUE) or with those defined in the current class instance (FALSE). Defaults to FALSE.

...: Additional parameters to pass to specific_tests.
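
To make the snapshot convention concrete, the base R sketch below writes and reads a snapshot file the way the description suggests: an rds file in a 'snapshots' directory under the temp directory, holding the first 1000 rows. The data are made up, and the steps are an illustration of the file layout rather than the code test() runs.

```r
# Default-style snapshot directory under the session temp directory.
snapshot_dir <- file.path(tempdir(), "snapshots")
dir.create(snapshot_dir, showWarnings = FALSE, recursive = TRUE)

# File name following the class-name-and-level pattern from the docs.
snapshot_path <- file.path(snapshot_dir, "Italy_level_1.rds")

# Made-up raw data, larger than the snapshot size.
raw <- data.frame(id = seq_len(1500), value = rep(c(1, 2, 3), 500))

# Keep only the first 1000 rows in the snapshot.
saveRDS(head(raw, 1000), snapshot_path)

# Later runs can read the snapshot instead of downloading.
snapshot <- readRDS(snapshot_path)
nrow(snapshot)
```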

Method `clone()`

The objects of this class are cloneable with this method.

Usage

DataClass$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
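
The difference between a shallow and a deep clone only matters when a field holds another R6 object (or other reference object). The sketch below shows this general R6 behaviour with two hypothetical classes, Inner and Outer, which are unrelated to this package's classes.

```r
library(R6)

# Hypothetical classes: Outer holds an R6 object in one of its fields.
Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class("Outer",
  public = list(
    inner = NULL,
    initialize = function() {
      self$inner <- Inner$new()
    }
  )
)

a <- Outer$new()
shallow <- a$clone()          # copies the reference to the same Inner
deep <- a$clone(deep = TRUE)  # also clones the Inner object

a$inner$value <- 99
shallow$inner$value  # 99: shares the Inner object with a
deep$inner$value     # 1: has its own copy
```

Fields that hold ordinary values such as data frames are copied either way, so a shallow clone is usually enough for them.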

Details

All data sets have shared methods for extracting geographic codes, downloading, processing, and returning data. These functions are contained within this parent class and so are accessible by all data sets which inherit from here. Individual data sets can overwrite any functions or fields provided they define a method with the same name, and can be extended with additional functionality. See the individual method documentation for further details.
