A parent class containing non-dataset specific methods.
origin
the origin of the data source. For regional data sources this will usually be the name of the country.
data
Once initialised, a list of named data frames: raw (list of named raw data frames), clean (cleaned data) and processed (processed data). Data is accessed using $data.
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
region_name
string. Name for the region column, e.g. 'region'. This field is filled at initialisation with the region name for the specified level (supported_region_names$level).
code_name
string. Name for the codes column, e.g. 'iso_3166_2'. This field is filled at initialisation with the code name associated with the requested level (supported_region_codes$level).
codes_lookup
string or tibble. Region codes for the target origin, filled by origin specific codes in set_region_codes().
data_urls
List of named common and level specific url links to raw data. Prefers the level specific url if there is a name conflict.
common_data_urls
List of named links to raw data that are common across levels. The first entry should be named main.
level_data_urls
List of named links to raw data that are level specific. Any urls that share a name with a url from common_data_urls will be selected preferentially. Each top level list should be named after a supported level.
source_data_cols
existing columns within the raw data
level
Target region level. This field is filled at initialisation using user inputs or defaults in $new().
data_name
string. The country name followed by the level. E.g. "Italy at level 1"
totals
Boolean. If TRUE, returns totalled data per region up to today's date. This field is filled at initialisation using user inputs or defaults in $new().
localise
Boolean. Should region names be localised. This field is filled at initialisation using user inputs or defaults in $new().
verbose
Boolean. Display information at various stages. This field is filled at initialisation using user inputs or defaults in $new().
steps
Boolean. Keep data from each processing step. This field is filled at initialisation using user inputs or defaults in $new().
target_regions
A character vector of regions to filter for. Used by the filter method.
process_fns
Array. Additional, user supplied functions to process the data.
filter_level
Character. The level of the data to filter at. Defaults to the target level.
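The precedence between the common_data_urls and level_data_urls fields described above can be illustrated with a small base R sketch. The URLs and the helper name merge_urls are hypothetical, and this is not the package's internal implementation; it only shows level specific entries replacing common entries that share a name.

```r
# Hypothetical URLs, for illustration only.
common_data_urls <- list(
  main = "https://example.org/common/main.csv",
  lookup = "https://example.org/common/lookup.csv"
)
level_data_urls <- list(
  "1" = list(main = "https://example.org/level_1/main.csv"),
  "2" = list(extra = "https://example.org/level_2/extra.csv")
)

# Hypothetical helper: entries for the requested level replace common
# entries with the same name; all other common entries are kept.
merge_urls <- function(common, by_level, level) {
  level_urls <- by_level[[level]]
  if (is.null(level_urls)) {
    return(common)
  }
  utils::modifyList(common, level_urls)
}

data_urls <- merge_urls(common_data_urls, level_data_urls, level = "1")
```

Here data_urls$main points at the level 1 link, while data_urls$lookup is still taken from the common list.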
set_region_codes()
Placeholder for a custom country specific function to load region codes.
DataClass$set_region_codes()
new()
Initialize function used by all DataClass objects. Sets up the DataClass class with attributes set to input parameters. Should only be called by a DataClass class object.
DataClass$new( level = "1", filter_level, regions, totals = FALSE, localise = TRUE, verbose = TRUE, steps = FALSE, get = FALSE, process_fns )
level
A character string indicating the target administrative level of the data with the default being "1". Currently supported options are level 1 ("1") and level 2 ("2").
filter_level
A character string indicating the level to filter at. Defaults to the level of the data if not specified and if not otherwise defined in the class. Use get_available_datasets() for supported options by dataset.
regions
A character vector of target regions to be assigned to the target_regions field if present.
totals
Logical, defaults to FALSE. If TRUE, returns totalled data per region up to today's date. If FALSE, returns the full dataset stratified by date and region.
localise
Logical, defaults to TRUE. Should region names be localised.
verbose
Logical, defaults to TRUE. Should information be displayed at various processing stages.
steps
Logical, defaults to FALSE. Should all processing and cleaning steps be kept and output in a list.
get
Logical, defaults to FALSE. Should the class get method be called (this will download, clean, and process data at initialisation).
process_fns
Array. Additional functions to process the data. Users can supply their own functions here, which act on clean data and are called alongside the default processing functions. If process_fns is not set, the default optional function added is set_negative_values_to_zero (see the process_fns field for all defaults). If you want to keep this when supplying your own processing functions, remember to add it to your list as well. If you have created a processing function that others could benefit from, please submit a Pull Request to our github repository and we will consider adding it to the package.
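As a rough illustration of what a user supplied processing function might look like, the base R sketch below defines a function that takes a data frame and returns a modified data frame. The function name, the column name cases_new and the toy data are all assumptions made for this example; how the package applies process_fns internally is not shown here.

```r
# Hypothetical user supplied processing function. It assumes the clean
# data is a data frame with a numeric `cases_new` column (an assumed
# column name) and returns the data frame with one extra column.
add_cases_in_thousands <- function(data) {
  data$cases_new_thousands <- data$cases_new / 1000
  data
}

# Toy data standing in for clean data; values are made up.
clean <- data.frame(
  date = as.Date("2021-01-01") + 0:2,
  region = "A",
  cases_new = c(1000, 2500, 0)
)

# Functions held in a list can be applied in turn, each one taking and
# returning a data frame.
process_fns <- list(add_cases_in_thousands)
processed <- Reduce(function(d, f) f(d), process_fns, clean)
```

When passing such a function to process_fns, set_negative_values_to_zero would need to be included in the same list if that default should be kept.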
download()
Downloads raw data from data_urls and stores a named list of the data_url name and the corresponding raw data table in data$raw.
DataClass$download()
download_JSON()
Downloads raw data from data_urls and stores a named list of the data_url name and the corresponding raw data table in data$raw. Designed as a drop-in replacement for download so it can be used in sub-classes.
DataClass$download_JSON()
clean()
Cleans raw data (corrects format, converts column types, etc). Works on raw data and so should be called after download(). Calls the class specific cleaning method (clean_common) followed by the level specific cleaning methods (clean_level_[1/2]). Cleaned data is stored in data$clean.
DataClass$clean()
clean_common()
Cleaning methods that are common across a class. By default this method is empty, as any code required should be defined in a child class specific clean_common method.
DataClass$clean_common()
available_regions()
Show regions that are available to be used for filtering operations. Can only be called once clean() has been called. Filtering level is determined by checking the filter_level field.
DataClass$available_regions(level)
level
A character string indicating the level to filter at. Defaults to using the filter_level field if not specified.
filter()
Filter cleaned data for a specific region. To be called after clean().
DataClass$filter(regions, level)
regions
A character vector of target regions. Overrides the current class setting for target_regions.
level
Character. The level of the data to filter at. Defaults to the lowest level in the data.
process()
Processes data by adding and calculating absent columns. Called on clean data (after clean()). Some countries may have data as new events (e.g. number of new cases for that day) whilst others have a running total up to that date. Processing calculates these based on what the data comes with via the functions region_dispatch() and process_internal(), which do the following:
Adds columns not present in the data: add_extra_na_cols()
Ensures there are no negative values: set_negative_values_to_zero()
Removes NA dates: fill_empty_dates_with_na()
Calculates cumulative data: complete_cumulative_columns()
Calculates missing columns from existing ones: calculate_columns_from_existing_data()
DataClass$process(process_fns)
process_fns
Array. Additional functions to process the data. Users can supply their own functions here, which act on clean data and are called alongside the default processing functions. If process_fns is not set, the default optional function added is set_negative_values_to_zero (see the process_fns field for all defaults).
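The relationship between new events and running totals that process() relies on can be sketched in base R. This is only an illustration of the arithmetic with made-up values and assumed variable names; it does not reproduce the package's own processing functions.

```r
# Toy series for a single region; values are made up.
cases_new <- c(3, 0, 5, 2)

# From daily new counts to a running total.
cases_total <- cumsum(cases_new)

# From a running total back to daily new counts: the first value is kept
# and later values are differences between consecutive totals.
recovered_new <- c(cases_total[1], diff(cases_total))

# Replacing negative values with zero, similar in spirit to what the
# description of set_negative_values_to_zero suggests.
corrected <- pmax(c(4, -1, 2), 0)
```

A data source that reports only one of the two forms therefore carries enough information to derive the other, provided the series has no gaps.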
get()
Get data related to the data class. This runs each distinct step in the workflow in order. Internally calls the download(), clean(), filter() and process() methods.
DataClass$get()
return()
Return data. Designed to be called after process(), this uses the steps argument to return either a list of all the data preserved at each step or just the processed data. For most datasets a custom method should not be needed.
DataClass$return()
summary()
Create a table of summary information for the data set being processed.
DataClass$summary()
Returns a single row summary tibble containing the origin of the data source, class, level 1 and 2 region names, the type of data, the urls of the raw data and the columns present in the raw data.
test()
Run tests on a country class instance. Calling test() on a class instance runs tests with the settings in use. For example, if you set level = "1" and localise = FALSE the tests will be run on level 1 data which is not localised. Rather than downloading data for a test, users can provide a path to a snapshot file of data to test instead. Tests are run on a clone of the class. This method calls generic tests for all country class objects. It also calls country specific tests, which can be defined in an individual country class method called specific_tests(). The snapshots contain the first 1000 rows of data. For more details see the 'testing' vignette: vignette("testing").
DataClass$test( download = FALSE, snapshot_dir = paste0(tempdir(), "/snapshots"), all = FALSE, ... )
download
logical. To download the data (TRUE) or use a snapshot (FALSE). Defaults to FALSE.
snapshot_dir
character_array. The name of a directory to save the downloaded data to or read it from. If not defined, a directory called 'snapshots' will be created in the temp directory. Snapshots are saved as rds files with the class name and level, e.g. Italy_level_1.rds.
all
logical. Run tests with all settings (TRUE) or with those defined in the current class instance (FALSE). Defaults to FALSE.
...
Additional parameters to pass to specific_tests
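The snapshot convention described for test() (an rds file named after the class and level, holding the first 1000 rows) can be mimicked with base R. The toy data frame and its column names are assumptions, and the exact object the package stores in a snapshot is not specified here, so this is only a sketch of the file handling.

```r
# Toy data frame standing in for downloaded raw data.
raw <- data.frame(id = seq_len(1500), value = seq_len(1500) * 2)

# Directory and file name follow the pattern described above.
snapshot_dir <- file.path(tempdir(), "snapshots")
dir.create(snapshot_dir, showWarnings = FALSE, recursive = TRUE)
snapshot_path <- file.path(snapshot_dir, "Italy_level_1.rds")

# Keep only the first 1000 rows and save them as an rds file.
saveRDS(head(raw, 1000), snapshot_path)

# Read the snapshot back instead of downloading the data again.
snapshot <- readRDS(snapshot_path)
```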
clone()
The objects of this class are cloneable with this method.
DataClass$clone(deep = FALSE)
deep
Whether to make a deep clone.
All data sets have shared methods for extracting geographic codes, downloading, processing, and returning data. These functions are contained within this parent class and so are accessible by all data sets which inherit from it. Individual data sets can overwrite any function or field by defining a method with the same name, and can be extended with additional functionality. See the individual method documentation for further details.
Data interface functions: CountryDataClass, get_available_datasets(), get_national_data(), get_regional_data(), initialise_dataclass()