normTable: Normalise data tables

Description

Harmonise and integrate data tables into a standardised format.

Usage

normTable(
  input = NULL,
  ...,
  source = "tabID",
  pattern = NULL,
  update = FALSE,
  keepOrig = FALSE,
  outType = "rds",
  verbose = FALSE
)

Arguments

input

[character(1)] path of the file to normalise. If this is left empty, all files at stage two, as subset by pattern, are chosen.

...

[list(.)] matching lists that capture the variables by which to match and the new column names containing the resulting ID; see Details.

source

[character(1)] the source from which translations of terms should be sought. The default is "tabID"; when the same terms occur in several tables of a dataseries, choose "datID".

pattern

[character(1)] an optional regular expression. Only data tables whose names match the regular expression are processed.

update

[logical(1)] whether or not the physical files should be updated (TRUE) or the function should merely return the new object (FALSE, default). This is helpful to check whether the metadata specification and the provided file(s) (translation and ID tables) are properly specified.

keepOrig

[logical(1)] to keep the original units and variable names in the output (TRUE) or to remove them (FALSE, default). Useful for debugging.

outType

[character(1)] the output file type. Currently implemented options are "csv" (more exchangeable in a workflow based on several programs) and "rds" (a smaller and less error-prone data format, but one that can only be read efficiently by R).

verbose

[logical(1)] be verbose about translating terms (default FALSE). Furthermore, you can use suppressMessages to make this function completely silent.

Value

This function harmonises and integrates so far unprocessed data tables at stage two into stage three of the areal database. It produces, for each nation in the registered data tables, a file of the type given in outType (*.rds by default) that includes all thematic areal data. With update = FALSE, the new object is returned and no files are written.

Details

Arguments in ... are so-called matching lists. Each matching list captures three kinds of information:

the 'variable' that should be matched with a matching list,
the 'targetColumn' in that matching list that should be included in the final table in the place of 'variable' and
the 'targetID' (column name) of that new variable.

targetID = list(variable = targetColumn)

'variable' must be present as a column in input, and a table named "id_variable.csv" (where 'variable' is replaced by the variable name) must be available in the root directory of the project. This table should have been created with setVariables.
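
The effect of a matching list such as faoID = list(commodities = "target") can be illustrated with a minimal base R sketch. This is not the package's implementation. The table contents, the lookup column name 'origin' and the codes "c01"/"c02" are made up for illustration, and the real layout of "id_commodities.csv" may differ.

```r
# Hypothetical input table with the column 'commodities' (the 'variable').
input <- data.frame(
  commodities = c("maize", "wheat", "maize"),
  harvested = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for "id_commodities.csv"; 'origin' is an assumed
# name for the column holding the original terms, 'target' plays the
# role of the 'targetColumn'.
id_commodities <- data.frame(
  origin = c("maize", "wheat"),
  target = c("c01", "c02"),
  stringsAsFactors = FALSE
)

# Look up each term and store the matched value under the 'targetID'
# column name, here 'faoID'.
matched <- input
hit <- match(matched$commodities, id_commodities$origin)
matched$faoID <- id_commodities$target[hit]

# Drop the original variable, analogous to keepOrig = FALSE.
matched$commodities <- NULL

print(matched)
```

Terms without an entry in the lookup table would end up as NA in this sketch; how normTable treats such terms is not covered here.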

To normalise data tables, this function proceeds as follows:

Read in input and extract initial metadata from the file name.
Employ the function tabshiftr::reorganise to reshape input according to the respective schema description.
Match the territorial units in input via matchUnits.
If ... has been provided with variables to match, those are matched via matchVars.
Harmonise territorial unit names.
If update = TRUE, store the processed data table at stage three.
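
Because files are only written when update = TRUE, a dry run can be used to inspect the result first. The following is a sketch under assumptions: it presumes that the package is attached as arealDB and that the example database has been built as in the Examples section. The object name 'checked' is arbitrary.

```r
# Assumption: normTable and makeExampleDB are provided by 'arealDB'.
library(arealDB)

# Build the example database up to the step preceding normTable.
makeExampleDB(until = "normGeometry")

# Dry run: with update = FALSE (the default) nothing is written to
# stage three and the new object is returned for inspection.
checked <- normTable(
  faoID = list(commodities = "target"),
  update = FALSE,
  verbose = TRUE
)
str(checked)

# Once the result looks right, write the stage three files, here as
# *.csv instead of the default *.rds.
normTable(
  faoID = list(commodities = "target"),
  update = TRUE,
  outType = "csv"
)
```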

Examples

# NOT RUN {
# build the example database
makeExampleDB(until = "normGeometry")

# normalise all available data tables, harmonising commodities
# according to the FAO commodity list ...
normTable(faoID = list(commodities = "target"), update = TRUE)

# ... and check the result
output <- readRDS(paste0(tempdir(), "/newDB/adb_tables/stage3/Estonia.rds"))
# }