ConfigFileOpen: Functions To Create, Open And Save Configuration Files

Description

These functions help in creating, opening and saving configuration files.

Usage

ConfigFileOpen(file_path, silent = FALSE)
ConfigFileCreate(file_path, confirm = TRUE)
ConfigFileSave(configuration, file_path, confirm = TRUE)

Arguments

file_path

Path to the configuration file to create/open/save.

silent

Flag to activate or deactivate verbose mode. Defaults to FALSE (verbose mode on).

configuration

Configuration object to save in a file.

confirm

Flag to stipulate whether to ask for confirmation when saving a configuration file that already exists. Defaults to TRUE (confirmation asked).

Value

ConfigFileOpen() returns a configuration object with all the information for the configuration file mechanism to work. ConfigFileSave() returns TRUE if the file has been saved and FALSE otherwise. ConfigFileCreate() returns nothing.

Configuration File Format
A configuration file contains two lists and five tables. Each of them has a header line that starts with '!!'. These header lines are key lines and must not be removed or reordered. Lines starting with '#' and blank lines are ignored.
The first list should contain all the supported 2-dimensional variables. The second list should contain all the supported global mean variables. Regular expressions[1] can be used in these lists to match more than one variable name.
The first table contains information about experiments (which are stored in a file per ensemble per starting date format). The second table contains information about file-per-member experiments (which are stored in a file per member per starting date format). The third table contains information about observations (which are stored in a file per month per ensemble format). The fourth table contains information about the file-per-member observations (which are stored in a file per month per member format). The fifth table contains information about the file-per-dataset observations (which contain all the dataset values in a single file).
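As a rough illustration of this layout, the base-R sketch below drops comment and blank lines and groups what remains under each '!!' header. The helper split_config_sections() and the header texts used in the sample are hypothetical; this is not how the package parses the file, and variable definitions placed above a header are not treated specially here.

```r
# Hypothetical helper: ignore comment and blank lines, then group the
# remaining lines under the '!!' header that precedes them.
split_config_sections <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  is_header <- startsWith(lines, "!!")
  # cumsum() gives every line the number of the last header seen so far
  section_id <- cumsum(is_header)
  keep <- section_id > 0
  sections <- split(lines[keep], section_id[keep])
  # Name each section after its header line and keep only its body
  names(sections) <- vapply(sections, function(s) s[1], character(1))
  lapply(sections, function(s) s[-1])
}

# Made-up header texts, for illustration only
config_lines <- c("# a comment", "!!first list", "tos", "tas", "",
                  "!!first table", "ecmwf, tos, /path/to/dataset, file.nc")
sections <- split_config_sections(config_lines)
names(sections)  # "!!first list" "!!first table"
```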
Each table entry is a list of comma-separated elements. The first two form a key that is associated with a value formed by the remaining elements. The key elements are a dataset identifier and a variable name. The value elements are the dataset main path, the dataset file path, the grid (except in observation table entries), the variable name inside the .nc file, a default suffix (explained below), and a minimum and a maximum value beyond which loaded data is deactivated.
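A minimal sketch of how one entry splits into its key and its value, assuming no element contains a literal comma (a regular expression such as '{1,3}' would break this simple split). parse_entry() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: split an entry on commas; the first two elements
# form the key and the remaining ones the value.
parse_entry <- function(line) {
  elements <- trimws(strsplit(line, ",", fixed = TRUE)[[1]])
  list(key = elements[1:2], value = elements[-(1:2)])
}

entry <- parse_entry(paste("ecmwf, tos, /dataset/main/path, file/path, grid,",
                           "nc_var_name, suffix, var_min, var_max"))
entry$key             # "ecmwf" "tos"
length(entry$value)   # 7
```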
Given a dataset name and a variable name, the full path is obtained by concatenating the main path and the file path with a slash (/) in between. The grid (for experiments), the nc variable name, the suffix and the limit values are also obtained.
Any of the elements in the key can contain regular expressions, so that a single entry matches a set of dataset names or variable names.
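Key matching can be sketched with grepl(). This assumes each key element is anchored at both ends (^...$), so that '.*' matches any name while 'tos' matches only 'tos'; the exact anchoring used by the package is not stated here, and key_matches() is hypothetical.

```r
# Hypothetical helper: treat each key element as a regular expression,
# anchored at both ends (an assumption), and require both to match.
key_matches <- function(key, dataset, var) {
  grepl(paste0("^(", key[1], ")$"), dataset) &&
    grepl(paste0("^(", key[2], ")$"), var)
}

key_matches(c(".*", "tos"), "ecmwf", "tos")    # TRUE
key_matches(c("ecmwf", "t.s"), "ecmwf", "tas") # TRUE
key_matches(c("ecmwf", "tos"), "ecmwf", "tas") # FALSE
```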
The dataset main path and file path can contain shell globbing expressions[2], which match a set of paths when the file at the full path is fetched. The full path can also point to an OPeNDAP URL.
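To see what a globbing expression in a path can match, base R's Sys.glob() expands the same kind of pattern against the file system. The directory and file names below are made up, and this only illustrates glob semantics, not how Load() fetches files.

```r
# Create a few empty files with made-up names in a temporary directory
demo_dir <- file.path(tempdir(), "glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("tos_19901101.nc",
                                            "tos_19911101.nc",
                                            "tas_19901101.nc"))))
# The '*' in the pattern matches any sequence of characters in a file name
matches <- sort(basename(Sys.glob(file.path(demo_dir, "tos_*.nc"))))
matches  # "tos_19901101.nc" "tos_19911101.nc"
```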
Any of the elements in the value can contain variables that are replaced with an associated string. Variables can be defined only immediately above one of the table headers. A variable is defined with the pattern VARIABLE_NAME = VARIABLE_VALUE and can be referenced from within the table values or from within other variable values as $VARIABLE_NAME$. For example:
    FILE_NAME = tos.nc
    !!table of experiments
    ecmwf, tos, /path/to/dataset, $FILE_NAME$
There are some reserved variables that offer information about the store frequency, the start date Load() is currently fetching, etc.:
    $START_DATE$, $STORE_FREQ$
    for file-per-member datasets: $MEMBER_NUMBER$
    for observations: $YEAR$, $MONTH$, $DAY$
Additionally, from an element in an entry value you can access the other elements of the entry as:
    $EXP_NAME$, $VAR_NAME$, $EXP_MAIN_PATH$, $EXP_FILE_PATH$, $GRID$, $VAR_NAME$, $SUFFIX$, $VAR_MIN$, $VAR_MAX$
and, in an observation entry, as:
    $OBS_NAME$, $VAR_NAME$, $OBS_MAIN_PATH$, $OBS_FILE_PATH$, $VAR_NAME$, $SUFFIX$, $VAR_MIN$, $VAR_MAX$
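Substitution of $VARIABLE_NAME$ references can be sketched as fixed-string replacement, repeated until nothing changes so that a variable value may itself reference other variables. replace_vars() and the sample values are hypothetical, and the package may resolve variables differently (for instance in another order).

```r
# Hypothetical helper: replace every $NAME$ in 'text' with vars[[NAME]],
# repeating the pass while the text keeps changing (up to max_passes).
replace_vars <- function(text, vars, max_passes = 10) {
  for (i in seq_len(max_passes)) {
    before <- text
    for (name in names(vars)) {
      text <- gsub(paste0("$", name, "$"), vars[[name]], text, fixed = TRUE)
    }
    if (identical(text, before)) break
  }
  text
}

vars <- c(FILE_NAME = "$VAR_NAME$_$START_DATE$.nc",
          VAR_NAME = "tos", START_DATE = "19901101")
replace_vars("/path/to/dataset/$FILE_NAME$", vars)
# "/path/to/dataset/tos_19901101.nc"
```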
The variable $SUFFIX$ is useful because it can form part of the main or file path, for example '/path/to$SUFFIX$/dataset'. It is replaced by the value in the suffix column unless the user specifies a different suffix via the parameter 'suffixexp' or 'suffixobs'. This way the user can load two variables with the same name from the same dataset but with slight modifications, indicated by a suffix placed anywhere in the path to the data.
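A sketch of the suffix rule: a user-supplied suffix (as passed via 'suffixexp' or 'suffixobs') takes precedence over the entry's suffix column, and the chosen suffix replaces $SUFFIX$ in the path. apply_suffix() and the suffix strings are hypothetical.

```r
# Hypothetical helper: pick the user's suffix if given, else the entry's
# suffix column, and substitute it for $SUFFIX$ in the path.
apply_suffix <- function(path, column_suffix, user_suffix = NULL) {
  suffix <- if (is.null(user_suffix)) column_suffix else user_suffix
  gsub("$SUFFIX$", suffix, path, fixed = TRUE)
}

apply_suffix("/path/to$SUFFIX$/dataset", "_f6h")              # "/path/to_f6h/dataset"
apply_suffix("/path/to$SUFFIX$/dataset", "_f6h", "_3hourly")  # "/path/to_3hourly/dataset"
```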
The entries in a table are grouped into 4 levels of specificity:
1. General entries: the key dataset name and variable name are both the regular expression .* (any sequence of characters), so they match any pair of dataset and variable names. Example:
    .*, .*, /dataset/main/path, file/path, grid, nc_var_name, suffix, var_min, var_max
2. Dataset entries: the key variable name matches any sequence of characters. Example:
    ecmwf, .*, /dataset/main/path, file/path, grid, nc_var_name, suffix, var_min, var_max
3. Variable entries: the key dataset name matches any sequence of characters. Example:
    .*, tos, /dataset/main/path, file/path, grid, nc_var_name, suffix, var_min, var_max
4. Specific entries: both key values are specified. Example:
    ecmwf, tos, /dataset/main/path, file/path, grid, nc_var_name, suffix, var_min, var_max
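Which of the four levels an entry belongs to depends only on whether each key element is the catch-all '.*'. The hypothetical helper below assumes that exactly the string '.*' counts as a wildcard; other patterns that happen to match everything would be classed as specific.

```r
# Hypothetical helper: map an entry key to its level of specificity (1-4)
specificity_level <- function(dataset_key, var_key) {
  any_dataset <- dataset_key == ".*"
  any_var <- var_key == ".*"
  if (any_dataset && any_var) {
    1L  # general entry
  } else if (any_var) {
    2L  # dataset entry
  } else if (any_dataset) {
    3L  # variable entry
  } else {
    4L  # specific entry
  }
}

specificity_level("ecmwf", ".*")  # 2
```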
If more than one entry in a group matches a given key pair, these entries are applied in their order of appearance in the configuration file (top to bottom).
An asterisk (*) in any value element is interpreted as 'leave it as is, or take the default value if not yet defined'. The default values are defined in the following reserved variables:
    $DEFAULT_EXP_MAIN_PATH$, $DEFAULT_EXP_FILE_PATH$, $DEFAULT_GRID$, $DEFAULT_NC_VAR_NAME$, $DEFAULT_OBS_MAIN_PATH$, $DEFAULT_OBS_FILE_PATH$, $DEFAULT_SUFFIX$, $DEFAULT_VAR_MIN$, $DEFAULT_VAR_MAX$, $DEFAULT_DIMNAME_LATITUDES$, $DEFAULT_DIMNAME_LONGITUDES$
Trailing asterisks in an entry can be omitted. For example,
    ecmwf, .*, /dataset/main/path, *, *, *, *, *, *
has the same effect as
    ecmwf, .*, /dataset/main/path
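Together with the ordering rule above, applying the matching entries can be sketched as a fold over their value elements. This assumes the entries are already sorted in application order (for example general levels before specific ones, each group top to bottom) and models only four value elements with made-up defaults; apply_entries() is hypothetical.

```r
# Hypothetical sketch: fold the value elements of the matching entries,
# which are assumed to be given already in application order.
apply_entries <- function(values_list, defaults) {
  result <- rep(NA_character_, length(defaults))
  for (values in values_list) {
    # Omitted trailing elements behave like '*'
    values <- c(values, rep("*", max(0, length(defaults) - length(values))))
    # '*' leaves the current value untouched; anything else overrides it
    set <- values != "*"
    result[set] <- values[set]
  }
  # Elements that no entry defined take their default value
  unset <- is.na(result)
  result[unset] <- defaults[unset]
  names(result) <- names(defaults)
  result
}

# Made-up defaults for four of the value elements
defaults <- c(main_path = "/default/main/path", file_path = "$VAR_NAME$.nc",
              grid = "r256x128", nc_var_name = "$VAR_NAME$")
matching <- list(
  c("/dataset/main/path"),         # e.g. a general entry: main path only
  c("*", "file/path", "*", "sst")  # e.g. a more specific entry
)
resolved <- apply_entries(matching, defaults)
```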
A double quote alone (") as a key or value element is interpreted as 'fill in with the same value as in the entry above'.
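The double-quote rule can be sketched as a single top-to-bottom pass, so that a chain of double quotes carries a value down several entries. fill_ditto() is a hypothetical helper that works on entries already split into elements.

```r
# Hypothetical helper: replace each element that is just a double quote
# with the element in the same position of the (already filled) entry above.
fill_ditto <- function(entries) {
  for (i in seq_along(entries)[-1]) {
    ditto <- entries[[i]] == "\""
    entries[[i]][ditto] <- entries[[i - 1]][ditto]
  }
  entries
}

entries <- list(c("ecmwf", "tos", "/path/a", "file_a.nc"),
                c("\"", "tas", "\"", "file_b.nc"),
                c("\"", "\"", "\"", "file_c.nc"))
filled <- fill_ditto(entries)
filled[[3]]  # "ecmwf" "tas" "/path/a" "file_c.nc"
```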
The pattern used to obtain the full path by concatenating the dataset main path and the dataset file path can be adjusted in the variables EXP_FULL_PATH and OBS_FULL_PATH, which are defined by default as:
    EXP_FULL_PATH = $EXP_MAIN_PATH$/$EXP_FILE_PATH$
    OBS_FULL_PATH = $OBS_MAIN_PATH$/$OBS_FILE_PATH$
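With the default EXP_FULL_PATH pattern, building the full path of an experiment reduces to two fixed-string substitutions, as in this hypothetical sketch (exp_full_path() is not a package function):

```r
# Hypothetical sketch: fill the EXP_FULL_PATH pattern with an entry's
# main path and file path.
exp_full_path <- function(main_path, file_path,
                          pattern = "$EXP_MAIN_PATH$/$EXP_FILE_PATH$") {
  path <- gsub("$EXP_MAIN_PATH$", main_path, pattern, fixed = TRUE)
  gsub("$EXP_FILE_PATH$", file_path, path, fixed = TRUE)
}

exp_full_path("/dataset/main/path", "file/path")
# "/dataset/main/path/file/path"
```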

Details

ConfigFileOpen() loads all the data contained in the configuration file specified as parameter 'file_path' and returns a configuration object with the variables needed for the configuration file mechanism to work. This function is called from inside the Load() function to load the configuration file specified in 'configfile'.

ConfigFileCreate() creates an empty configuration file and saves it to the specified path. It may be opened later with ConfigFileOpen() to be edited. Some default values are set when creating a file with this function; you can check them with ConfigShowDefinitions().

ConfigFileSave() saves a configuration object into a file, which may then be used from Load().

Two examples of configuration files can be found inside the 'inst/config/' folder in the package:

IC3.conf: configuration file used at IC3. Contains location data on several datasets and variables.

template.conf: very simple configuration file intended to be used as pattern when starting from scratch.

References

[1] https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html
[2] http://tldp.org/LDP/abs/html/globbingref.html

Examples

# Create an empty configuration file
config_file <- paste0(tempdir(), "/example.conf")
ConfigFileCreate(config_file, confirm = FALSE)
# Open it into a configuration object
configuration <- ConfigFileOpen(config_file)
# Add an entry at the bottom of the 4th level of the file-per-startdate
# experiments table, which associates the experiment "ExampleExperiment2" and
# the variable "ExampleVariable" with some information about its location.
configuration <- ConfigAddEntry(configuration, "experiments", 
                 "file-per-startdate", "last", "ExampleExperiment2", 
                 "ExampleVariable", "/path/to/ExampleExperiment2", 
                 "ExampleVariable/ExampleVariable_$START_DATE$.nc")
# Edit the entry to generalize it for any variable: the key variable name
# becomes '.*' and the file path uses $VAR_NAME$ instead of a fixed name.
configuration <- ConfigEditEntry(configuration, "experiments", 
                 "file-per-startdate", 1, var_name = ".*", 
                 file_path = "$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc")
# Now put the variable in the list of supported variables. In this example it
# is a two-dimensional variable.
configuration <- ConfigAddVar(configuration, "2d", "ExampleVariable")
# Change the default grid
configuration <- ConfigEditDefinition(configuration, "DEFAULT_GRID", "r256x128",
                                      confirm = FALSE)
# Now apply matching entries for variable and experiment name and show the 
# result
match_info <- ConfigApplyMatchingEntries(configuration, 'tas', 
              exp = c('ExampleExperiment2'), show_result = TRUE)
# Finally save the configuration file.
ConfigFileSave(configuration, config_file, confirm = FALSE)
