rwd_agg: Class to specify how to aggregate RiverWare data

Description

rwd_agg() creates a RiverWare data aggregator (rwd_agg) object, which lets users specify how specific RiverWare slots should be aggregated.

Usage

rwd_agg(x = NULL, rdfs = NULL)

Arguments

A data.frame with required column names and valid entries; see Details and Aggregation Specification sections.

rdfs

A vector of rdf names; see Details and Aggregation Specification sections.

Aggregation Specification

In order to specify how each slot should be aggregated, each column should include specific keywords, which are described below. It is up to the user to specify which rdf file contains each slot. In a general case, the user specifies the slot that is found in a specific rdf file (file). A summary function is applied to a subset period of the slot, and then compared (eval) to a threshold (t_s) and saved as the variable.

file: specifies the rdf file that contains the slot.
slot: the full RiverWare slot name, i.e., "Object.Slot".
period: the period that the slot should be summarized over. This should either be a function name, a full month name (found in month.name), or the keyword "asis".
- function name: Specifying a function name allows for pre-specified or custom functions to group together several months in the period. This package provides the following functions: cy(), wy(), eocy(), and eowy(). cy() indicates the data will be summarized over the calendar year, i.e., January - December, while wy() summarizes over the water year, i.e., October - September. eocy() selects the end of the calendar year, and eowy() selects the end of the water year. When specified in the slot_agg object, leave off the parenthesis, i.e., only specify the function name. If wy() is specified, the function will remove data for any water years that have less than 7 months of data. This "tolerance" is specified by the rwdataplyr.wy_month_tol option, and can be modified by updating this option to another number. For standard monthly data that starts in January and ends in December, this results in keeping the first water year, since it includes 9 months of data, and removing the final water year, since it includes only three months of data. Setting this option to 0 will result in keeping any water year data that has at least one month of data; setting this option to 11, ensures that there must be a full water year of data for that year to be kept.
  
  This can also be a user specified custom function; see the Custom Period Functions section for details on constructing the custom functions.
- full month name: When the full month name is specified, data will be filtered to only include data for that particular month. To select multiple months of data, use a function as described above. If the month specified is not found in month.name, an error will occur.
- asis: If the keyword "asis" is specified, the data is returned for its native timestep, i.e, monthly data will return monthly data and annual data will return annual.
summary: the summary function that should be applied to the period specified as a function name, or NA. If the period specified is "asis" or returns only one month, e.g., eocy(), then the summary should be NA. The summary function should only return one value; for that reason, most of the Summary S4groupGenerics work. Notably, range() will not since it returns two values. There is no reason that a custom function will not work here, but it has not been tested.
eval: the comparison operator to use (see the Compare S4groupGenerics). If no comparison is desired, then NA should be used. If eval is specified the value returned from applying the summary to the period will be compared to the threshold specified by t_s. The results of the comparison are returned as 0 and 1 instead of TRUE and FALSE.
t_s: either the threshold to be compared to if eval is not NA or a value to scale the result by, e.g,. 0.001 to convert from acre-ft to thousand acre-ft. NA can also be specified to not scale the data.
variable: the variable name that will be used to identify the results of applying the period, summary, comparison/scaling to. All variable names should be unique.

For example, to determine if the minimum water year elevation at Lake Powell is below elevation 3550 feet, the following would be specified:

data.frame(
  file = "KeySlots.rdf",
  slot = "Powell.Pool Elevation",
  period = "wy",
  summary = "min",
  eval = "<",
  t_s = 3550,
  variable = "powellLt3550",
  stringsAsFactors = FALSE
)

Custom Period Functions

Users can specify custom period functions to make it easier to group months together in custom ways. For example a function could return all of the summer months, or the more complicated case groups months across different calendar years together. In fact, wy() is an example of a function that does this; another example might be grouping December - February together for winter months.

The custom period function should return a list with three elements:

fun - a function that will modify a rwtbl and properly determine the new Years based on the custom period.
filter_months - the months that should be grouped together.
group_tbl - how to group the returned rwtbl; likely either c("Year") or c("Year", "Month")

See the "RWDataPlyr Workflow" vignette for example implementations of both the summer and winter custom functions described above.

Details

rwd_agg objects can be created in three ways:

By providing a data.frame, with the following expected columns file, slot, period, summary, eval, t_s, and variable. Each row in the data.frame should include all of the information for how each individual slot will be aggregated. See the Aggregation Specification section below for details on how to specify each column.
By providing a vector of rdf files. If specified in this manor, all of the slots in each rdf file will be read in to a rwtbl, but will not be aggregated/summarized.

In this case, the variable names are automatically constructed from the ObjectSlot names. The variable names are constructed as the all lower case version of the object_slot name. If the slot name is "Pool Elevation", it is shortened to "pe", otherwise the full object and slot name are used. If there are any spaces, they are replaced with underscores.
By reading in a csv file with read_rwd_agg(). This csv file must have the correct column names and meet other requirements described below. To ensure it has the correct column names, rwd_agg_template() can be used to create a blank csv file for the user to fill in.

Examples

Run this code

# NOT RUN {
# determine if Powell's minimum water year elevation is < 3550'
rwd_agg(
  data.frame(
    file = "KeySlots.rdf",
    slot = "Powell.Pool Elevation",
    period = "wy",
    summary = "min",
    eval = "<",
    t_s = 3550,
    variable = "powellLt3550",
    stringsAsFactors = FALSE
  )
)

# get all the monthly slots in KeySlots.rdf
rwd_agg(rdfs = "KeySlots.rdf")

# }

Run the code above in your browser using DataLab