veriApply: Apply Verification Metrics to Large Datasets

Description

This wrapper applies verification metrics to arrays of forecast ensembles and verifying observations. Various array-based data formats are supported. Additionally, continuous forecasts (and observations) can be transformed to category forecasts using user-defined absolute thresholds or percentiles of the long-term climatology (see details).

Usage

veriApply(
  verifun,
  fcst,
  obs,
  fcst.ref = NULL,
  tdim = length(dim(fcst)) - 1,
  ensdim = length(dim(fcst)),
  prob = NULL,
  threshold = NULL,
  strategy = "none",
  na.rm = FALSE,
  fracmin = 0.8,
  nmin = NULL,
  parallel = FALSE,
  maxncpus = 16,
  ncpus = NULL,
  ...
)

Arguments

verifun: Name of function to compute verification metric (score, skill score)
fcst: array of forecast values (at least 2-dimensional)
obs: array or vector of verifying observations
fcst.ref: array of forecast values for the reference forecast (skill scores only)
tdim: index of dimension with the different forecasts
ensdim: index of dimension with the different ensemble members
prob: probability threshold for category forecasts (see below)
threshold: absolute threshold for category forecasts (see below)
strategy: type of out-of-sample reference forecasts, a named list with arguments as in indRef, or a list of indices for each forecast instance
na.rm: logical, should incomplete forecasts be used?
fracmin: fraction of forecasts that have to be non-missing for the forecast to be evaluated. Used to determine nmin when is.null(nmin)
nmin: number of forecasts that have to be non-missing for the forecast to be evaluated. If both nmin and fracmin are set, nmin takes precedence
parallel: logical, should parallel execution of verification be used (see below)?
maxncpus: upper bound for self-selected number of CPUs
ncpus: number of CPUs used in parallel computation. A self-selected number of CPUs is used when is.null(ncpus) (the default).
...: additional arguments passed to verifun

List of functions to be called

The verification functions supplied with this package and as part of SpecsVerification can be listed using ls(pos='package:easyVerification') and ls(pos='package:SpecsVerification') respectively. Please note, however, that only some of the functions provided as part of SpecsVerification can be used with veriApply. Functions that can be used include, for example, the (fair) ranked probability scores EnsRps and FairRps, their skill scores EnsRpss and FairRpss, and the continuous ranked probability score EnsCrps.

Conversion to category forecasts

To automatically convert continuous forecasts into category forecasts, absolute (threshold) or relative thresholds (prob) have to be supplied. For some scores and skill scores (e.g. the ROC area and skill score), a list with one element per category is returned, with the categories in ascending order. That is, if prob = 1:2/3 for tercile forecasts, cat1 corresponds to the lower tercile, cat2 to the middle, and cat3 to the upper tercile.
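The conversion described above can be illustrated with a minimal base R sketch for a single ensemble forecast and two absolute thresholds. This is not the package's internal code: the variable names are hypothetical, and the treatment of values that fall exactly on a threshold (here assigned to the upper category by findInterval) is an assumption.

```r
# Hypothetical ensemble forecast with six members
ens <- c(-1.2, 0.3, 0.8, 1.5, -0.4, 0.1)

# Two absolute thresholds define three ordered categories
threshold <- c(-0.5, 0.5)

# Category index per member: 1 = lower, 2 = middle, 3 = upper
# (assumption: values equal to a threshold go to the upper category)
cat_index <- findInterval(ens, threshold) + 1

# Fraction of ensemble members in each category
cat_prob <- tabulate(cat_index, nbins = length(threshold) + 1) / length(ens)
```

The resulting vector holds one probability per category in ascending order, which matches the cat1, cat2, cat3 ordering described above.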

Absolute and relative thresholds can be supplied in various formats. If a vector of thresholds is supplied with the threshold argument, the same threshold is applied to all forecasts (e.g. lead times, spatial locations). If a vector of relative thresholds is supplied using prob, the category boundaries to be applied are computed separately for each space-time location. Relative boundaries specified using prob are computed separately for the observations and forecasts, but jointly for all available ensemble members.
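As a sketch of how relative thresholds behave at a single location, the following base R example computes tercile boundaries separately for the forecasts (pooled over all forecast instances and ensemble members) and for the observations. The data are synthetic, the names are hypothetical, and the use of quantile with its default settings is an assumption rather than the package's documented method.

```r
set.seed(1)
ntime <- 30  # number of forecast instances
nens <- 5    # number of ensemble members

# Synthetic data for one location; the forecasts carry a bias of +1
fcst <- matrix(rnorm(ntime * nens, mean = 1), nrow = ntime, ncol = nens)
obs <- rnorm(ntime)

prob <- 1:2 / 3  # tercile boundaries

# Boundaries computed jointly over all members, separately for fcst and obs
fcst_bounds <- quantile(fcst, prob, names = FALSE)
obs_bounds <- quantile(obs, prob, names = FALSE)

# Category index (1 = lower, 2 = middle, 3 = upper tercile)
fcst_cat <- matrix(findInterval(fcst, fcst_bounds) + 1, nrow = ntime, ncol = nens)
obs_cat <- findInterval(obs, obs_bounds) + 1
```

Because the boundaries are derived from each data set's own climatology, the forecast bias does not shift the forecast categories relative to the observed ones.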

Location-specific thresholds can also be supplied. If the thresholds are supplied as a matrix, the number of rows has to correspond to the number of forecast space-time locations, i.e. it has to equal length(fcst)/prod(dim(fcst)[c(tdim, ensdim)]). Alternatively, but equivalently, the thresholds can be supplied with the dimensionality of the obs array, with the difference that the forecast dimension in obs contains the category boundaries (absolute or relative) and thus may differ in length.
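The required shape of location-specific thresholds can be worked out in base R as follows. The array layout (location, lead time, forecast instance, ensemble member) and all names are hypothetical, and the assumption that matrix rows map to locations in column-major order of the leading dimensions is not confirmed by this page.

```r
# Hypothetical forecast array: location x lead time x forecast x member
fcst <- array(0, dim = c(4, 3, 20, 10))
tdim <- length(dim(fcst)) - 1  # default: second-to-last dimension
ensdim <- length(dim(fcst))    # default: last dimension

# Number of space-time locations (here 4 * 3 = 12)
nloc <- length(fcst) / prod(dim(fcst)[c(tdim, ensdim)])

# Matrix format: one row of absolute category boundaries per location
threshold <- cbind(seq(-1, 0, length.out = nloc),
                   seq(0, 1, length.out = nloc))

# Array format: shaped like obs, with the forecast dimension replaced
# by the category boundaries (assumes column-major location ordering)
threshold_arr <- array(threshold,
                       dim = c(dim(fcst)[-c(tdim, ensdim)], ncol(threshold)))
```

Either object could then be passed as the threshold argument. The second form is often easier to build when the boundaries are derived from a gridded climatology.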

Out-of-sample reference forecasts

strategy specifies the set-up of the climatological reference forecast for skill scores if no explicit reference forecast is provided. The default is strategy = "none", that is, all available observations are used as equiprobable members of a reference forecast. Alternatively, strategy = "crossval" can be used for leave-one-out cross-validated reference forecasts, or strategy = "forward" for a forward protocol (see indRef).

Alternatively, a list with named parameters corresponding to the input arguments of indRef can be supplied for more fine-grained control over standard cases. Finally, a list with the observation indices to be used for each forecast can also be supplied (see generateRef).
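To make the idea of a list of observation indices concrete, the following base R sketch builds such lists by hand for the in-sample and leave-one-out cases. It only illustrates the structure (one index vector per forecast instance). The names are hypothetical and the lists are not claimed to be identical to what indRef returns.

```r
nfcst <- 5  # hypothetical number of forecast instances

# In-sample reference: every forecast uses all observations
ind_none <- lapply(seq_len(nfcst), function(i) seq_len(nfcst))

# Leave-one-out reference: forecast i uses all observations except i
ind_crossval <- lapply(seq_len(nfcst), function(i) setdiff(seq_len(nfcst), i))
```

A list of this form contains one element per forecast instance, so its length has to match the length of the forecast dimension.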

Parallel processing

Parallel processing is enabled using the parallel package. Parallel verification uses ncpus FORK clusters or, if ncpus is not specified, one less than the auto-detected number of cores. The maximum number of cores used for parallel processing with auto-detection of the number of available cores can be set with the maxncpus argument.
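The rule for selecting the number of CPUs can be sketched with parallel::detectCores. This is a reading of the description above, not the package's actual code. In particular, the fallback to a single CPU when detection fails or only one core is present is an assumption.

```r
library(parallel)

maxncpus <- 16  # upper bound, as in the default
ncpus <- NULL   # not specified by the user

if (is.null(ncpus)) {
  ncores <- detectCores()
  # detectCores() can return NA; fall back to one core (assumption)
  if (is.na(ncores)) ncores <- 1L
  # one less than the detected cores, at least 1, at most maxncpus
  ncpus <- min(max(ncores - 1L, 1L), maxncpus)
}
```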

Progress bars are available for non-parallel computation of the verification metrics. Please note, however, that the progress bar only reflects the computation time of the actual verification metrics. The re-arrangement of input and output is not included.

Examples

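library(easyVerification)
## toy forecast and observation arrays
tm <- toyarray()
## mean error of the ensemble mean
f.me <- veriApply("EnsMe", tm$fcst, tm$obs)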

## find more examples and instructions in the vignette
if (FALSE) {
devtools::install_github("MeteoSwiss/easyVerification", build_vignettes = TRUE)
library("easyVerification")
vignette("easyVerification")
}
