qcPrec: Quality control: Identifies and removes suspect data from the original dataset

Description

This function uses the original data to estimate predicted values and compares them with the observations. Where large differences exist, the original values are removed.

Usage

qcPrec(prec, sts, inidate, enddate, parallel = TRUE, ncpu = 2,
printmeta = TRUE, thres = NA)

Value

A new file called cleaned.RData is created in the working directory. Loading this file (load('cleaned.RData')) adds a matrix containing the original data filtered by quality control. If printmeta = TRUE, a new meta directory is created in the working directory with one file per day. Each file contains a data.frame with as many rows as values flagged on that day. The columns show the identifier (ID) of each station, the date, the code of the criterion by which the value was flagged, and the removed value. There are five codes, one per criterion: 1 = Suspect data; 2 = Suspect zero; 3 = Suspect outlier; 4 = Suspect wet; and 5 = Suspect dry.
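A minimal sketch for inspecting these outputs, relying only on what is described above: load() returns the names of the objects it restores, so the filtered matrix can be retrieved without knowing its name in advance. The helpers read_cleaned and qc_code_label, and the label strings, are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): restore the objects saved in
# cleaned.RData into a private environment and return them as a named list.
read_cleaned <- function(path = "cleaned.RData") {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the restored names
  mget(obj_names, envir = env)
}

# Hypothetical lookup table mapping the five criterion codes to their meaning.
qc_code_labels <- c(
  "1" = "Suspect data",
  "2" = "Suspect zero",
  "3" = "Suspect outlier",
  "4" = "Suspect wet",
  "5" = "Suspect dry"
)

# Translate numeric criterion codes into readable labels (unknown codes -> NA).
qc_code_label <- function(codes) {
  unname(qc_code_labels[as.character(codes)])
}
```

For example, qc_code_label(c(1, 3, 5)) returns "Suspect data", "Suspect outlier" and "Suspect dry".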

Arguments

prec: Object of class matrix containing the original precipitation data. Each column represents one station, and the column names must be the station names.
sts: Object of class matrix containing the station information. It must have at least four fields: ID: station identifier; ALT: altitude; X: longitude in UTM projection (metres); and Y: latitude in UTM projection (metres). Tab-separated.
inidate: Object of class Date in format 'YYYY-mm-dd' defining the first day of the quality control process.
enddate: Object of class Date in format 'YYYY-mm-dd' defining the last day of the quality control process.
parallel: Logical. When TRUE, parallel computing is activated and the processes are distributed among ncpu processor cores.
ncpu: Only used when parallel = TRUE. Sets the number of processor cores used for parallel computing.
printmeta: Logical. When TRUE, one file per day is written in the subdirectory ./meta.
thres: Distance threshold applied when searching for the nearest stations. If thres = NA, the function searches for the 10 nearest observations without a distance limit. A positive number sets the threshold in kilometres.
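The neighbour search controlled by thres can be pictured with the sketch below. It assumes sts is a data.frame with the ID, X and Y fields described above (UTM coordinates in metres), uses plain Euclidean distance, and returns at most 10 neighbours. The function name nearest_stations and its tie-breaking are hypothetical; this is not the package's internal code.

```r
# Hypothetical sketch: find up to n nearest stations to a target station.
# Coordinates are UTM metres; thres (if not NA) is a distance limit in km.
nearest_stations <- function(sts, target_id, n = 10, thres = NA) {
  target <- sts[sts$ID == target_id, ]
  stopifnot(nrow(target) == 1)          # the target must exist exactly once
  others <- sts[sts$ID != target_id, ]
  # Euclidean distance in metres, converted to kilometres
  dist_km <- sqrt((others$X - target$X)^2 + (others$Y - target$Y)^2) / 1000
  # thres = NA means no distance limit
  keep <- if (is.na(thres)) rep(TRUE, length(dist_km)) else dist_km <= thres
  others <- others[keep, ]
  dist_km <- dist_km[keep]
  # Sort by distance and keep at most n stations
  ord <- order(dist_km)[seq_len(min(n, length(dist_km)))]
  data.frame(ID = others$ID[ord], dist_km = dist_km[ord])
}
```

With a distance limit, fewer than 10 neighbours may be returned, which is why the sketch caps the selection with min(n, ...) instead of assuming 10 candidates exist.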

Author

Roberto Serrano-Notivoli

Details

The quality control process uses five criteria to flag suspect data. All of them rely on reference values (RV) computed from the 10 nearest observations (NNS) available that day, so at least 11 observations per day are required. The five criteria are:
1) Suspect data: the observation is > 0 and all 10 NNS are 0;
2) Suspect zero: the observation is 0 and all 10 NNS are > 0;
3) Suspect outlier: the observation is 10 times higher or lower than RV;
4) Suspect wet: the observation is 0, the wet probability is over 99%, and the predicted magnitude is over 5 litres;
5) Suspect dry: the observation is over 5 litres, the dry probability is over 99%, and the predicted magnitude is under 0.1 litres.
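The criteria can be read as a decision rule applied to each observation. The sketch below assumes values in tenths of a millimetre (taking 1 litre per square metre as 1 mm, so 5 litres becomes 50 and 0.1 litres becomes 1), assumes the dry probability is 1 minus the wet probability, checks the criteria in the listed order with the first match winning, and applies the outlier test only when both the observation and RV are positive. flag_observation, enough_data and this precedence are hypothetical, not the package's implementation.

```r
# Hypothetical sketch of the five criteria for a single observation.
# obs:      observed value (tenths of mm)
# nns:      values at the 10 nearest stations that day (assumed free of NA)
# rv:       reference value estimated from the neighbours
# wet_prob: predicted probability of precipitation (0-1)
# pred_mag: predicted magnitude (tenths of mm)
# Returns the criterion code (1-5), or NA if the value is not flagged.
flag_observation <- function(obs, nns, rv, wet_prob, pred_mag) {
  if (obs > 0 && all(nns == 0)) return(1L)                  # suspect data
  if (obs == 0 && all(nns > 0)) return(2L)                  # suspect zero
  if (obs > 0 && rv > 0 && (obs > 10 * rv || obs < rv / 10))
    return(3L)                                               # suspect outlier
  if (obs == 0 && wet_prob > 0.99 && pred_mag > 50)
    return(4L)                                               # suspect wet
  if (obs > 50 && (1 - wet_prob) > 0.99 && pred_mag < 1)
    return(5L)                                               # suspect dry
  NA_integer_
}

# A day can only be processed if at least 11 stations report data
# (the target plus its 10 nearest neighbours).
enough_data <- function(day_values, min_obs = 11) {
  sum(!is.na(day_values)) >= min_obs
}
```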

All of these criteria are designed for precipitation expressed in tenths of a millimetre (millimetres * 10).
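Because the thresholds assume tenths of a millimetre, a dataset stored in millimetres would need rescaling before it is passed to qcPrec. A minimal sketch, assuming a numeric matrix with one column per station; the names mm_to_tenths and prec_mm are hypothetical, and rounding to whole tenths is an assumption.

```r
# Convert a precipitation matrix from millimetres to tenths of a millimetre.
# NA values stay NA; round() keeps the dimensions and the station names.
mm_to_tenths <- function(prec_mm) {
  round(prec_mm * 10)
}

# Toy matrix: two days (rows) and two stations (columns)
prec_mm <- matrix(c(0, 1.2, NA, 5.0), nrow = 2,
                  dimnames = list(NULL, c("st1", "st2")))
mm_to_tenths(prec_mm)
```

Here 1.2 mm becomes 12 and 5.0 mm becomes 50, which is the value the sketched 5-litre thresholds compare against.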

Examples

  # Load the example data (objects preci and sts)
  data(precipDataset)

  # Run quality control for 1-2 January 2001
  qcPrec(prec = preci, sts = sts, inidate = as.Date('2001-01-01'),
    enddate = as.Date('2001-01-02'), parallel = TRUE, ncpu = 2,
    printmeta = TRUE, thres = NA)