Gapfill: Main Function for Gap-Filling

Description

The function fills (predicts) missing values in satellite data. We illustrate it with MODIS NDVI data, but it can also be applied to other data that are recorded at equally spaced points in time. Moreover, the function provides infrastructure for the development of new gap-fill algorithms. The predictions of the missing values are based on a subset-predict procedure, i.e., each missing value is predicted separately by (1) selecting a subset of the data in a neighborhood around the missing value and (2) predicting the value based on that subset.

Usage

Gapfill(data, fnSubset = Subset, fnPredict = Predict, iMax = Inf, nPredict = 1L, subset = "missings", clipRange = c(-Inf, Inf), dopar = FALSE, verbose = TRUE, ...)

Arguments

data

Numeric array with four dimensions. The input (satellite) data to be gap-filled. Missing values should be encoded as NA. When using the default Subset and Predict functions, the data should have the dimensions: x coordinate, y coordinate, seasonal index (e.g., day of the year), and year. See the ndvi dataset for an example.
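As a hedged illustration of the expected layout, the following base R snippet builds a small hypothetical array (toy, with arbitrary sizes) in the order x coordinate, y coordinate, seasonal index, year, and encodes missing observations as NA. It does not use the ndvi dataset; it only shows the shape and NA encoding described above.

```r
# Hypothetical toy input with dimensions
# x coordinate (3), y coordinate (3), seasonal index (4), year (2).
set.seed(1)
toy <- array(runif(3 * 3 * 4 * 2), dim = c(3, 3, 4, 2))

# Encode two missing observations as NA.
toy[1, 2, 3, 1] <- NA  # pixel (1, 2), 3rd seasonal index, 1st year
toy[2, 2, 4, 2] <- NA  # pixel (2, 2), 4th seasonal index, 2nd year

dim(toy)         # 3 3 4 2
sum(is.na(toy))  # 2
```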

fnSubset

Function to subset the data around a missing value. See Subset and Extend for more information.

fnPredict

Function to predict a missing value based on the return value of fnSubset. See Predict and Extend for more information.

iMax

Integer vector of length 1. The maximum number of iterations before NA is returned as the predicted value.

nPredict

Integer vector of length 1. Specifies the length of the vector returned by fnPredict. Values larger than 1 may increase memory usage considerably, because the returned array gains an additional dimension of length nPredict.

subset

If "missings" (the default), all missing values in data are filled. If a logical array of the same dimensions as data or a vector of positive integers, only the missing elements of data[subset] are filled. Note that this does not have the same effect as selecting a subset of the input data: regardless of the specified subset, all values in data are used to inform the predictions.
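The two non-default forms of subset can be compared with a small base R sketch. It uses hypothetical toy data and only computes which elements would be targeted for filling (missing and inside the subset); it is not Gapfill's internal code, and Gapfill would still use all of data to inform the predictions.

```r
# Hypothetical toy data (2 x 3 x 2 x 2) with three missing values.
data <- array(1:24, dim = c(2, 3, 2, 2))
data[c(2, 7, 20)] <- NA

# Form 1: a logical array of the same dimensions as data.
keep <- array(FALSE, dim = dim(data))
keep[, , , 2] <- TRUE   # only consider the second year

# Form 2: a vector of positive (1-dimensional) indices.
idx <- 13:24            # the same elements as keep

# Elements that would be filled: missing AND inside the subset.
toFillLogical <- which(is.na(data) & keep)
toFillIndex   <- idx[is.na(data[idx])]
toFillLogical  # 20
toFillIndex    # 20
```

Both forms select the same single missing element here; the missing values at indices 2 and 7 (first year) are left untouched.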

clipRange

Numeric vector of length 2. Specifies the lower and the upper bound of the filled data. Values outside this range are clipped accordingly. If nPredict is larger than 1, only the first return value of fnPredict is clipped.
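The clipping rule can be sketched in base R. The snippet below assumes a hypothetical output array with nPredict = 3 (so its last dimension holds the three return values of fnPredict) and clips only the first return value with pmin() and pmax(); it illustrates the described behavior rather than reproducing Gapfill's code, and for simplicity it clips the whole toy slice instead of only the filled values.

```r
# Hypothetical output with nPredict = 3, i.e. dimension c(dim(data), 3).
fill <- array(c(-0.2, 0.5, 1.3,   # first return value (prediction)
                -0.4, 0.3, 1.1,   # second return value
                 0.1, 0.7, 1.5),  # third return value
              dim = c(3, 1, 1, 1, 3))
clipRange <- c(0, 1)

# Clip only the first return value to [clipRange[1], clipRange[2]].
fill[, , , , 1] <- pmin(pmax(fill[, , , , 1], clipRange[1]), clipRange[2])

fill[, , , , 1]  # 0.0 0.5 1.0
fill[, , , , 2]  # unchanged: -0.4 0.3 1.1
```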

dopar

Logical vector of length 1. If TRUE, the %dopar% construct from the R package foreach is used. This allows the function to predict several missing values in parallel if a parallel back-end (e.g., from the R package doParallel or doMPI) is registered. See the example below and foreach for more information.

verbose

Logical vector of length 1. If TRUE, messages are printed.

...

Additional arguments passed to fnSubset and fnPredict.

Value

List of length 4 with the entries:

fill contains the gap-filled data. If nPredict = 1, fill is an array of dimension dim(data); otherwise, the array has dimension c(dim(data), nPredict).
mps integer vector whose length equals the number of predicted values. Contains the (one-dimensional) indices of the predicted values.
time list of length 4 containing timing information.
- start start date and time.
- end end date and time.
- elapsedMins elapsed minutes.
- elapsedSecsPerNA elapsed seconds per predicted value.
call the call used to produce the object.
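A short base R sketch can make the shapes in this list concrete. Assuming hypothetical input dimensions of 100 x 100 pixels, 4 seasonal indices and 5 years, it computes the dimension of fill for nPredict = 3, a rough memory estimate (8 bytes per double), and converts hypothetical one-dimensional mps indices back to array positions with base R's arrayInd().

```r
# Hypothetical input dimensions: x, y, seasonal index, year.
dataDim  <- c(100, 100, 4, 5)
nPredict <- 3

# Dimension of fill as described in the Value section.
fillDim <- if (nPredict == 1) dataDim else c(dataDim, nPredict)
fillDim                    # 100 100 4 5 3

# Each double takes 8 bytes, so memory grows linearly with nPredict.
prod(fillDim) * 8 / 2^20   # about 4.58 MiB

# mps stores 1-dimensional indices; arrayInd() converts them to
# (x, y, seasonal index, year) positions.
mps <- c(1, 10001, 200000)  # hypothetical values of mps
arrayInd(mps, dataDim)
# rows: (1, 1, 1, 1), (1, 1, 2, 1), (100, 100, 4, 5)
```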

Details

The predictions of the missing values are based on a subset-predict procedure, i.e., each missing value is predicted separately by (1) selecting a subset of the data in a neighborhood around it and (2) predicting the value based on that subset. The following gives more information on this subset-predict strategy. Missing values are often unevenly distributed in data. Therefore, the size of a reasonable subset may differ depending on the position of the considered missing value. The search strategy to find that subset is encoded in fnSubset, which returns different subsets depending on its argument i. The decision whether or not a subset is suitable, as well as the prediction itself, is implemented in fnPredict. To be more specific, the subset-predict procedure loops over the following two steps to predict one missing value:

(1): The function fnSubset is provided with the argument i = i (where i <- 0 in the first iteration) and returns a subset around the missing value.
(2): The function fnPredict decides whether the subset contains enough information to predict the missing value. If so, the predicted value is returned. Otherwise, the function returns NA and the algorithm increases i by one (i <- i + 1) before continuing with step (1).

The procedure stops if one of the following criteria is met:

- fnPredict returns a non-NA value,
- iMax tries have been completed, or
- fnSubset returns the same subset two times in a row.
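The loop and its three stopping criteria can be sketched in base R for a single missing value. This is a minimal sketch under stated assumptions: PredictOne, ToySubset and ToyPredict are hypothetical helpers working on a one-dimensional series, not the package's Subset and Predict functions (which operate on four-dimensional arrays). Additional arguments in ... are forwarded to both functions, mirroring the ... argument of Gapfill.

```r
# Hypothetical sketch of the subset-predict loop for ONE missing value.
PredictOne <- function(data, mp, fnSubset, fnPredict, iMax = Inf, ...) {
  i <- 0L
  previous <- NULL
  repeat {
    sub <- fnSubset(data = data, mp = mp, i = i, ...)
    # Stop: the same subset was returned two times in a row.
    if (!is.null(previous) && identical(sub, previous)) return(NA_real_)
    pred <- fnPredict(sub, ...)
    # Stop: fnPredict returned a non-NA value.
    if (!is.na(pred)) return(pred)
    i <- i + 1L
    # Stop: iMax tries have been completed.
    if (i >= iMax) return(NA_real_)
    previous <- sub
  }
}

# Hypothetical fnSubset: a window of half-width i + 1 around index mp.
ToySubset <- function(data, mp, i, ...) {
  lo <- max(1, mp - (i + 1))
  hi <- min(length(data), mp + (i + 1))
  data[lo:hi]
}

# Hypothetical fnPredict: mean of the observed values if there are
# at least minObs of them, otherwise NA (subset not suitable).
ToyPredict <- function(sub, minObs = 3, ...) {
  obs <- sub[!is.na(sub)]
  if (length(obs) >= minObs) mean(obs) else NA_real_
}

x <- c(1, NA, NA, NA, 5, 6, 7)
PredictOne(x, mp = 3, ToySubset, ToyPredict)              # 4 (found at i = 2)
PredictOne(x, mp = 3, ToySubset, ToyPredict, minObs = 2)  # 3 (found at i = 1)
```

The window grows with i until enough observations are available; once the window covers the whole series, the subset no longer changes and the loop stops with NA.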

References

F. Gerber, R. Furrer, G. Schaepman-Strub, R. de Jong, M. E. Schaepman, 2016, Predicting missing values in spatio-temporal satellite data. http://arxiv.org/abs/1605.01038.

Examples


## Not run: 
# library(gapfill)
# out <- Gapfill(ndvi, clipRange = c(0, 1))
# 
# ## look at input and output
# str(ndvi)
# str(out)
# Image(ndvi)
# Image(out$fill)
# 
# ## run on 2 cores in parallel
# if (require(doParallel)) {
#   registerDoParallel(2)
#   out <- Gapfill(ndvi, dopar = TRUE)
#   stopImplicitCluster()
# }
# 
# ## return also the prediction interval
# out <- Gapfill(ndvi, nPredict = 3, predictionInterval = TRUE)
# 
# ## dimension has changed according to 'nPredict = 3'
# dim(out$fill)
# 
# ## clip values outside the valid parameter space [0,1].
# out$fill[out$fill < 0] <- 0
# out$fill[out$fill > 1] <- 1
# 
# ## images of the output:
# ## predicted NDVI
# Image(out$fill[,,,,1])
# ## lower bound of the prediction interval
# Image(out$fill[,,,,2])
# ## upper bound of the prediction interval
# Image(out$fill[,,,,3])
# ## prediction interval length
# Image(out$fill[,,,,3] - out$fill[,,,,2])
# 
# ## End(Not run)
