metR (version 0.6.0)

ImputeEOF: Impute missing values

Description

Imputes missing values via Data Interpolating Empirical Orthogonal Functions (DINEOF).
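The core DINEOF loop can be sketched in a few lines of base R. This is a minimal illustration, not metR's implementation, and it rests on several assumptions. `dineof_sketch()` is a hypothetical helper. The number of EOFs `k` is fixed rather than chosen by cross-validation. Gaps start at their column means. Convergence is declared when the root mean square change of the filled cells drops below `tol`.

```r
# Hypothetical helper: a minimal DINEOF-style gap filler (not metR's code).
# Assumes a fixed number of EOFs `k`; missing cells start at their column mean
# and iteration stops when the RMS change of the filled cells is below `tol`.
dineof_sketch <- function(X, k = 1, tol = 0.01, max.iter = 1000) {
  missing <- is.na(X)
  if (!any(missing)) return(X)  # nothing to impute
  # Initial guess: replace each gap with the mean of its column
  col_means <- colMeans(X, na.rm = TRUE)
  X[missing] <- col_means[col(X)[missing]]
  for (i in seq_len(max.iter)) {
    # Truncated SVD: keep only the leading k singular vectors
    s <- svd(X, nu = k, nv = k)
    recon <- s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
    # RMS change of the filled cells between iterations
    change <- sqrt(mean((X[missing] - recon[missing])^2))
    # Update only the gaps; observed values are never modified
    X[missing] <- recon[missing]
    if (change < tol) break
  }
  X
}

# Usage: a rank-1 field with 10% of its cells removed
set.seed(1)
truth <- outer(seq(1, 2, length.out = 20),
               1 + sin(seq(0, 2 * pi, length.out = 15)))
gappy <- truth
gappy[sample(length(gappy), 30)] <- NA
filled <- dineof_sketch(gappy, k = 1, tol = 1e-8)
sqrt(mean((filled - truth)[is.na(gappy)]^2))  # reconstruction error on the gaps
```

Because the toy field is exactly rank 1, a single EOF is enough to recover it. For real data, min.eof and max.eof bound the range of EOF counts that are tried.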

Usage

ImputeEOF(
  formula,
  max.eof = NULL,
  data = NULL,
  min.eof = 1,
  tol = 0.01,
  max.iter = 10000,
  validation = NULL,
  verbose = interactive()
)

Arguments

formula

a formula describing how to build the matrix passed to the singular value decomposition (SVD); see Details

max.eof, min.eof

maximum and minimum number of singular values used for imputation

data

a data.frame

tol

tolerance used for determining convergence

max.iter

maximum number of iterations allowed for the algorithm

validation

number of points to use in cross-validation (defaults to the larger of 30 and 10% of the non-NA points)
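The cross-validation idea behind validation and the rmse attribute works like this: hide some observed cells, impute them, and measure the error on those cells. In this sketch, `default_validation()`, `cv_rmse()` and `col_mean_impute()` are hypothetical helpers. Rounding the 10% up with `ceiling()` is an assumption. The column-mean imputer only stands in for a DINEOF reconstruction so that the example runs on its own.

```r
# Hypothetical helper: default number of validation points, taken as the
# larger of 30 and 10% of the non-NA cells (rounding up is an assumption)
default_validation <- function(X) {
  max(30, ceiling(0.1 * sum(!is.na(X))))
}

# Hypothetical helper: hide `n_val` observed cells, impute them with `impute`,
# and return the root mean square error on the hidden cells
cv_rmse <- function(X, impute, n_val = default_validation(X)) {
  obs <- which(!is.na(X))
  # sample.int avoids sample()'s surprising behaviour when length(obs) == 1
  held <- obs[sample.int(length(obs), min(n_val, length(obs)))]
  X_train <- X
  X_train[held] <- NA
  filled <- impute(X_train)
  sqrt(mean((filled[held] - X[held])^2))
}

# Stand-in imputer for the example: fill gaps with column means
col_mean_impute <- function(X) {
  miss <- is.na(X)
  X[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  X
}

set.seed(42)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)
X[sample(200, 20)] <- NA
default_validation(X)  # 180 observed cells: max(30, 18) = 30
cv_rmse(X, col_mean_impute)
```

You could replace `col_mean_impute` with truncated-SVD reconstructions that use different numbers of EOFs. The held-out error would then compare candidates between min.eof and max.eof.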

verbose

logical indicating whether to print progress

Value

A vector of imputed values with two attributes: eof, the number of singular values used in the final imputation, and rmse, the root mean square error estimated from cross-validation.

Details

Singular values can only be computed from matrices, so formula specifies how to build a matrix from the data. It is a formula of the form VAR ~ LEFT | RIGHT (see Formula::Formula). VAR is the variable whose values populate the matrix, LEFT holds the variables that define the rows, and RIGHT holds the variables that define the columns. Think of it as "VAR as a function of LEFT and RIGHT".
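The following base R sketch makes the VAR ~ LEFT | RIGHT layout concrete. It builds the matrix that a formula such as gh ~ lat + lon | date describes, with one row per lat/lon pair and one column per date. The toy data frame and its values are illustrative only, and this is not how metR builds the matrix internally. data.table::dcast(data, lat + lon ~ date, value.var = "gh") produces a comparable wide table.

```r
# Toy long-format data: 2 latitudes x 3 longitudes x 4 dates (illustrative)
df <- expand.grid(lat = c(-30, -20), lon = c(0, 10, 20),
                  date = as.Date("2000-01-01") + 0:3)
df$gh <- seq_len(nrow(df))

# LEFT (lat + lon) indexes the rows; RIGHT (date) indexes the columns
row_id <- interaction(df$lat, df$lon, drop = TRUE)
col_id <- factor(df$date)

# VAR (gh) fills the cells; combinations absent from df stay NA
M <- matrix(NA_real_, nrow = nlevels(row_id), ncol = nlevels(col_id),
            dimnames = list(levels(row_id), levels(col_id)))
M[cbind(as.integer(row_id), as.integer(col_id))] <- df$gh
dim(M)  # 6 rows (locations) by 4 columns (dates)
```

With this layout, each column is one time step of the field and each row is the time series at one location. Any rows missing from the long data show up as NA cells to be imputed.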

Alternatively, if value.var is not NULL, it's possible to use the (probably) more familiar data.table::dcast formula interface. In that case, data must be provided.

If data is a matrix, the formula argument is ignored and the function returns a matrix.

References

Beckers, J.-M., Barth, A., and Alvera-Azcárate, A.: DINEOF reconstruction of clouded images including error maps – application to the Sea-Surface Temperature around Corsican Island, Ocean Sci., 2, 183-199, https://doi.org/10.5194/os-2-183-2006, 2006.

Examples

# NOT RUN {
library(metR)         # provides geopotential, Anomaly() and ImputeEOF()
library(data.table)
data(geopotential)
geopotential <- copy(geopotential)
geopotential[, gh.t := Anomaly(gh), by = .(lat, lon, month(date))]

# Add gaps to field
geopotential[, gh.gap := gh.t]
set.seed(42)
geopotential[sample(1:.N, .N*0.3), gh.gap := NA]

max.eof <- 5    # low value for a quick example; increase it in practice
geopotential[, gh.impute := ImputeEOF(gh.gap ~ lat + lon | date, max.eof,
                                      verbose = TRUE, max.iter = 2000)]

library(ggplot2)
ggplot(geopotential[date == date[1]], aes(lon, lat)) +
    geom_contour(aes(z = gh.t), color = "black") +
    geom_contour(aes(z = gh.impute))

# Scatterplot with a sample.
na.sample <- geopotential[is.na(gh.gap)][sample(1:.N, .N*0.1)]
ggplot(na.sample, aes(gh.t, gh.impute)) +
    geom_point()

# Estimated RMSE
attr(geopotential$gh.impute, "rmse")
# Real RMSE
geopotential[is.na(gh.gap), sqrt(mean((gh.t - gh.impute)^2))]


# }
