Imputes missing values via Data Interpolating Empirical Orthogonal Functions (DINEOF).
ImputeEOF(
  formula,
  max.eof = NULL,
  data = NULL,
  min.eof = 1,
  tol = 0.01,
  max.iter = 10000,
  validation = NULL,
  verbose = interactive()
)
formula: a formula describing how to build the matrix passed to the SVD (see Details)
max.eof, min.eof: maximum and minimum number of singular values used for imputation
data: a data.frame (or a matrix; see Details)
tol: tolerance used to determine convergence
max.iter: maximum number of iterations allowed for the algorithm
validation: number of points to use in cross-validation (defaults to the larger of 30 and 10% of the non-NA points)
verbose: logical indicating whether to print progress
A vector of imputed values with attributes "eof", which is the number of singular values used in the final imputation, and "rmse", which is the Root Mean Square Error estimated from cross-validation.
Singular values are computed over matrices, so formula denotes how to build a matrix from the data. It is a formula of the form VAR ~ LEFT | RIGHT (see Formula::Formula) in which VAR is the variable whose values will populate the matrix, LEFT represents the variables used to make the rows, and RIGHT the variables used to make the columns of the matrix.
Think of it as "VAR as a function of LEFT and RIGHT".
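As a rough illustration of what a formula such as gh ~ lat + lon | date asks for, the sketch below builds the corresponding matrix by hand in base R. The toy data frame and the names row_id and m are assumptions made for this example; it is not the code ImputeEOF runs internally.

```r
# Toy long-format data: one value per (lat, lon, date) combination.
df <- expand.grid(lat = c(-30, -60), lon = c(0, 90), date = 1:3)
df$gh <- seq_len(nrow(df))

# LEFT (lat + lon): every lat/lon combination becomes one row.
row_id <- interaction(df$lat, df$lon, drop = TRUE)

# RIGHT (date): every date becomes one column; VAR (gh) fills the cells.
m <- tapply(df$gh, list(row_id, df$date), FUN = function(x) x[1])

dim(m)  # 4 locations by 3 dates
```

Any lat/lon/date combination absent from the data would appear as NA in m, and those are the cells an imputation has to fill.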
Alternatively, if value.var is not NULL, it's possible to use the (probably) more familiar data.table::dcast formula interface. In that case, data must be provided.
If data is a matrix, the formula argument is ignored and the function returns a matrix.
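To give a feel for what tol, max.iter and validation control, here is a minimal base R sketch of a DINEOF-style iteration for a fixed number of modes. The function dineof_sketch and its arguments k and n.valid are hypothetical names, and the sketch assumes that convergence is judged by the change in RMSE at the held-out points and that the data are anomalies (so zero is a sensible first guess). It is not the metR implementation, which also searches over the number of singular values.

```r
# Hypothetical helper, NOT the metR code: iterative truncated-SVD imputation.
dineof_sketch <- function(m, k, tol = 0.01, max.iter = 1000, n.valid = 30) {
  gaps <- is.na(m)
  # Hold out some observed points to estimate the error by cross-validation.
  observed <- which(!gaps)
  valid <- sample(observed, min(n.valid, length(observed) - 1))
  truth <- m[valid]

  fill <- union(which(gaps), valid)  # entries that get (re)estimated
  x <- m
  x[fill] <- 0                       # first guess; assumes anomalies

  rmse <- Inf
  for (i in seq_len(max.iter)) {
    # Reconstruct the field from the leading k singular vectors.
    s <- svd(x, nu = k, nv = k)
    recon <- s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
    x[fill] <- recon[fill]           # observed entries are never modified

    new.rmse <- sqrt(mean((x[valid] - truth)^2))
    delta <- abs(rmse - new.rmse)
    rmse <- new.rmse
    if (delta < tol) break           # assumed convergence criterion
  }

  x[valid] <- truth                  # restore the held-out observations
  list(imputed = x, rmse = rmse)
}

# Usage on a rank-1 toy field with 20% of its entries removed.
set.seed(1)
full <- outer(sin(seq(0, 2 * pi, length.out = 20)),
              cos(seq(0, pi, length.out = 15)))
gappy <- full
gappy[sample(length(full), 60)] <- NA
res <- dineof_sketch(gappy, k = 1, tol = 1e-6, n.valid = 20)

res$rmse                                            # cross-validated estimate
sqrt(mean((res$imputed - full)[is.na(gappy)]^2))    # actual error at the gaps
```

Comparing the two numbers at the end mirrors the "Estimated RMSE" versus "Real RMSE" check in the example for ImputeEOF itself.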
Beckers, J.-M., Barth, A., and Alvera-Azcárate, A.: DINEOF reconstruction of clouded images including error maps – application to the Sea-Surface Temperature around Corsican Island, Ocean Sci., 2, 183-199, https://doi.org/10.5194/os-2-183-2006, 2006.
# NOT RUN {
library(metR)        # provides ImputeEOF(), Anomaly() and the geopotential data
library(data.table)
data(geopotential)
geopotential <- copy(geopotential)
geopotential[, gh.t := Anomaly(gh), by = .(lat, lon, month(date))]
# Add gaps to field
geopotential[, gh.gap := gh.t]
set.seed(42)
geopotential[sample(1:.N, .N*0.3), gh.gap := NA]
max.eof <- 5 # change to a higher value
geopotential[, gh.impute := ImputeEOF(gh.gap ~ lat + lon | date, max.eof,
                                      verbose = TRUE, max.iter = 2000)]
library(ggplot2)
ggplot(geopotential[date == date[1]], aes(lon, lat)) +
  geom_contour(aes(z = gh.t), color = "black") +
  geom_contour(aes(z = gh.impute))
# Scatterplot with a sample.
na.sample <- geopotential[is.na(gh.gap)][sample(1:.N, .N*0.1)]
ggplot(na.sample, aes(gh.t, gh.impute)) +
  geom_point()
# Estimated RMSE
attr(geopotential$gh.impute, "rmse")
# Real RMSE
geopotential[is.na(gh.gap), sqrt(mean((gh.t - gh.impute)^2))]
# }