krigeProblem-class: Class `"krigeProblem"`

Description

The krigeProblem class provides functionality for kriging using distributed calculations, based on maximum likelihood estimation. The class includes methods for standard kriging calculations and metadata necessary for carrying out the methods in a distributed fashion.

To carry out kriging calculations, one must first initialize an object of the krigeProblem class. This is done using krigeProblem$new and help on initialization can be obtained via krigeProblem$help('initialize') (but noting that the call is krigeProblem$new not krigeProblem$initialize).

Note that in what follows I refer to observation and prediction 'locations'. This is natural for spatial problems, but for non-spatial problems, 'locations' is meant to refer to the points within the relevant domain at which observations are available and predictions wish to be made. The user must provide functions that create the subsets of the mean vector(s) and the covariance matrix/matrices. Functions for the mean vector and covariance matrix for observation locations are required, while those for the mean vector for prediction locations, the cross-covariance matrix (where the first column is the index of the observation locations and the second of the prediction locations), and the prediction covariance matrix for prediction locations are required when doing prediction and posterior simulation. These functions should follow the form of SN2011fe_meanfunc, SN2011fe_predmeanfunc, SN2011fe_covfunc, SN2011fe_predcovfunc, and SN2011fe_crosscovfunc. Namely, they should take three arguments, the first a vector of all the parameters for the Gaussian process (both mean and covariance), the second an arbitrary list of inputs (in general this would include the observation and prediction locations), and the third being indices, which will be provided by the package and will differ between slave processes. For the mean functions, the indices will be a vector, indicating which of the vector elements are stored on a given process. For the covariance functions, the indices will be a two column matrix, with each row a pair of indices (row, column), indicating the elements of the matrix stored on a given process. Thus, the user-provided functions should use the second and third arguments to construct the elements of the vectors/matrices belonging on the slave process. Note that the elements of the matrices are stored as vectors (vectorizing matrices column-wise, as natural for column-major matrices). Users can simply have their functions operate on the rows of the index matrix without worrying about ordering. An optional fourth argument contains cached values that need not be computed at every call to the user-provided function. If the user wants to make use of caching of values to avoid expensive recomputation, the user function should mimic SN2011fe_covfunc. That is, when the user wishes to change the cached values (including on first use of the function), the function should return a two-element list, with the first element being the covariance matrix elements and the second containing whatever object is to be cached. This cached object will be provided to the function on subsequent calls as the fourth argument. Note that one should have all necessary packages required for calculation of the mean vector(s) and covariance matrix/matrices installed on all machines used and the names of these packages should be passed as the packages argument to the krigeProblem initialization.

Help for the various methods of the class can be obtained with krigeProblem$help('methodName') and a list of fields and methods in the class with krigeProblem$help().

In general, n (or n1 and n2) refer to the length or number of rows/columns of vectors and matrices and h (or h1 and h2) to the block replication factor for these vectors and matrices. More details on block replication factors can be found in the references in references; these are set at reasonable values automatically, and for simplicity, one can set them at one, in which case the number of blocks into which the primary covariance matrix is split is $P$, the number of slave processes. Cross-covariance matrices returned to the user will have number of rows equal to the number of observation locations and number of columns to the number of prediction locations. Matrices of realizations will have each realized field as a single column.

Arguments

docType

class

Extends

All reference classes extend and inherit methods from "envRefClass".

References

Paciorek, C.J., B. Lipshitz, W. Zhuo, Prabhat, C.G. Kaufman, and R.C. Thomas. 2015. Parallelizing Gaussian Process Calculations in R. Journal of Statistical Software, 63(10), 1-23. http://www.jstatsoft.org/v63/i10. Paciorek, C.J., B. Lipshitz, W. Zhuo, Prabhat, C.G. Kaufman, and R.C. Thomas. 2013. Parallelizing Gaussian Process Calculations in R. arXiv:1305.4886. http://arxiv.org/abs/1305.4886.

Examples

Run this code

doSmallExample <- TRUE

if(require(fields)) {

if(doSmallExample){
  SN2011fe <- SN2011fe_subset
  SN2011fe_newdata <- SN2011fe_newdata_subset
  SN2011fe_mle <- SN2011fe_mle_subset
  nProc <- 3
} else {
# users should select number of processors based on their system and the
# size of the full example
nProc <- 210 
}


n <- nrow(SN2011fe)
m <- nrow(SN2011fe_newdata)
nu <- 2
inputs <- c(as.list(SN2011fe), as.list(SN2011fe_newdata), nu = nu)

prob <- krigeProblem$new("prob", numProcesses = nProc, n = n, m = m,
 predMeanFunction = SN2011fe_predmeanfunc, crossCovFunction = SN2011fe_crosscovfunc, 
predCovFunction = SN2011fe_predcovfunc, meanFunction = SN2011fe_meanfunc, 
covFunction = SN2011fe_covfunc,  inputs = inputs, params = SN2011fe_mle$par, 
data = SN2011fe$flux, packages = c("fields"))

prob$calcLogDens()

prob$optimizeLogDens(method = "L-BFGS-B", verbose = TRUE,
lower = rep(.Machine$double.eps, length(SN2011fe_initialParams)),
control = list(parscale = SN2011fe_initialParams, maxit = 2))
# the full optimization can take some time; only two iterations are done 
# are specified here; even this is not run as it takes 10s of seconds

prob$setParams(SN2011fe_mle$par)

pred <- prob$predict(ret = TRUE, se.fit = TRUE, verbose = TRUE)
realiz <- prob$simulateRealizations(r = 10, post = TRUE, verbose = TRUE)

show(prob)
}