GP_fit: Gaussian Process Model fitting

Description

For an (n x d) design matrix, X, and the corresponding (n x 1) simulator output Y, this function fits the GP model and returns the parameter estimates. The optimization routine assumes that the inputs are scaled to the unit hypercube \([0,1]^d\).

Usage

GP_fit(X, Y, control = c(200*d, 80*d, 2*d), nug_thres = 20, 
trace = FALSE,maxit = 100,corr = list(type="exponential",power=1.95),
optim_start = NULL)

Arguments

the (n x d) design matrix

the (n x 1) vector of simulator outputs.

control

a vector of parameters used in the search for optimal beta. See `Details'.

nug_thres

a parameter used in computing the nugget. See `Details'.

trace

logical, if TRUE, will provide information on the optim runs

maxit

the maximum number of iterations within optim, defaults to 100

corr

a list of parameters for the specifing the correlation to be used. See corr_matrix.

optim_start

a matrix of potentially likely starting values for correlation hyperparameters for the optim runs, i.e., initial guess of the d-vector beta

Value

an object of class GP containing parameter estimates beta and sig2, nugget parameter delta, the data (X and Y), and a specification of the correlation structure used.

Details

This function fits the following GP model, \(y(x) = \mu + Z(x)\), \(x \in [0,1]^{d}\), where \(Z(x)\) is a GP with mean 0, \(Var(Z(x_i)) = \sigma^2\), and \(Cov(Z(x_i),Z(x_j)) = \sigma^2R_{ij}\). Entries in covariance matrix R are determined by corr and parameterized by beta, a d-vector of parameters. For computational stability \(R^{-1}\) is replaced with \(R_{\delta_{lb}}^{-1}\), where \(R_{\delta{lb}} = R + \delta_{lb}I\) and \(\delta_{lb}\) is the nugget parameter described in Ranjan et al. (2011).

The parameter estimate beta is obtained by minimizing the deviance using a multi-start gradient based search (L-BFGS-B) algorithm. The starting points are selected using the k-means clustering algorithm on a large maximin LHD for values of beta, after discarding beta vectors with high deviance. The control parameter determines the quality of the starting points of the L-BFGS-B algoritm.

control is a vector of three tunable parameters used in the deviance optimization algorithm. The default values correspond to choosing 2*d clusters (using k-means clustering algorithm) based on 80*d best points (smallest deviance, refer to GP_deviance) from a 200*d - point random maximin LHD in beta. One can change these values to balance the trade-off between computational cost and robustness of likelihood optimization (or prediction accuracy). For details see MacDonald et al. (2015).

The nug_thres parameter is outlined in Ranjan et al. (2011) and is used in finding the lower bound on the nugget (\(\delta_{lb}\)).

References

MacDonald, K.B., Ranjan, P. and Chipman, H. (2015). GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs. Journal of Statistical Software, 64(12), 1-23. http://www.jstatsoft.org/v64/i12/

Ranjan, P., Haynes, R., and Karsten, R. (2011). A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data, Technometrics, 53(4), 366 - 378.

Examples

Run this code

# NOT RUN {
## 1D Example 1
n = 5; d = 1; 
computer_simulator <- function(x){
x = 2*x+0.5;
y = sin(10*pi*x)/(2*x) + (x-1)^4;
return(y)
}
set.seed(3);
library(lhs);
x = maximinLHS(n,d);
y = computer_simulator(x);
GPmodel = GP_fit(x,y);
print(GPmodel)


## 1D Example 2
n = 7; d = 1;
computer_simulator <- function(x) {
y <- log(x+0.1)+sin(5*pi*x);
return(y)
}
set.seed(1);
library(lhs);
x = maximinLHS(n,d);
y = computer_simulator(x);
GPmodel = GP_fit(x,y);
print(GPmodel, digits = 4)


## 2D Example: GoldPrice Function
computer_simulator <- function(x) {
x1=4*x[,1] - 2; x2=4*x[,2] - 2;
t1 = 1 + (x1 + x2 + 1)^2*(19 - 14*x1 + 3*x1^2 - 14*x2 + 
6*x1*x2 + 3*x2^2);
t2 = 30 + (2*x1 -3*x2)^2*(18 - 32*x1 + 12*x1^2 + 48*x2 - 
36*x1*x2 + 27*x2^2);
y = t1*t2;
return(y)
}
n = 30; d = 2;
set.seed(1);
library(lhs);
x = maximinLHS(n,d); 
y = computer_simulator(x);
GPmodel = GP_fit(x,y);
print(GPmodel)
# }

Run the code above in your browser using DataLab