GaSP: Building, fitting, and predicting with a GaSP model

Description

This function is a wrapper that builds, fits, and makes predictions with a Gaussian process model. It calls the functions gp, gp.mcmc, gp.optim, and gp.predict.

Usage

GaSP(
  formula = ~1,
  output,
  input,
  param,
  smooth.est = FALSE,
  input.new = NULL,
  cov.model = list(family = "CH", form = "isotropic"),
  model.fit = "Cauchy_prior",
  prior = list(),
  proposal = list(range = 0.35, tail = 2, nugget = 0.8, nu = 0.8),
  nsample = 5000,
  burnin = 1000,
  opt = NULL,
  bound = NULL,
  dtype = "Euclidean",
  verbose = TRUE
)

Value

a list containing the S4 object gp and prediction results

Arguments

formula

an object of formula class that specifies regressors; see formula for details.
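A minimal sketch of what a formula implies for the regressors, assuming GaSP builds its design matrix from formula in the usual R way: base R's model.matrix() turns ~1 into a single column of ones (a constant mean) and ~ x into an intercept plus a linear trend. The data frame dat and its column name x are hypothetical.

```r
# Hypothetical inputs stored in a data frame with one column named `x`
dat <- data.frame(x = seq(0, 20, length.out = 5))

# ~1: constant mean, a single column of ones
X0 <- model.matrix(~ 1, data = dat)

# ~ x: intercept plus a linear trend in x
X1 <- model.matrix(~ x, data = dat)

dim(X0)        # 5 rows, 1 column
colnames(X1)   # "(Intercept)" "x"
```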

output

a numeric vector of observations (outputs) in a GaSP

input

a matrix of inputs in a GaSP

param

a list of values for the regression parameters, the covariance parameters, and the nugget variance parameter. The elements required in param depend on the covariance model.

The regression parameters are denoted by coeff. The default value is $\mathbf{0}$.
The marginal variance or partial sill is denoted by sig2. The default value is 1.
The nugget variance parameter is denoted by nugget for all covariance models. The default value is 0.
For the Confluent Hypergeometric class, range denotes the range parameter $\beta$, tail denotes the tail decay parameter $\alpha$, and nu denotes the smoothness parameter $\nu$.
For the generalized Cauchy class, range denotes the range parameter $\phi$, tail denotes the tail decay parameter $\alpha$, and nu denotes the smoothness parameter $\nu$.
For the Matérn class, range denotes the range parameter $\phi$ and nu denotes the smoothness parameter $\nu$. When $\nu=0.5$, the Matérn class corresponds to the exponential covariance.
For the powered-exponential class, range denotes the range parameter $\phi$ and nu denotes the smoothness parameter $\nu$. When $\nu=2$, the powered-exponential class corresponds to the Gaussian covariance.

smooth.est

a logical value indicating whether the smoothness parameter is estimated.

input.new

a matrix of new input locations at which predictions are made

cov.model

a list of two strings, family and form. family indicates the family of covariance functions: the Confluent Hypergeometric class, the Matérn class, the Cauchy class, or the powered-exponential class. form indicates the form of the covariance structure: the isotropic form, the tensor form, or the automatic relevance determination form.

family

CH

The Confluent Hypergeometric correlation function is given by $$C(h) = \frac{\Gamma(\nu+\alpha)}{\Gamma(\nu)} \mathcal{U}\left(\alpha, 1-\nu, \left(\frac{h}{\beta}\right)^2\right),$$ where $\alpha$ is the tail decay parameter, $\beta$ is the range parameter, $\nu$ is the smoothness parameter, and $\mathcal{U}(\cdot)$ is the confluent hypergeometric function of the second kind. For details about this covariance, see Ma and Bhadra (2019) at https://arxiv.org/abs/1911.05865.

cauchy

The generalized Cauchy covariance is given by $$C(h) = \left\{ 1 + \left( \frac{h}{\phi} \right)^{\nu} \right\}^{-\alpha/\nu},$$ where $\phi$ is the range parameter, $\alpha$ is the tail decay parameter, and $\nu$ is the smoothness parameter with default value 2.

matern

The Matérn correlation function is given by $$C(h)=\frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{h}{\phi} \right)^{\nu} \mathcal{K}_{\nu}\left( \frac{h}{\phi} \right),$$ where $\phi$ is the range parameter, $\nu$ is the smoothness parameter, and $\mathcal{K}_{\nu}(\cdot)$ is the modified Bessel function of the second kind of order $\nu$.

exp

The exponential correlation function is given by $$C(h)=\exp(-h/\phi),$$ where $\phi$ is the range parameter. This is the Matérn correlation with $\nu=0.5$.

matern_3_2

The Matérn correlation with $\nu=1.5$.

matern_5_2

The Matérn correlation with $\nu=2.5$.

powexp

The powered-exponential correlation function is given by $$C(h)=\exp\left\{-\left(\frac{h}{\phi}\right)^{\nu}\right\},$$ where $\phi$ is the range parameter and $\nu$ is the smoothness parameter.

gauss

The Gaussian correlation function is given by $$C(h)=\exp\left(-\frac{h^2}{\phi^2}\right),$$ where $\phi$ is the range parameter.
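The correlation families above can be written directly in base R, which makes the stated special cases easy to check. This is a sketch rather than the package's implementation: the function names are hypothetical, $\mathcal{U}$ is computed by numerically integrating its standard integral representation (slow, but adequate for a check), and matern_3_2 and matern_5_2 are assumed to use the same $h/\phi$ scaling as the general Matérn formula.

```r
# Confluent hypergeometric function of the second kind, U(a, b, z), from
# U(a, b, z) = 1/Gamma(a) * int_0^Inf exp(-z t) t^(a-1) (1+t)^(b-a-1) dt,
# valid for a > 0; at z = 0 it also needs b < 1 (here b = 1 - nu, nu > 0)
hyperU <- function(a, b, z) {
  f <- function(t) exp(-z * t) * t^(a - 1) * (1 + t)^(b - a - 1)
  integrate(f, lower = 0, upper = Inf)$value / gamma(a)
}

# CH: Gamma(nu + alpha) / Gamma(nu) * U(alpha, 1 - nu, (h / beta)^2)
ch_cor <- function(h, beta, alpha, nu) {
  vapply(h, function(hi) {
    gamma(nu + alpha) / gamma(nu) * hyperU(alpha, 1 - nu, (hi / beta)^2)
  }, numeric(1))
}

# Generalized Cauchy: {1 + (h / phi)^nu}^(-alpha / nu)
cauchy_cor <- function(h, phi, alpha, nu = 2) (1 + (h / phi)^nu)^(-alpha / nu)

# Matern: 2^(1 - nu) / Gamma(nu) * (h / phi)^nu * K_nu(h / phi), with C(0) = 1
matern_cor <- function(h, phi, nu) {
  x <- h / phi
  out <- rep(1, length(x))
  pos <- x > 0
  out[pos] <- 2^(1 - nu) / gamma(nu) * x[pos]^nu * besselK(x[pos], nu)
  out
}

# Powered exponential: exp{-(h / phi)^nu}
powexp_cor <- function(h, phi, nu) exp(-(h / phi)^nu)

h <- c(0, 0.5, 1, 2, 5)
x <- h / 1.5   # scaled distances for phi = 1.5

# Special cases stated in the text (same h / phi scaling throughout)
stopifnot(
  isTRUE(all.equal(matern_cor(h, 1.5, 0.5), exp(-x))),                     # exp
  isTRUE(all.equal(matern_cor(h, 1.5, 1.5), (1 + x) * exp(-x))),           # matern_3_2
  isTRUE(all.equal(matern_cor(h, 1.5, 2.5), (1 + x + x^2 / 3) * exp(-x))), # matern_5_2
  isTRUE(all.equal(powexp_cor(h, 1.5, 2), exp(-h^2 / 1.5^2)))              # gauss
)

# The CH correlation equals 1 at h = 0 and decays with distance
round(ch_cor(h, beta = 1, alpha = 1, nu = 0.5), 4)
```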

form

isotropic

This indicates the isotropic form of covariance functions, that is, $$C(\mathbf{h}) = C^0(\|\mathbf{h}\|; \boldsymbol \theta),$$ where $\| \mathbf{h}\|$ denotes the Euclidean distance, or the great circle distance for data on a sphere, and $C^0(\cdot)$ denotes any isotropic covariance family specified in family.

tensor

This indicates the tensor product of correlation functions, that is, $$ C(\mathbf{h}) = \prod_{i=1}^d C^0(|h_i|; \boldsymbol \theta_i),$$ where $d$ is the dimension of the input space and $h_i$ is the distance along the $i$th input dimension. This type of covariance structure has often been used in Gaussian process emulation for computer experiments.

ARD

This indicates the automatic relevance determination form. That is, $$C(\mathbf{h}) = C^0\left(\sqrt{\sum_{i=1}^d\frac{h_i^2}{\phi^2_i}}; \boldsymbol \theta \right),$$ where $\phi_i$ denotes the range parameter along the $i$th input dimension.
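The three forms can be sketched in base R by combining a one-dimensional base correlation $C^0$, here the exponential $\exp(-h/\phi)$, into a correlation matrix for inputs stored as the rows of a matrix. This illustrates the formulas under that choice of $C^0$; it is not the package's code, and the function names are hypothetical.

```r
# Base 1-D correlation C0: exponential, exp(-h / phi)
c0_exp <- function(h, phi = 1) exp(-h / phi)

# Isotropic: C0(||h||) with Euclidean distances between the rows of x
cor_isotropic <- function(x, phi) c0_exp(as.matrix(dist(x)), phi)

# Tensor: product over dimensions of C0(|h_i|; phi_i)
cor_tensor <- function(x, phi) {
  R <- matrix(1, nrow(x), nrow(x))
  for (i in seq_len(ncol(x))) {
    Hi <- abs(outer(x[, i], x[, i], "-"))   # distances along dimension i
    R <- R * c0_exp(Hi, phi[i])
  }
  R
}

# ARD: C0 applied to sqrt(sum_i h_i^2 / phi_i^2), i.e. the Euclidean distance
# after dividing each input column by its own range (unit range inside C0)
cor_ard <- function(x, phi) {
  xs <- sweep(x, 2, phi, "/")
  c0_exp(as.matrix(dist(xs)), 1)
}

set.seed(1)
x <- matrix(runif(10), ncol = 2)   # 5 hypothetical points in d = 2 dimensions
R_iso <- cor_isotropic(x, phi = 0.5)
R_ten <- cor_tensor(x, phi = c(0.5, 2))
R_ard <- cor_ard(x, phi = c(0.5, 2))
```

With equal ranges in every dimension, the ARD form reduces to the isotropic form, and with the exponential base correlation the tensor form equals $\exp(-\|\mathbf{h}\|_1/\phi)$, where $\|\mathbf{h}\|_1$ is the $L_1$ (Manhattan) distance.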

model.fit

a string indicating the estimation method and, for the Bayesian approaches, the choice of priors on the correlation parameters:

Cauchy_prior

This indicates that a fully Bayesian approach with objective priors is used for parameter estimation, where location-scale parameters are assigned constant priors and correlation parameters are assigned half-Cauchy priors (default).

Ref_prior

This indicates that a fully Bayesian approach with objective priors is used for parameter estimation, where location-scale parameters are assigned constant priors and correlation parameters are assigned reference priors. This is only supported for isotropic covariance functions. For details, see gp.mcmc.

Beta_prior

This indicates that a fully Bayesian approach with subjective priors is used for parameter estimation, where location-scale parameters are assigned constant priors and correlation parameters are assigned beta priors parameterized as $Beta(a, b, lb, ub)$. In this beta distribution, lb and ub are the lower and upper ends of the support for the correlation parameters and should be determined based on domain knowledge. a and b are two shape parameters with default value 1, which corresponds to the uniform prior over the support $(lb, ub)$.

MPLE

This indicates that the maximum profile likelihood estimation (MPLE) is used.

MMLE

This indicates that the maximum marginal likelihood estimation (MMLE) is used.

MAP

This indicates that the marginal/integrated posterior is maximized.
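To make the difference between MPLE and MMLE concrete, here is a sketch of two standard objective functions for the correlation parameters in the model $y \sim N(X\mathbf{b}, \sigma^2 R)$, up to additive constants. It is an assumption that the package uses exactly these forms. The profile log-likelihood plugs in the generalized least squares estimate of $\mathbf{b}$ and the estimate $S^2/n$ of $\sigma^2$; the marginal log-likelihood shown here integrates out $\mathbf{b}$ and $\sigma^2$ under the prior $\pi(\mathbf{b}, \sigma^2) \propto 1/\sigma^2$ (a different prior on $\sigma^2$ changes the exponent of $S^2$). The toy data are hypothetical.

```r
# Log-likelihood objectives for a correlation matrix R, up to additive
# constants, in the model y ~ N(X b, sigma2 * R)
gp_loglik <- function(y, X, R) {
  n <- length(y); p <- ncol(X)
  L <- chol(R)                                                # R = t(L) %*% L
  solve_R <- function(v) backsolve(L, forwardsolve(t(L), v))  # R^{-1} v
  Ri_y <- solve_R(y)
  Ri_X <- solve_R(X)
  XtRiX <- crossprod(X, Ri_X)
  b_hat <- solve(XtRiX, crossprod(X, Ri_y))                   # GLS estimate of b
  S2 <- drop(crossprod(y, Ri_y) - crossprod(b_hat, XtRiX %*% b_hat))
  logdet_R <- 2 * sum(log(diag(L)))
  logdet_XtRiX <- as.numeric(determinant(XtRiX, logarithm = TRUE)$modulus)
  c(profile  = -0.5 * logdet_R - 0.5 * n * log(S2 / n),
    marginal = -0.5 * logdet_R - 0.5 * logdet_XtRiX - 0.5 * (n - p) * log(S2))
}

# Toy data: constant mean (formula = ~1), exponential correlation on a 1-D grid
set.seed(42)
x <- seq(0, 1, length.out = 30)
y <- sin(2 * pi * x) + rnorm(length(x), sd = 0.1)
X <- matrix(1, length(x), 1)
D <- abs(outer(x, x, "-"))

phis <- c(0.05, 0.1, 0.2, 0.5)
ll <- sapply(phis, function(phi) gp_loglik(y, X, exp(-D / phi)))
colnames(ll) <- phis
round(ll, 2)   # rows: profile, marginal; columns: candidate ranges
```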

prior

a list containing tuning parameters of the prior distribution. This is used only if a subjective Bayesian estimation method with informative priors is chosen.
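For model.fit = "Beta_prior", the prior $Beta(a, b, lb, ub)$ is a beta distribution rescaled from $(0, 1)$ to the support $(lb, ub)$. A minimal base-R sketch of its density, under the hypothetical name dbeta4(), confirms that a = b = 1 gives the uniform density $1/(ub - lb)$. How the package's prior list names these quantities is not specified here, so the argument names are illustrative.

```r
# Density of the four-parameter beta distribution Beta(a, b, lb, ub):
# if Z ~ Beta(a, b) on (0, 1), then lb + (ub - lb) * Z has this density
dbeta4 <- function(x, a = 1, b = 1, lb = 0, ub = 1, log = FALSE) {
  z <- (x - lb) / (ub - lb)
  d <- dbeta(z, a, b, log = TRUE) - log(ub - lb)   # change of variables
  if (log) d else exp(d)
}

# Hypothetical support (0, 10) for a range parameter
dbeta4(c(0.5, 2, 9.5), a = 1, b = 1, lb = 0, ub = 10)   # uniform: 0.1 each
dbeta4(c(0.5, 2, 9.5), a = 2, b = 5, lb = 0, ub = 10)   # favors small ranges
```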

proposal

a list containing tuning parameters of the proposal distribution. This is used only if a Bayesian estimation method is chosen.

nsample

an integer indicating the number of MCMC samples.

burnin

an integer indicating the burn-in period.
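The proposal, nsample, and burnin arguments play the usual roles in a random-walk Metropolis sampler. The sketch below is a generic illustration, not the sampler used by gp.mcmc: it updates one positive parameter on the log scale with a normal proposal whose standard deviation stands in for a tuning value such as range = 0.35, targets a hypothetical log-normal density, and assumes that nsample counts all iterations, of which the first burnin are discarded.

```r
# Random-walk Metropolis for a positive parameter theta, updated on the log scale
rw_metropolis <- function(log_post, theta0, sd_prop, nsample, burnin) {
  draws <- numeric(nsample)
  eta <- log(theta0)
  lp <- log_post(exp(eta)) + eta              # + eta: log-Jacobian of theta = exp(eta)
  accepted <- 0
  for (i in seq_len(nsample)) {
    eta_new <- eta + rnorm(1, sd = sd_prop)   # symmetric normal proposal
    lp_new <- log_post(exp(eta_new)) + eta_new
    if (log(runif(1)) < lp_new - lp) {        # Metropolis acceptance step
      eta <- eta_new
      lp <- lp_new
      accepted <- accepted + 1
    }
    draws[i] <- exp(eta)
  }
  list(samples = tail(draws, nsample - burnin),   # drop the burn-in draws
       accept_rate = accepted / nsample)
}

# Hypothetical log-normal "posterior" for a range parameter (median 1)
log_post <- function(theta) dlnorm(theta, meanlog = 0, sdlog = 0.5, log = TRUE)

set.seed(1234)
out <- rw_metropolis(log_post, theta0 = 3, sd_prop = 0.35,
                     nsample = 5000, burnin = 1000)
length(out$samples)   # 4000 draws kept after burn-in
out$accept_rate
median(out$samples)   # close to the target median of 1
```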

opt

a list of arguments to set up the optim routine. The current implementation uses three arguments:

method

The optimization method: Nelder-Mead or L-BFGS-B.

lower

The lower bound for parameters.

upper

The upper bound for parameters.
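The method, lower, and upper entries mirror the arguments of the same names in R's optim(). As a sketch, the call below minimizes a negative profile log-likelihood for the range $\phi$ of an exponential correlation over $\log(\phi)$ with method = "L-BFGS-B" and box constraints. The toy data, the objective, and the bound values are assumptions for illustration; gp.optim may set up its objective differently.

```r
# Toy data: one draw from a zero-mean GP with exponential correlation, phi = 0.2
set.seed(1)
x <- seq(0, 1, length.out = 40)
n <- length(x)
D <- abs(outer(x, x, "-"))
y <- drop(t(chol(exp(-D / 0.2))) %*% rnorm(n))

# Negative profile log-likelihood (constant mean), up to a constant, in log(phi)
neg_profile <- function(log_phi) {
  L <- chol(exp(-D / exp(log_phi)))          # R = t(L) %*% L
  Ri <- chol2inv(L)                          # R^{-1}
  b <- sum(Ri %*% y) / sum(Ri)               # GLS estimate of the constant mean
  r <- y - b
  S2 <- drop(crossprod(r, Ri %*% r))
  sum(log(diag(L))) + 0.5 * n * log(S2 / n)  # 0.5 * log|R| + (n / 2) * log(S2 / n)
}

# Settings in the spirit of the opt argument (values are illustrative)
opt <- list(method = "L-BFGS-B", lower = log(0.01), upper = log(10))

fit <- optim(par = log(0.5), fn = neg_profile, method = opt$method,
             lower = opt$lower, upper = opt$upper)
fit$convergence   # 0 indicates successful convergence
exp(fit$par)      # estimated range on the original scale
```

For a single parameter, optim() warns that Nelder-Mead is unreliable in one dimension, which is why this sketch uses L-BFGS-B; with several correlation parameters either method can be used.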

bound

The default value is NULL. Otherwise, it should be a list containing the following elements, depending on the covariance class:

nugget

a list of bounds for the nugget parameter, containing the lower bound lb and the upper bound ub, with default value list(lb=0, ub=Inf).

range

a list of bounds for the range parameter. It has default value range=list(lb=0, ub=Inf) for the Confluent Hypergeometric, Matérn, exponential, Gaussian, powered-exponential, and Cauchy covariances. The range parameter is handled on the log scale, i.e., as $\log(\phi)$.

tail

a list of bounds for the tail decay parameter. It has default value list(lb=0, ub=Inf) for the Confluent Hypergeometric covariance and the Cauchy covariance.

nu

a list of bounds for the smoothness parameter. It has default value list(lb=0, ub=Inf) for the Confluent Hypergeometric covariance and the Matérn covariance. When the powered-exponential or Cauchy class is used, it has default value nu=list(lb=0, ub=2). This can be achieved by specifying the lower bound in opt.
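A sketch of a bound list with the structure described above, together with a hypothetical helper that maps a list(lb, ub) pair to the log scale used for the range parameter. In R, log(0) is -Inf and log(Inf) is Inf, so the default bounds $(0, \infty)$ become an unconstrained interval on the log scale. Whether the package transforms the bounds in exactly this way is an assumption.

```r
# A bound list with the elements described above (default values shown)
bound <- list(
  nugget = list(lb = 0, ub = Inf),
  range  = list(lb = 0, ub = Inf),
  tail   = list(lb = 0, ub = Inf),
  nu     = list(lb = 0, ub = 2)     # default for powered-exponential or Cauchy
)

# Hypothetical helper: bounds for log(phi) given bounds for phi
to_log_bounds <- function(b) c(lower = log(b$lb), upper = log(b$ub))

to_log_bounds(bound$range)               # lower = -Inf, upper = Inf
to_log_bounds(list(lb = 0.1, ub = 10))   # about -2.30 and 2.30
```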

dtype

a string indicating the type of distance:

Euclidean

Euclidean distance is used. This is the default choice.

GCD

Great circle distance is used for data on a sphere.
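A minimal sketch of a great circle distance computed with the haversine formula. It assumes each location is a (longitude, latitude) pair in degrees and returns distances on a sphere of radius r, where r = 1 gives the central angle in radians; the package's own conventions for coordinate order, units, and radius may differ, and the function name gcd is hypothetical.

```r
# Great circle distance via the haversine formula.
# p1, p2: c(longitude, latitude) in degrees; r: sphere radius
gcd <- function(p1, p2, r = 1) {
  to_rad <- pi / 180
  lon1 <- p1[1] * to_rad; lat1 <- p1[2] * to_rad
  lon2 <- p2[1] * to_rad; lat2 <- p2[2] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  2 * r * asin(min(1, sqrt(a)))   # min() guards against rounding above 1
}

gcd(c(0, 0), c(90, 0))             # a quarter of the equator: pi / 2
gcd(c(0, 0), c(0, 90))             # equator to the north pole: pi / 2
gcd(c(0, 0), c(180, 0), r = 6371)  # half of a great circle of radius 6371 km, about 20015
```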

verbose

a logical value. If it is TRUE, the MCMC progress bar is shown.

Author

Pulong Ma mpulong@gmail.com

Examples



library(GPBayes)

# 1-D test function whose form changes at x = 9.6
code = function(x){
  y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6)
  return(y)
}
n = 100
input = seq(0, 20, length.out=n)    # training inputs
XX = seq(0, 20, length.out=99)      # new inputs for prediction
Ztrue = code(input)
set.seed(1234)
output = Ztrue + rnorm(length(Ztrue), sd=0.1)   # noisy observations

# fitting a GaSP model with the objective Bayes approach
# (nsample and burnin are far below the defaults 5000 and 1000; use more draws in practice)
fit = GaSP(formula=~1, output, input,
          param=list(range=3, nugget=0.1, nu=2.5),
          smooth.est=FALSE, input.new=XX,
          cov.model=list(family="matern", form="isotropic"),
          proposal=list(range=.35, nugget=.8, nu=0.8),
          dtype="Euclidean", model.fit="Cauchy_prior", nsample=50,
          burnin=10, verbose=TRUE)