laGP (version 1.5-5)

optim.auglag: Optimize an objective function under multiple blackbox constraints

Description

Uses a surrogate-modeled augmented Lagrangian (AL) system to optimize an objective function (blackbox, or known and linear) under multiple unknown (blackbox) constraints via expected improvement (EI) and variations; a comparator based on EI with constraints is also provided

Usage

optim.auglag(fn, B, fhat = FALSE, equal = FALSE, ethresh = 1e-2, 
  slack = FALSE, cknown = NULL, start = 10, end = 100, 
  Xstart = NULL, sep = TRUE, ab = c(3/2, 8), lambda = 1, rho = NULL, 
  urate = 10, ncandf = function(t) { t }, dg.start = c(0.1, 1e-06), 
  dlim = sqrt(ncol(B)) * c(1/100, 10), Bscale = 1, ey.tol = 1e-2, 
  N = 1000, plotprog = FALSE, verb = 2, ...)
optim.efi(fn, B, fhat = FALSE, cknown = NULL, start = 10, end = 100, 
  Xstart = NULL, sep = TRUE, ab = c(3/2,8), urate = 10, 
  ncandf = function(t) { t }, dg.start = c(0.1, 1e-6), 
  dlim = sqrt(ncol(B))*c(1/100,10), Bscale = 1, plotprog = FALSE, 
  verb = 2, ...)

Arguments

fn

function of an input (x), facilitating vectorization on a matrix X thereof, returning a list with elements $obj containing the (scalar) objective value and $c containing a vector of evaluations of the (multiple) constraint function at x. The fn function must take a known.only argument which is explained in the note below; it need not act on that argument

B

2-column matrix describing the bounding box. The number of rows of the matrix determines the input dimension (length(x) in fn(x)); the first column gives lower bounds and the second gives upper bounds

fhat

a scalar logical indicating if the objective function should be modeled with a GP surrogate. The default of FALSE assumes a known linear objective scaled by Bscale. Using TRUE is an “alpha” feature at this time

equal

an optional vector containing zeros and ones, whose length equals the number of constraints, specifying which should be treated as equality constraints (1) and which as inequality (0)

ethresh

a threshold used for equality constraints to determine validity for progress measures; ignored if there are no equality constraints

slack

A scalar logical indicating if slack variables, and thus exact EI calculations, should be used. The default of slack = FALSE results in Monte Carlo EI approximation. One can optionally specify slack = 2 to get the slack = TRUE behavior, with a second-stage L-BFGS-B optimization of the EI acquisition applied at the end, starting from the best value found on the random search grid

cknown

An optional positive integer vector specifying which of the constraint values returned by fn should be treated as “known”, i.e., not modeled with Gaussian processes

start

positive integer giving the number of random starting locations before sequential design (for optimization) is performed; start >= 6 is recommended unless Xstart is non-NULL; in the current version the starting locations come from a space-filling design via dopt.gp

end

positive integer giving the total number of evaluations/trials in the optimization; must have end > start

Xstart

optional matrix of starting design locations in lieu of, or in addition to, start random ones; we recommend nrow(Xstart) + start >= 6; also must have ncol(Xstart) = nrow(B)

sep

The default sep = TRUE uses separable GPs (i.e., via newGPsep, etc.) to model the constraints and objective; otherwise isotropic GPs are used

ab

prior parameters; see darg for a description of the prior used on the lengthscale parameter during emulation(s) for the constraints

lambda

m-dimensional initial Lagrange multiplier parameter for the m constraints

rho

positive scalar initial quadratic penalty parameter in the augmented Lagrangian; the default setting of rho = NULL causes an automatic starting value to be chosen; see rejoinder to Gramacy, et al. (2016) or supplementary material to Picheny, et al. (2016)

urate

positive integer indicating how many optimization trials should pass before each MLE/MAP update is performed for GP correlation lengthscale parameter(s)

ncandf

function taking a single integer indicating the optimization trial number t, where start < t <= end, and returning the number of search candidates (e.g., for expected improvement calculations) at round t; the default setting allows the number of candidates to grow linearly with t

dg.start

2-vector giving starting values for the lengthscale and nugget parameters of the GP surrogate model(s) for constraints

dlim

2-vector giving bounds for the lengthscale parameter(s) under MLE/MAP inference, thereby augmenting the prior specification in ab

Bscale

scalar indicating the relationship between the sum of the inputs, sum(x), to fn and the output fn(x)$obj; note that at this time only linear objectives are fully supported by the code; more details below

ey.tol

a scalar proportion indicating how many of the EIs at ncandf(t) candidate locations must be non-zero to “trust” that metric to guide search, reducing to an EY-based search instead [choosing that proportion to be one forces EY-based search]

N

positive scalar integer indicating the number of Monte Carlo samples to be used for calculating EI and EY

plotprog

logical indicating if progress plots should be made after each inner iteration; the plots show three panels tracking the best valid objective, the EI or EY surface over the first two input variables (requires interp), and the lengthscale parameter(s) of the GP(s), respectively. When plotprog = TRUE the interp.loess function is used to aid in creating surface plots; however, this does not work well with fewer than fifteen points. You may also provide a function with arguments/formals similar to those of interp.loess. For example, interp is used below, and would have been the default if not for licensing incompatibilities

verb

a non-negative integer indicating the verbosity level; the larger the value the more that is printed to the screen

...

additional arguments passed to fn

Value

The output is a list summarizing the progress of the evaluations of the blackbox under optimization

prog

vector giving the best valid (c(x) < 0) value of the objective over the trials

obj

vector giving the value of the objective for the input under consideration at each trial

X

matrix giving the input values at which the blackbox function was evaluated

C

matrix giving the value of the constraint function for the input under consideration at each trial

d

matrix of lengthscale values obtained at the final update of the GP emulator for each constraint

df

if fhat = TRUE then this is a matrix of lengthscale values for the objective obtained at the final update of the GP emulator

lambda

a matrix containing lambda vectors used in each “outer loop” AL iteration

rho

a vector of rho values used in each “outer loop” AL iteration

Details

These subroutines support a suite of methods used to optimize challenging constrained problems from Gramacy, et al. (2016) and Picheny, et al. (2016); see the references below.

Those schemes hybridize Gaussian process based surrogate modeling and expected improvement (EI; Jones, et al., 1998) with the additive penalty method (APM) implemented by the augmented Lagrangian (AL, e.g., Nocedal & Wright, 2006). The goal is to minimize a (possibly known) linear objective function f(x) under multiple unknown (blackbox) constraints, expressed as a vector-valued function satisfying c(x) <= 0. The solution here emulates the components of c with Gaussian process surrogates, and guides optimization by searching the posterior mean surface, or the EI, of the following composite objective $$ Y(x) = f(x) + \lambda^\top Y_c(x) + \frac{1}{2\rho} \sum_{i=1}^m \max(0, Y_{c_i}(x))^2, $$ where \(\lambda\) and \(\rho\) follow updating equations that guarantee convergence to a valid solution minimizing the objective. For more details, see Gramacy, et al. (2016).
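
To make the composite concrete, here is a minimal base-R sketch evaluating the AL composite at given multiplier and penalty values; al.comp is a hypothetical helper for illustration, not part of laGP:

## hypothetical helper (not part of laGP): fx is a scalar objective value,
## Cx a vector of m constraint values, lambda an m-vector of multipliers,
## and rho > 0 the quadratic penalty parameter
al.comp <- function(fx, Cx, lambda, rho)
  fx + drop(crossprod(lambda, Cx)) + sum(pmax(0, Cx)^2)/(2*rho)

## e.g., with one constraint satisfied and one slightly violated:
## al.comp(1, c(-0.2, 0.1), lambda=c(1,1), rho=0.5)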

A slack variable implementation that allows for exact EI calculations and can accommodate equality constraints, and mixed (equality and inequality) constraints, is also provided. For further details, see Picheny, et al. (2016).
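
For instance, a sketch of a mixed-constraint call (assuming a two-constraint fn where the first constraint is an equality) might look like:

## first constraint treated as equality (1), second as inequality (0);
## slack = TRUE selects the exact-EI implementation supporting equality
out.eq <- optim.auglag(fn, B, equal = c(1, 0), slack = TRUE, end = 50)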

The example below illustrates a variation on the toy problem considered in both papers, which bases sampling on EI. For examples making use of equality constraints, following the Picheny, et al. (2016) paper, see the demos listed in the “See Also” section below.

Although it is off by default, these functions allow an unknown objective to be modeled (fhat = TRUE), rather than assuming a known linear one. For examples see demo("ALfhat") and demo("GSBP") which illustrate the AL and comparators in inequality and mixed constraints setups, respectively.
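
For instance, on the toy aimprob problem from the Examples below, a surrogate-modeled objective could be requested as follows (a sketch; recall that fhat = TRUE is an “alpha” feature):

out.f <- optim.auglag(aimprob, B, fhat = TRUE, end = 25)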

The optim.efi function is provided as a comparator. This method uses the same underlying GP models together with the hybrid EI and probability of satisfying the constraints heuristic from Schonlau, et al. (1998). See demo("GSBP") and demo("LAH") for optim.efi examples and comparisons between the original AL and the slack variable enhancement(s) on mixed constraint problems with known and blackbox objectives, respectively.
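
A minimal comparator run on the same toy problem from the Examples below could be:

out.efi <- optim.efi(aimprob, B, end = 25)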

References

Picheny, V., Gramacy, R.B., Wild, S.M., Le Digabel, S. (2016). “Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian”. Preprint available on arXiv:1605.09466; http://arxiv.org/abs/1605.09466

Gramacy, R.B., Gray, G.A., Lee, H.K.H., Le Digabel, S., Ranjan, P., Wells, G., Wild, S.M. (2016). “Modeling an Augmented Lagrangian for Improved Blackbox Constrained Optimization”. Technometrics (with discussion), 58(1), 1-11. Preprint available on arXiv:1403.4890; http://arxiv.org/abs/1403.4890

Jones, D., Schonlau, M., and Welch, W. J. (1998). “Efficient Global Optimization of Expensive Black Box Functions.” Journal of Global Optimization, 13, 455-492.

Schonlau, M., Jones, D.R., and Welch, W. J. (1998). “Global Versus Local Search in Constrained Optimization of Computer Models.” In New Developments and Applications in Experimental Design, vol. 34, 11-25. Institute of Mathematical Statistics.

Nocedal, J. and Wright, S.J. (2006). Numerical Optimization. 2nd ed. Springer.

See Also

vignette("laGP"), demo("ALfhat") for blackbox objective, demo("GSBP") for a mixed constraints problem with blackbox objective, demo("LAH") for mix constraints with known objective, optim.step.tgp for unconstrained optimization; optim with method="L-BFGS-B" for box constraints, or optim with method="SANN" for simulated annealing

Examples

# NOT RUN {
## this example assumes a known linear objective; further examples
## are in the optim.auglag demo

## a test function returning linear objective evaluations and 
## non-linear constraints
aimprob <- function(X, known.only = FALSE)
{
  if(is.null(nrow(X))) X <- matrix(X, nrow=1)
  f <- rowSums(X)
  if(known.only) return(list(obj=f))
  c1 <- 1.5-X[,1]-2*X[,2]-0.5*sin(2*pi*(X[,1]^2-2*X[,2]))
  c2 <- rowSums(X^2)-1.5
  return(list(obj=f, c=cbind(c1,c2)))
}

## set bounding rectangle for adaptive sampling
B <- matrix(c(rep(0,2),rep(1,2)),ncol=2)

## optimization (primarily) by EI, change 25 to 100 for
## 99% chance of finding the global optimum with value 0.6
## library(akima) ## for plotprog=interp below
out <- optim.auglag(aimprob, B, end=25) ##, plotprog=interp)
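
## track progress: out$prog records the best valid objective over the trials
plot(out$prog, type="l", xlab="trial", ylab="best valid objective")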

## using the slack variable implementation which is a little slower
## but more precise; slack=2 augments random search with L-BFGS-B
  
out2 <- optim.auglag(aimprob, B, end=25, slack=TRUE)
# }
# NOT RUN {
out3 <- optim.auglag(aimprob, B, end=25, slack=2)
# }
# NOT RUN {
## for more slack examples and comparison to optim.efi on problems
## involving equality and mixed (equality and inequality) constraints,
## see demo("ALfhat"), demo("GSBP") and demo("LAH")

## for comparison, here is a version that uses simulated annealing
## with the Additive Penalty Method (APM) for constraints
# }
# NOT RUN {
aimprob.apm <- function(x, B=matrix(c(rep(0,2),rep(1,2)),ncol=2))
{ 
  ## check bounding box
  for(i in 1:length(x)) {
    if(x[i] < B[i,1] || x[i] > B[i,2]) return(Inf)
  }

  ## evaluate objective and constraints
  f <- sum(x)
  c1 <- 1.5-x[1]-2*x[2]-0.5*sin(2*pi*(x[1]^2-2*x[2]))
  c2 <- x[1]^2+x[2]^2-1.5

  ## return APM composite
  return(f + abs(c1) + abs(c2))
}

## use SA; specify control=list(maxit=100), say, to control max 
## number of iterations; does not easily facilitate plotting progress
out4 <- optim(runif(2), aimprob.apm, method="SANN") 
## check the final value, which typically does not satisfy both
## constraints
aimprob(out4$par)
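
## a quick validity check: both constraint values should be <= 0
all(aimprob(out4$par)$c <= 0)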
# }
# NOT RUN {
## for a version with a modeled objective see demo("ALfhat")
# }
