Bayesian Optimization

Flexible Bayesian optimization of model hyperparameters.

Usage

BayesianOptimization(FUN, bounds, saveIntermediate = NULL,
  leftOff = NULL, parallel = FALSE, packages = NULL, export = NULL,
  initialize = TRUE, initGrid = NULL, initPoints = 0, bulkNew = 1,
  nIters = 0, kern = "Matern52", beta = 0, acq = "ucb",
  stopImpatient = list(newAcq = "ucb", rounds = Inf), kappa = 2.576,
  eps = 0, gsPoints = 100, convThresh = 1e+07,
  minClusterUtility = NULL, noiseAdd = 0.25, verbose = 1)

Arguments

FUN

the function to be maximized. This function should return a named list with at least 1 component. The first component must be named Score and should contain the metric to be maximized. You may return other named scalar elements that you wish to include in the final summary table.
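As an illustration, a scoring function can return Score alongside other named scalars (this toy function is illustrative, not from the package itself):

scoringFunction <- function(x) {
  # Score is the quantity maximized; xSquared is only carried into
  # the final summary table
  list(Score = -(x - 3)^2, xSquared = x^2)
}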


bounds

named list of lower and upper bounds for each hyperparameter. The names of the list should be arguments passed to FUN. Use the "L" suffix to indicate integer hyperparameters.
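For example, mixing integer and continuous hyperparameters (the names here are illustrative and must match the arguments of FUN):

bounds <- list(
    max_depth = c(2L, 10L)  # "L" suffix: treated as an integer
  , subsample = c(0.25, 1)  # continuous
)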


saveIntermediate

character filepath (including the file name) that specifies the location to save intermediary results. This will save a data.table as an RDS that can be specified as the leftOff parameter.


leftOff

data.table containing parameter-Score pairs. If supplied, the process will rbind this table to the parameter-Score pairs obtained through initialization. This table should be obtained from the file saved by saveIntermediate.
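A sketch of saving and later resuming a run (the file path and the scoringFunction/bounds objects are illustrative):

Results <- BayesianOptimization(
    FUN = scoringFunction
  , bounds = bounds
  , saveIntermediate = "intermediate.RDS"
  , initPoints = 5
  , nIters = 10)

# Resume later from the saved pairs, skipping initialization:
priorPairs <- readRDS("intermediate.RDS")
Results2 <- BayesianOptimization(
    FUN = scoringFunction
  , bounds = bounds
  , leftOff = priorPairs
  , initialize = FALSE
  , nIters = 20)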


parallel

should the process run in parallel? If TRUE, several criteria must be met:

  • A parallel backend must be registered.

  • FUN must be executable using only the packages specified in packages (and base packages).

  • FUN must be executable using only the objects specified in export.

  • The function must be thread safe.
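A minimal sketch of registering a backend with doParallel before a parallel run (the backend choice is illustrative; any foreach-compatible backend works):

library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
# ... BayesianOptimization(..., parallel = TRUE, packages = "xgboost") ...
stopCluster(cl)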


packages

character vector of the packages needed to run FUN.


export

character vector of object names needed to evaluate FUN.


initialize

should the process initialize a parameter-Score pair set? If FALSE, leftOff must be provided.


initGrid

user-specified points at which to sample the target function. Should be a data.frame or data.table with the same column names as bounds.
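For instance, an explicit starting grid for a single hyperparameter x bounded on [0, 8]:

initGrid <- data.frame(x = c(0, 2, 4, 6, 8))  # column names must match bounds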


initPoints

number of randomly chosen points at which to sample the scoring function before the Bayesian optimization process fits the Gaussian process.


bulkNew

integer that specifies the number of parameter combinations to try between each Gaussian process fit.


nIters

total number of parameter sets to be sampled, including the initial set.


kern

a character that gets mapped to one of GauPro's GauPro_kernel_beta R6 classes. Determines the covariance function used in the Gaussian process. Can be one of:

  • "Gaussian"

  • "Exponential"

  • "Matern52"

  • "Matern32"


beta

the kernel lengthscale parameter, log10(theta). Passed to the GauPro_kernel_beta class specified in kern.


acq

acquisition function type to be used. Can be "ucb", "ei", "eips" or "poi":

  • ucb Upper Confidence Bound

  • ei Expected Improvement

  • eips Expected Improvement Per Second

  • poi Probability of Improvement
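Conceptually, each acquisition function trades off the Gaussian process posterior mean mu and standard deviation sigma at a candidate point against the best Score found so far. The sketches below show the standard closed forms (illustrative only, not the package's internal code):

ucb <- function(mu, sigma, kappa) mu + kappa * sigma
poi <- function(mu, sigma, best, eps) pnorm((mu - best - eps) / sigma)
ei  <- function(mu, sigma, best, eps) {
  z <- (mu - best - eps) / sigma
  (mu - best - eps) * pnorm(z) + sigma * dnorm(z)
}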


stopImpatient

a list containing rounds and newAcq. If acq = "eips", the acquisition function can be switched to newAcq after rounds parameter-score pairs are found.


kappa

tunable parameter kappa of the GP Upper Confidence Bound, balancing exploitation against exploration. Increasing kappa will incentivise exploration.


eps

tunable parameter epsilon of "ei", "eips" and "poi". Balances exploitation against exploration. Increasing eps will make the "improvement" threshold higher.


gsPoints

integer that specifies how many initial points to try when searching for the optimal parameter set. Increase this for a higher chance of finding the global optimum, at the expense of more time.


convThresh

convergence threshold passed to factr when optim (method "L-BFGS-B") is called. Lower values will take longer to converge, but may be more accurate.


minClusterUtility

number between 0 and 1. Represents the minimum percentage of the optimal utility required for a less optimal local maximum to be included as a candidate parameter set in the next scoring round. If NULL, only the global optimum will be used as a candidate parameter set.


noiseAdd

if bulkNew > 1, specifies how much noise to add to the optimal candidate parameter set to obtain the other bulkNew - 1 candidate parameter sets. New random draws are pulled from a shape (4, 4) beta distribution centered at the optimal candidate parameter set, with a range equal to noiseAdd * (upper bound - lower bound).
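A sketch of how one such neighboring draw could be made (the function and argument names are illustrative, not the package's internals):

jitterCandidate <- function(center, lower, upper, noiseAdd = 0.25) {
  range <- noiseAdd * (upper - lower)
  # rbeta(1, 4, 4) is symmetric around 0.5, so the draw is centered at `center`
  draw <- center + (rbeta(1, 4, 4) - 0.5) * range
  min(max(draw, lower), upper)  # clamp to the bounds
}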


verbose

whether or not to print progress. If 0, nothing will be printed. If 1, progress will be printed. If 2, progress and information about new parameter-score pairs will be printed.


Value

A list containing details about the process:


  • The list of the Gaussian process objects that were fit.

  • The optimal parameters according to each Gaussian process.

  • A list of all parameter-score pairs, as well as extra columns returned by FUN.

  • The best parameter set at each iteration.


References

Jasper Snoek, Hugo Larochelle, Ryan P. Adams (2012). Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems 25.

Aliases

  • BayesianOptimization

Examples

# Example 1 - Optimization of a Simple Function
scoringFunction <- function(x) {
  a <- exp(-(2-x)^2)*1.5
  b <- exp(-(4-x)^2)*2
  c <- exp(-(6-x)^2)*1
  return(list(Score = a+b+c))
}

bounds <- list(x = c(0,8))

Results <- BayesianOptimization(
    FUN = scoringFunction
  , bounds = bounds
  , initPoints = 5
  , nIters = 8
  , gsPoints = 10)
# Example 2 - Hyperparameter Tuning in xgboost

library(xgboost)

data(agaricus.train, package = "xgboost")

Folds <- list( Fold1 = as.integer(seq(1,nrow(agaricus.train$data),by = 3))
             , Fold2 = as.integer(seq(2,nrow(agaricus.train$data),by = 3))
             , Fold3 = as.integer(seq(3,nrow(agaricus.train$data),by = 3)))

scoringFunction <- function(max_depth, min_child_weight, subsample) {

  dtrain <- xgb.DMatrix(agaricus.train$data,label = agaricus.train$label)

  Pars <- list( booster = "gbtree"
              , eta = 0.01
              , max_depth = max_depth
              , min_child_weight = min_child_weight
              , subsample = subsample
              , objective = "binary:logistic"
              , eval_metric = "auc")

  xgbcv <- xgb.cv( params = Pars
                 , data = dtrain
                 , nround = 100
                 , folds = Folds
                 , prediction = TRUE
                 , showsd = TRUE
                 , early_stopping_rounds = 5
                 , maximize = TRUE
                 , verbose = 0)

  return(list( Score = max(xgbcv$evaluation_log$test_auc_mean)
             , nrounds = xgbcv$best_iteration))
}
bounds <- list(max_depth = c(2L, 10L)
             , min_child_weight = c(1, 100)
             , subsample = c(0.25, 1))

kern <- "Matern52"

acq <- "ei"

ScoreResult <- BayesianOptimization(
    FUN = scoringFunction
  , bounds = bounds
  , initPoints = 10
  , bulkNew = 1
  , nIters = 12
  , kern = kern
  , acq = acq
  , kappa = 2.576
  , verbose = 1
  , parallel = FALSE
  , gsPoints = 50)
Documentation reproduced from package ParBayesianOptimization, version 0.0.1, License: GPL-2
