##### optimisation methods

Functions to set up optimisers (which find parameters that maximise the joint density of a model) and change their tuning parameters, for use in opt(). For details of the algorithms and how to tune them, see the SciPy optimiser docs or the TensorFlow optimiser docs.

##### Usage
nelder_mead()

powell()

cg()

bfgs()

newton_cg()

l_bfgs_b(maxcor = 10, maxls = 20)

tnc(max_cg_it = -1, stepmx = 0, rescale = -1)

cobyla(rhobeg = 1)

slsqp()

gradient_descent(learning_rate = 0.01)

adadelta(learning_rate = 0.001, rho = 1, epsilon = 1e-08)

adagrad(learning_rate = 0.8, initial_accumulator_value = 0.1)

adagrad_da(learning_rate = 0.8, global_step = 1L,
  l1_regularization_strength = 0, l2_regularization_strength = 0)

momentum(learning_rate = 0.001, momentum = 0.9, use_nesterov = TRUE)

adam(learning_rate = 0.1, beta1 = 0.9, beta2 = 0.999,
  epsilon = 1e-08)

ftrl(learning_rate = 1, learning_rate_power = -0.5,
  initial_accumulator_value = 0.1, l1_regularization_strength = 0,
  l2_regularization_strength = 0)

proximal_gradient_descent(learning_rate = 0.01,
  l1_regularization_strength = 0, l2_regularization_strength = 0)

proximal_adagrad(learning_rate = 1, initial_accumulator_value = 0.1,
  l1_regularization_strength = 0, l2_regularization_strength = 0)

rms_prop(learning_rate = 0.1, decay = 0.9, momentum = 0,
  epsilon = 1e-10)

##### Arguments
maxcor

maximum number of 'variable metric corrections' used to define the approximation to the Hessian matrix

maxls

maximum number of line search steps per iteration

max_cg_it

maximum number of Hessian-vector product evaluations per iteration

stepmx

maximum step for the line search

rescale

log10 scaling factor used to trigger rescaling of objective

rhobeg

reasonable initial changes to the variables

learning_rate

the size of steps (in parameter space) towards the optimal value

rho

the decay rate

epsilon

a small constant used to condition gradient updates

initial_accumulator_value

initial value of the 'accumulator' used to tune the algorithm

global_step

the current training step number

initial value of the accumulators used to tune the algorithm

l1_regularization_strength

L1 regularisation coefficient (must be 0 or greater)

l2_regularization_strength

L2 regularisation coefficient (must be 0 or greater)

momentum

the momentum of the algorithm

use_nesterov

whether to use Nesterov momentum

beta1

exponential decay rate for the 1st moment estimates

beta2

exponential decay rate for the 2nd moment estimates

learning_rate_power

power on the learning rate, must be 0 or less

decay

discounting factor for the gradient
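
To give a feel for what learning_rate and momentum control, here is a minimal sketch in base R. It does not use greta or TensorFlow: grad() and descend() are hypothetical helpers written for this illustration, and the momentum update shown is one common formulation that may differ in detail from the one the underlying optimiser uses.

```r
# Gradient of the toy objective f(x) = (x - 3)^2, which is minimised at x = 3
grad <- function(x) 2 * (x - 3)

# Gradient descent with an optional momentum term (hypothetical helper)
descend <- function(learning_rate, momentum = 0, n_steps = 100, start = 0) {
  x <- start
  velocity <- 0
  for (i in seq_len(n_steps)) {
    # the velocity keeps a decaying memory of previous steps
    velocity <- momentum * velocity - learning_rate * grad(x)
    x <- x + velocity
  }
  x
}

# a small learning rate makes slow progress in a fixed number of steps
slow <- descend(learning_rate = 0.01)

# a larger learning rate gets much closer to the optimum
fast <- descend(learning_rate = 0.1)

# momentum lets the small learning rate cover more ground
with_momentum <- descend(learning_rate = 0.01, momentum = 0.9)

c(slow = slow, fast = fast, with_momentum = with_momentum)
```

On this toy problem a larger learning rate or added momentum gets closer to the optimum in the same number of steps. On harder problems, a learning rate that is too large can overshoot or diverge, so these values usually need tuning per model.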

##### Details

The cobyla() optimiser does not provide information about the number of iterations or about convergence, so these elements of the output are set to NA.
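
Code that inspects these diagnostics may need to allow for the NA case. The sketch below is one way to do that. It assumes that the output is a list with elements named iterations and convergence, and that a convergence code of 0 means success; both are assumptions to check against the documentation for opt(). summarise_fit() is a hypothetical helper, and the lists passed to it are mock results standing in for real output.

```r
# Describe the convergence status of an optimiser result, allowing for NA
# (hypothetical helper; element names and the meaning of 0 are assumptions)
summarise_fit <- function(res) {
  if (is.na(res$convergence)) {
    return("convergence not reported by this optimiser")
  }
  if (res$convergence == 0) "converged" else "did not converge"
}

# mock result resembling what is described for cobyla()
status_cobyla <- summarise_fit(list(iterations = NA, convergence = NA))

# mock result from an optimiser that does report diagnostics
status_other <- summarise_fit(list(iterations = 12, convergence = 0))

c(status_cobyla, status_other)
```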

##### Value

An optimiser object that can be passed to opt().

##### Examples
# NOT RUN {
library(greta)

# use optimisation to find the mean and sd of some data
x <- rnorm(100, -2, 1.2)
mu <- variable()
sd <- variable(lower = 0)
distribution(x) <- normal(mu, sd)
m <- model(mu, sd)

# configure optimisers & parameters via 'optimiser' argument to opt
opt_res <- opt(m, optimiser = bfgs())

# compare results with the analytic solution
opt_res$par
c(mean(x), sd(x))
# }
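
Tuning parameters are set when the optimiser is constructed. The following sketch rebuilds the same kind of model and passes non-default values to l_bfgs_b() and adam(). The particular values (maxcor = 5, maxls = 30, learning_rate = 0.05) are arbitrary choices for illustration, not recommendations, and running it requires a working greta installation.

```r
library(greta)

# simulate data and define a model for its mean and sd
x <- rnorm(100, -2, 1.2)
mu <- variable()
sd <- variable(lower = 0)
distribution(x) <- normal(mu, sd)
m <- model(mu, sd)

# pass non-default tuning parameters when constructing each optimiser
opt_lbfgs <- opt(m, optimiser = l_bfgs_b(maxcor = 5, maxls = 30))
opt_adam <- opt(m, optimiser = adam(learning_rate = 0.05))

# compare the estimates from the two optimisers
opt_lbfgs$par
opt_adam$par
```

If the two optimisers disagree noticeably, that may suggest one of them has not converged and that its tuning parameters need adjusting.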

