
optimx (version 0.84)

optimx: General-purpose optimization

Description

General-purpose optimization wrapper function that calls other R tools for optimization, including the existing optim() function. optimx also tries to unify the calling sequence to allow a number of tools to use the same front-end. These include spg from the BB package, ucminf, nlm, and nlminb. Note that optim() itself allows Nelder--Mead, quasi-Newton and conjugate-gradient algorithms as well as box-constrained optimization via L-BFGS-B. Because SANN does not return a meaningful convergence code (conv), optimx() does not call the SANN method.

Usage

optimx(par, fn, gr=NULL, hess=NULL, lower=-Inf, upper=Inf, 
            method=c("Nelder-Mead","BFGS"), itnmax=NULL, hessian=NULL,
            control=list(),
             ...)

Arguments

par
A vector of initial values for the parameters for which optimal values are to be found.
fn
A function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place. It should return a scalar result.
gr
A function to return (as a vector) the gradient for those methods that can use this information. This includes the following methods: "BFGS", "CG" and "L-BFGS-B".

If 'gr' is NULL, a finite-difference approximation will be used.

hess
A function to return (as a symmetric matrix) the Hessian of the objective function for those methods that can use this information.
lower, upper
Bounds on the variables for methods such as "L-BFGS-B" that can handle box (or bounds) constraints.
method
A list of the methods to be used. Note that this is an important change from optim(), which allows just one method to be specified. See Details.
itnmax
If provided as a vector of the same length as the list of methods in method, gives the maximum number of iterations or function values for the corresponding method. If a single number is provided, this will be used for all methods.
hessian
A logical control that, if TRUE, forces the computation of an approximation to the Hessian at the final set of parameters; if FALSE, ensures it is not calculated. This setting is provided for compatibility with optim().
control
A list of control parameters. See Details.
...
Further arguments to be passed to fn and gr.

Value

A list with components:

  • par - The best set of parameters found.
  • value - The value of fn corresponding to par.
  • counts - A two-element integer vector giving the number of calls to fn and gr respectively. This excludes those calls needed to compute the Hessian, if requested, and any calls to fn to compute a finite-difference approximation to the gradient.
  • convergence - An integer code. 0 indicates successful convergence; non-zero codes indicate problems and follow the conventions of the underlying optimizer (for the optim()-based methods, for example, 1 indicates that the iteration limit was reached).
  • message - A character string giving any additional information returned by the optimizer, or NULL.
  • hessian - Only present if argument hessian is TRUE. A symmetric matrix giving an estimate of the Hessian at the solution found. Note that this is the Hessian of the unconstrained problem even if the box constraints are active.

More detail is provided in the attribute "details" of the returned answer object. If the returned object from optimx() is ans, this is accessed via the construct attr(ans, "details"). This object is a list of lists, one list for each method that has been successful or that has been forced by save.failures=TRUE. To get the details list for the third method, use attr(ans, "details")[[3]]. A short sketch of inspecting this attribute follows the list of elements below.

    Each such list has possible elements:

  • par - the final parameters of the function
  • value - the final value of the objective function
  • convergence - a code indicating whether and/or how the method terminated
  • message - a descriptive message, if one is provided
  • fevals - the number of times the objective function was evaluated during the run; this item has an attribute "names" that can take on the value "function"
  • gevals - the number of times the gradient of the objective function was evaluated during the run; this item has an attribute "names" that can take on the value "gradient"
  • kkt1 - an indicator of whether the first-order Karush-Kuhn-Tucker condition for a local minimum is satisfied. Note that a tolerance must be used to make this test, so the indicator is NOT definitive.
  • kkt2 - an indicator of whether the second-order Karush-Kuhn-Tucker condition for a local minimum is satisfied. Note that a tolerance must be used to make this test, so the indicator is NOT definitive.
  • ngatend - the gradient vector computed or estimated at termination
  • nhatend - the Hessian matrix computed or estimated at termination
  • evnhatend - a vector of the eigenvalues of the Hessian at termination
  • systime - the system time required for execution of the method in question (named in "method" below). This element has the attribute "names".
  • method - the name of the method used. This has the attribute "CPU times (s)", which itself has an attribute "names" that corresponds to the "method".
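A minimal sketch (not part of the package's own examples) of inspecting this attribute, using the Rosenbrock function fr defined in the Examples section below:

    fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2   # Rosenbrock banana
    ans <- optimx(c(-1.2, 1), fr, method = c("Nelder-Mead", "BFGS"))
    det <- attr(ans, "details")    # one list per method that ran
    det[[2]]$par                   # final parameters from the second method ("BFGS")
    det[[2]]$kkt1                  # first-order KKT indicator (tolerance-based)
    det[[2]]$nhatend               # estimated Hessian at termination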


concept

  • minimization
  • maximization

source

The code for methods "Nelder-Mead", "BFGS" and "CG" was based originally on Pascal code in Nash (1990) that was translated by p2c and then hand-optimized. Dr Nash has agreed that the code can be made freely available.

The code for method "L-BFGS-B" is based on Fortran code by Zhu, Byrd, Lu-Chen and Nocedal obtained from Netlib (file opt/lbfgs_bcm.shar: another version is in toms/778).

See documentation for "nlm" and "nlminb" for those methods, package "ucminf" for the function of the same name, and package BB for method "spg".

Details

Note that arguments after ... must be matched exactly.

By default this function performs minimization, but it will maximize if control$maximize is TRUE. The original optim() function allows control$fnscale to be set negative to accomplish this. DO NOT use both methods.
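As a hedged illustration of the distinction (the objective negquad below is invented for this sketch):

    negquad <- function(x) -sum((x - 3)^2)            # maximum at x = c(3, 3)
    amax <- optimx(c(0, 0), negquad, method = "BFGS",
                   control = list(maximize = TRUE))   # optimx-style maximization
    ## optim()-style alternative; do not combine with maximize:
    ## amax2 <- optimx(c(0, 0), negquad, method = "BFGS",
    ##                 control = list(fnscale = -1))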

Possible method codes at the time of writing are 'Nelder-Mead', 'BFGS', 'CG', 'L-BFGS-B', 'nlm', 'nlminb', 'spg', and 'ucminf'.

The default methods are an implementation of the Nelder and Mead (1965) method and a Variable Metric method based on the ideas of Fletcher (1970), as modified by him in conversation with Nash (1979). Nelder-Mead uses only function values and is robust but relatively slow. It will work reasonably well for non-differentiable functions. The Variable Metric method "BFGS" updates an approximation to the inverse Hessian using the BFGS update formulas, along with an acceptable-point line-search strategy. This method appears to work best with analytic gradients. As at June 2009, a box-constrained version is being tested.

Method "CG" is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak--Ribiere or Beale--Sorenson updates). The particular implementation is now dated, and improved yet simpler codes are being implemented (as at June 2009), and furthermore a version with box constraints is being tested. Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.

Method "L-BFGS-B" is that of Byrd et. al. (1995) which allows box constraints, that is each variable can be given a lower and/or upper bound. The initial value must satisfy the constraints. This uses a limited-memory modification of the BFGS quasi-Newton method. If non-trivial bounds are supplied, this method will be selected, with a warning.

Nocedal and Wright (1999) is a comprehensive reference for the previous three methods.

Function fn can return NA or Inf if the function cannot be evaluated at the supplied value, but the initial value must have a computable finite value of fn. However, some methods, of which "L-BFGS-B" is known to be a case, require that the values returned should always be finite.
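A small sketch of such a safeguarded objective (the function safelog is invented for illustration); note that the starting value gives a finite result, as required:

    safelog <- function(x) {
        if (any(x <= 0)) return(Inf)   # flag points where fn cannot be evaluated
        sum((log(x) - 1)^2)            # minimum at x = exp(1)
    }
    anss <- optimx(c(2, 2), safelog, method = "Nelder-Mead")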

While optim can be used recursively, and for a single parameter as well as many, this may not be true for optimx. optim also accepts a zero-length par, and just evaluates the function with that argument.

Method "nlm" is from the package of the same name that implements ideas of Dennis and Schnabel (1983) and Schnabel et al. (1985). See nlm() for more details.

Method "nlminb" is the package of the same name that uses the minimization tools of the PORT library. The PORT documentation is at . See nlminb() for details. (Though there is very little information about the methods.)

Method "spg" is from package BB implementing a spectral projected gradient method for large-scale optimization with simple constraints due R adaptation, with significant modifications, by Ravi Varadhan, Johns Hopkins University (March 25, 2008), from the original FORTRAN code of Birgin, Martinez, and Raydan (2001).

Method "Rcgmin" is from the package of that name. It implements a conjugate gradient algorithm with the Yuan/Dai update (ref??) and also allows bounds constraints on the parameters. (Rcgmin also allows mask constraints -- fixing individual parameters -- but there is no interface from "optimx".)

Methods "bobyqa", "uobyqa" and "newuoa" are from the package "minqa" which implement optimization by quadratic approximation routines of the similar names due to M J D Powell (2009). See package minqa for details. Note that "uobyqa" and "newuoa" are for unconstrained minimization, while "bobyqa" is for box constrained problems.

The control argument is a list that can supply a number of components. The full set of recognized elements is given in the package manual; those used elsewhere in this help page include trace, maximize, all.methods, save.failures and follow.on.
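For example, a control list combining elements that appear elsewhere in this help page might look as follows (a sketch only; consult the package manual for the complete list):

    ctrl <- list(trace = TRUE,          # print progress information
                 all.methods = TRUE,    # run every available method
                 save.failures = TRUE)  # keep results from methods that failed
    ## ansall <- optimx(c(-1.2, 1), fr, control = ctrl)  # fr as in the Examples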

Some further control elements apply only to particular methods (for example, type selects the conjugate-gradient update formula for method "CG"). The list in this file may be incomplete; see the individual packages for details.

Any names given to par will be copied to the vectors passed to fn and gr. Note that no other attributes of par are copied over. (We have not verified this as at 2009-07-29.)
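A brief sketch of this behaviour (named_fn is invented for illustration): parameters can then be addressed by name inside fn and gr.

    named_fn <- function(p) (p["a"] - 1)^2 + (p["b"] - 2)^2   # names are visible inside fn
    ansn <- optimx(c(a = 0, b = 0), named_fn, method = "Nelder-Mead")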

References

Belisle, C. J. P. (1992) Convergence theorems for a class of simulated annealing algorithms on R^d. J Applied Probability, 29, 885--895.

Birgin EG, Martinez JM, and Raydan M (2001): SPG: software for convex-constrained optimization, ACM Transactions on Mathematical Software. ??incomplete reference??

Byrd, R. H., Lu, P., Nocedal, J. and Zhu, C. (1995) A limited memory algorithm for bound constrained optimization. SIAM J. Scientific Computing, 16, 1190--1208.

Dennis, J. E. and Schnabel, R. B. (1983) Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ.

Fletcher, R. and Reeves, C. M. (1964) Function minimization by conjugate gradients. Computer Journal 7, 148--154.

Fletcher, R (1970) A new approach to variable metric algorithms, Computer Journal, 13, 317-322.

Nash, J. C. (1979, 1990) Compact Numerical Methods for Computers. Linear Algebra and Function Minimisation. Adam Hilger.

Nelder, J. A. and Mead, R. (1965) A simplex algorithm for function minimization. Computer Journal 7, 308--313.

Nocedal, J. and Wright, S. J. (1999) Numerical Optimization. Springer.

Schnabel, R. B., Koontz, J. E. and Weiss, B. E. (1985) A modular system of algorithms for unconstrained minimization. ACM Trans. Math. Software, 11, 419--440.

See Also

nlm, nlminb.

optimize for one-dimensional minimization and constrOptim for constrained optimization.

Examples

require(graphics)

fr <- function(x) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}
ans1<-optimx(c(-1.2,1), fr)
print(ans1)
print(attr(ans1,"details"))
cat("")
ans2<-optimx(c(-1.2,1), fr, grr, method = "BFGS")
print(ans2)
## The next line will fail if executed because 'hessian = TRUE' no longer allowed
# ans3<-optimx(c(-1.2,1), fr, NULL, method = "BFGS", hessian = TRUE)
cat("")
ans4<-optimx(c(-1.2,1), fr, grr, method = "CG",control=list(trace=TRUE))
print(ans4)
cat("")
ans5<-optimx(c(-1.2,1), fr, grr, method = "CG", control=list(type=2))
print(ans5)
cat("")
ans6<-optimx(c(-1.2,1), fr, grr, method = "L-BFGS-B")
print(ans6)
cat("")

flb <- function(x)
    { p <- length(x); sum(c(1, rep(4, p-1)) * (x - c(1, x[-p])^2)^2) }
## 25-dimensional box constrained
optimx(rep(3, 25), flb, NULL, method = "L-BFGS-B",
      lower=rep(2, 25), upper=rep(4, 25)) # par[24] is *not* at boundary

## "wild" function , global minimum at about -15.81515
fw <- function (x)
    10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x+80
plot(fw, -50, 50, n=1000, main = "optim() minimising 'wild function'")

## Suppressed for optimx() ans7 <- optimx(50, fw, method="SANN",
##             control=list(maxit=20000, temp=20, parscale=20))
## ans7
## Now improve locally {typically only by a small bit}:
## newpar<-unlist(ans7$par) # NOTE: you need to unlist the parameters as optimx() has multiple outputs
##(r2 <- optimx(newpar, fw, method="BFGS"))
##points(r2$par, r2$value, pch = 8, col = "red", cex = 2)

## Show multiple outputs of optimx using all.methods
# genrose function code
genrose.f<- function(x, gs=NULL){ # objective function
## One generalization of the Rosenbrock banana valley function (n parameters)
	n <- length(x)
        if(is.null(gs)) { gs=100.0 }
	fval<-1.0 + sum (gs*(x[1:(n-1)]^2 - x[2:n])^2 + (x[2:n] - 1)^2)
        return(fval)
}

genrose.g <- function(x, gs=NULL){
# vectorized gradient for genrose.f
# Ravi Varadhan 2009-04-03
	n <- length(x)
        if(is.null(gs)) { gs=100.0 }
	gg <- as.vector(rep(0, n))
	tn <- 2:n
	tn1 <- tn - 1
	z1 <- x[tn] - x[tn1]^2
	z2 <- 1 - x[tn]
	gg[tn] <- 2 * (gs * z1 - z2)
	gg[tn1] <- gg[tn1] - 4 * gs * x[tn1] * z1
	return(gg)
}

genrose.h <- function(x, gs=NULL) { ## compute Hessian
   if(is.null(gs)) { gs=100.0 }
	n <- length(x)
	hh<-matrix(rep(0, n*n),n,n)
	for (i in 2:n) {
		z1<-x[i]-x[i-1]*x[i-1]
		z2<-1.0-x[i]
                hh[i,i]<-hh[i,i]+2.0*(gs+1.0)
                hh[i-1,i-1]<-hh[i-1,i-1]-4.0*gs*z1-4.0*gs*x[i-1]*(-2.0*x[i-1])
                hh[i,i-1]<-hh[i,i-1]-4.0*gs*x[i-1]
                hh[i-1,i]<-hh[i-1,i]-4.0*gs*x[i-1]
	}
        return(hh)
}

startx<-4*seq(1:10)/3.
ans8<-optimx(startx,fn=genrose.f,gr=genrose.g, hess=genrose.h, control=list(all.methods=TRUE, save.failures=TRUE), gs=10)
print(ans8)

get.result(ans8, attribute="grs")
get.result(ans8, method="spg")


startx<-4*seq(1:10)/3.
cat("Polyalgorithm with 200 steps NM followed by up to 75 of ucminf
")
ans9<-optimx(startx,fn=genrose.f,gr=genrose.g, hess=genrose.h, method=c("Nelder-Mead","ucminf"),
             itnmax=c(200,75), control=list(follow.on=TRUE, save.failures=TRUE,trace=TRUE), gs=10)
print(ans9)
