
maxLik (version 1.3-2)

maxNR: Newton- and Quasi-Newton Maximization

Description

Unconstrained and equality-constrained maximization based on the quadratic approximation (Newton) method. The Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970), and BHHH (Berndt, Hall, Hall, Hausman 1974) methods are available.

Usage

maxNR(fn, grad = NULL, hess = NULL, start,
      constraints = NULL, finalHessian = TRUE, bhhhHessian=FALSE,
      fixed = NULL, activePar = NULL, control=NULL, ... )
maxBFGSR(fn, grad = NULL, hess = NULL, start,
      constraints = NULL, finalHessian = TRUE,
      fixed = NULL, activePar = NULL, control=NULL, ... )
maxBHHH(fn, grad = NULL, hess = NULL, start, 
      finalHessian = "BHHH", ... )

Arguments

fn
the function to be maximized. It must have the parameter vector as the first argument and must return either a single number or a numeric vector, which is summed internally. If the BHHH method is used and argument grad is not given, fn must return a numeric vector of observation-wise values.
grad
gradient of the objective function. It must have the parameter vector as the first argument and must return either the gradient vector of the objective function, or a matrix where columns correspond to individual parameters and rows to individual observations; the column sums of such a matrix give the gradient. An observation-wise gradient matrix is required for the BHHH method. If missing, a finite-difference gradient based on fn is used.
hess
Hessian matrix of the function. It must have the parameter vector as the first argument and it must return the Hessian matrix of the objective function. If missing, a finite-difference Hessian, based on the gradient, is computed.
start
initial parameter values. If start values are named, those names are also carried over to the results.
constraints
either NULL for unconstrained optimization or a list with two components. The components may be either eqA and eqB for equality-constrained optimization $A \theta + B = 0$, or ineqA and ineqB for inequality constraints $A \theta + B > 0$. Note that the optimizers documented here handle only equality constraints; see maxBFGS for inequality-constrained maximization.
finalHessian
how (and if) to calculate the final Hessian. Either FALSE (do not calculate), TRUE (use analytic/finite-difference Hessian) or "bhhh"/"BHHH" for the information-equality approach. The latter is only suitable for maximizing log-likelihood functions and requires the gradient or log-likelihood values by individual observations.
bhhhHessian
logical, indicating whether to use the information-equality approximation (Berndt, Hall, Hall, and Hausman, 1974) for the Hessian. This effectively transforms maxNR into maxBHHH and is mainly designed for internal use.
fixed
parameters to be treated as constants at their start values. If present, it is treated as an index vector of start parameters.
activePar
this argument is retained for backward compatibility only; please use argument fixed instead.
control
list of control parameters. The control parameters used by these optimizers include the convergence tolerances (tol, reltol, and gradtol), the minimal step length steptol, the iteration limit iterlim, and the amount of printed information printLevel (see the return codes below and the examples).
...
further arguments to fn, grad and hess. Further arguments to maxBHHH are also passed to maxNR. To maintain compatibility with earlier versions, ... also passes a number of control options directly to the optimizers.
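For illustration, here is a minimal sketch of how the control list and the ... mechanism combine. printLevel is taken from the examples below; iterlim and tol are assumed names for the iteration-limit and convergence-tolerance options, and loglikX and x are hypothetical names used only for this sketch.

## Minimal sketch: control options plus a data argument forwarded via '...'
## ('iterlim' and 'tol' are assumed option names; 'printLevel' is used in the
## examples below)
x <- rexp(100, 2)
loglikX <- function(theta, x) sum(log(theta) - theta*x)
res <- maxNR(loglikX, start=1,
             control=list(printLevel=2, iterlim=200, tol=1e-8),
             x=x)   # 'x' is forwarded to loglikX through '...'
summary(res)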

Value

list of class "maxim" with the following components:

  • maximum: fn value at maximum (the last calculated value if not converged).
  • estimate: estimated parameter value.
  • gradient: vector, last calculated gradient value. Should be close to 0 if normal convergence.
  • gradientObs: matrix of gradients at parameter value estimate, evaluated at each observation (only if grad returns a matrix, or grad is not specified and fn returns a vector).
  • hessian: Hessian at the maximum (the last calculated value if not converged).
  • code: return code:
    • 1: gradient close to zero (normal convergence).
    • 2: successive function values within tolerance limit (normal convergence).
    • 3: last step could not find a higher value (probably not converged). This is related to the line-search step getting too small, usually because of hitting the boundary of the parameter space. It may also be related to attempts to move in a wrong direction because of numerical errors. In some cases it can be helped by changing steptol.
    • 4: iteration limit exceeded.
    • 5: infinite value.
    • 6: infinite gradient.
    • 7: infinite Hessian.
    • 8: successive function values within relative tolerance limit (normal convergence).
    • 9: (BFGS) Hessian approximation cannot be improved because the gradient did not change. May be related to numerical approximation problems or a wrong analytic gradient.
    • 100: initial value out of range.
  • message: a short message describing the return code.
  • last.step: information about the last unsuccessful step if code equals 3, including the previous parameter value, the function value at it (f0), and the attempted step (climb).
  • fixed: logical vector indicating which parameters are treated as constants.
  • iterations: number of iterations.
  • type: character string describing the type of maximization.
  • constraints: information about the constrained optimization (NULL if unconstrained), including the constraint type, the number of outer iterations (outer.iterations), and the barrier value (barrier.value).
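The components listed above can be inspected directly from the returned list. A small sketch follows; the quadratic objective is purely illustrative.

## Sketch: inspecting a "maxim" object returned by maxNR
res <- maxNR(function(theta) -sum((theta - c(1, 2))^2), start=c(0, 0))
res$estimate    # estimated parameter value, close to c(1, 2)
res$maximum     # fn value at the maximum
res$code        # numeric return code (see the list above)
res$message     # short message describing the return code
res$iterations  # number of iterations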

Warning

No attempt is made to ensure that the user-provided analytic gradient or Hessian is correct. Users are encouraged to check them with the compareDerivatives function, which is designed for this purpose. If the analytic gradient or Hessian is wrong, the algorithm may not converge, or may converge to a wrong point.
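A minimal sketch of such a check, mirroring the exponential example from the Examples section and assuming the evaluation point is passed to compareDerivatives through the argument t0:

## Sketch: compare the analytic gradient with a numeric one
## (the evaluation point argument is assumed to be 't0')
t <- rexp(100, 2)
loglik  <- function(theta) sum(log(theta) - theta*t)
gradlik <- function(theta) sum(1/theta - t)
compareDerivatives(loglik, gradlik, t0=1)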

As the BHHH method uses the likelihood-specific information equality, it is only suitable for maximizing log-likelihood functions!

Quasi-Newton methods, including those mentioned above, do not work well in non-concave regions. This is especially the case with the implementation in maxBFGSR. The user is advised to experiment with various tolerance options to achieve convergence.

Details

The idea of the Newton method is to approximate the function at a given location by a multidimensional quadratic function, and to use the maximum of that approximation as the start value for the next iteration. Such an approximation requires knowledge of both the gradient and the Hessian, the latter of which can be quite costly to compute. Several methods for approximating the Hessian exist, including BFGS and BHHH.
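As a bare-bones illustration of this idea (without the line search, step control, and stopping rules that maxNR adds on top), a single-parameter Newton iteration for the exponential log-likelihood used in the Examples section might look as follows:

## Bare-bones Newton iteration for loglik(theta) = sum(log(theta) - theta*t);
## purely illustrative, without the safeguards implemented in maxNR
t <- rexp(100, 2)
score   <- function(theta) sum(1/theta - t)       # gradient
hessian <- function(theta) -length(t)/theta^2     # second derivative
theta <- 1                                        # start value
for (i in 1:25) {
  step  <- -score(theta)/hessian(theta)           # Newton step: -H^{-1} g
  theta <- theta + step
  if (abs(step) < 1e-8) break                     # crude convergence check
}
theta           # close to the ML estimate 1/mean(t)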

The BHHH (information equality) approximation is only valid for log-likelihood functions. It requires the score (gradient) values by individual observations, hence either grad or fn must return observation-wise values. The Hessian is approximated as the negative of the sum of the outer products of the observation-wise gradients, scaled by $1/N$, or, in matrix form, $$\mathsf{H}^{BHHH} = -\frac{1}{N} \sum_{i=1}^N \left[ \frac{\partial \ell_i(\boldsymbol{\vartheta})}{\partial \boldsymbol{\vartheta}} \frac{\partial \ell_i(\boldsymbol{\vartheta})}{\partial \boldsymbol{\vartheta}'} \right]$$ where $\ell_i$ is the log-likelihood contribution of observation $i$.
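In code, this amounts to the negative, $1/N$-scaled cross-product of the observation-wise gradient matrix. A small sketch, following the formula above, for the exponential example:

## Sketch: BHHH Hessian approximation from observation-wise gradients
t <- rexp(100, 2)
theta <- 1/mean(t)                           # ML estimate of the rate
gradObs <- cbind(1/theta - t)                # N x 1 matrix of per-observation scores
H.bhhh <- -crossprod(gradObs)/nrow(gradObs)  # -(1/N) * sum_i g_i g_i'
H.bhhh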

The functions maxNR, maxBFGSR, and maxBHHH can work with constant parameters, which is useful if a parameter value converges to the boundary of the parameter space, or for testing purposes. One way is to set fixed to a non-NULL value, specifying which parameters should be treated as constants. The parameters can also be fixed at runtime (only for maxNR and maxBHHH) by signaling this through the fn return value. See Henningsen & Toomet (2011) for details.

References

Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Economic and Social Measurement 3, 653--665.

Broyden, C.G. (1970): The Convergence of a Class of Double-rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications 6, 76--90.

Fletcher, R. (1970): A New Approach to Variable Metric Algorithms, Computer Journal 13, 317--322.

Goldfeld, S.M. and Quandt, R.E. (1972): Nonlinear Methods in Econometrics. Amsterdam: North-Holland.

Goldfarb, D. (1970): A Family of Variable Metric Updates Derived by Variational Means, Mathematics of Computation 24, 23--26.

Greene, W.H. (2008): Econometric Analysis, 6th edition, Prentice Hall.

Henningsen, A. and Toomet, O. (2011): maxLik: A Package for Maximum Likelihood Estimation in R, Computational Statistics 26, 443--458.

Marquardt, D.W. (1963): An Algorithm for Least-Squares Estimation of Nonlinear Parameters, Journal of the Society for Industrial & Applied Mathematics 11(2), 431--441.

Shanno, D.F. (1970): Conditioning of Quasi-Newton Methods for Function Minimization, Mathematics of Computation 24, 647--656.

See Also

maxLik for a general framework for maximum likelihood estimation (MLE); maxBHHH for maximization using the Berndt, Hall, Hall, Hausman (1974) algorithm (which is a wrapper function to maxNR); maxBFGS for maximization using the BFGS, Nelder-Mead (NM), and Simulated Annealing (SANN) methods (based on optim), also supporting inequality constraints; nlm for Newton-Raphson optimization; and optim for different gradient-based optimization methods.

Examples

## estimate the exponential distribution parameter by ML
t <- rexp(100, 2)
loglik <- function(theta) sum(log(theta) - theta*t)
## Note the log-likelihood and gradient are summed over observations
gradlik <- function(theta) sum(1/theta - t)
hesslik <- function(theta) -100/theta^2
## Estimate with finite-difference gradient and Hessian
a <- maxNR(loglik, start=1, control=list(printLevel=2))
summary(a)
## You would probably prefer 1/mean(t) instead ;-)
## Estimate with analytic gradient and Hessian
a <- maxNR(loglik, gradlik, hesslik, start=1)
summary(a)

## BFGS estimation with finite-difference gradient
a <- maxBFGSR( loglik, start=1 )
summary(a)

## For the BHHH method we need likelihood values and gradients
## of individual observations
loglikInd <- function(theta) log(theta) - theta*t
gradlikInd <- function(theta) 1/theta - t
## Estimate with analytic gradient
a <- maxBHHH(loglikInd, gradlikInd, start=1)
summary(a)

##
## Example with a vector argument:  Estimate the mean and
## variance of a random normal sample by maximum likelihood
## Note: you might want to use maxLik instead
##
loglik <- function(param) {
  mu <- param[1]
  sigma <- param[2]
  ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2)
  ll
}
x <- rnorm(100, 1, 2) # use mean=1, sd=2
N <- length(x)
res <- maxNR(loglik, start=c(0,1)) # use 'wrong' start values
summary(res)
##
## The previous example with named parameters and fixed values
##
resFix <- maxNR(loglik, start=c(mu=0, sigma=1), fixed="sigma")
summary(resFix)  # 'sigma' is exactly 1.000 now.
###
### Constrained optimization
###
## We maximize exp(-x^2 - y^2) where x+y = 1
hatf <- function(theta) {
  x <- theta[1]
  y <- theta[2]
  exp(-(x^2 + y^2))
  ## Note: you may prefer exp(- theta %*% theta) instead
}
## use constraints: x + y = 1
A <- matrix(c(1, 1), 1, 2)
B <- -1
res <- maxNR(hatf, start=c(0,0), constraints=list(eqA=A, eqB=B),
             control=list(printLevel=1))
print(summary(res))
