
maxLik (version 0.8-0)

maxNR: Newton- and Quasi-Newton Maximization

Description

Unconstrained maximization based on the quadratic approximation (Newton) method. The Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970), and BHHH (Berndt, Hall, Hall, Hausman 1974) methods are available.

Usage

maxNR(fn, grad = NULL, hess = NULL, start, print.level = 0,
      tol = 1e-08, reltol=sqrt(.Machine$double.eps), gradtol = 1e-06,
      steptol = 1e-10, lambdatol = 1e-06, qrtol = 1e-10, iterlim = 150,
      constraints = NULL, finalHessian = TRUE, bhhhHessian=FALSE,
      fixed = NULL, activePar = NULL, ... )
maxBFGSR(fn, grad = NULL, hess = NULL, start, print.level = 0,
      tol = 1e-8, reltol=sqrt(.Machine$double.eps), gradtol = 1e-6,
      steptol = 1e-10, iterlim = 150,
      constraints = NULL, finalHessian = TRUE,
      fixed = NULL, activePar = NULL, ... )
maxBHHH(fn, grad = NULL, hess = NULL, start, print.level = 0,
      iterlim = 100, finalHessian = "BHHH", ... )

Arguments

fn
function to be maximized. It must have the parameter vector as the first argument and it must return either a single number or a numeric vector, which is summed. If the BHHH method is used and argument grad is not given, fn must return a numeric vector of observation-specific log-likelihood values.
grad
gradient of the objective function. It must have the parameter vector as the first argument and it must return either a gradient vector of the objective function, or a matrix whose columns correspond to individual parameters; the column sums of such a matrix are the gradient components. If the BHHH method is used, grad must return a matrix whose rows correspond to the gradients of individual observations.
hess
Hessian matrix of the function. It must have the parameter vector as the first argument and it must return the Hessian matrix of the objective function. If missing, finite-difference Hessians, based on gradient, are computed.
start
initial value for the parameter vector.
print.level
this argument determines the level of printing which is done during the optimization process. The default value of 0 means that no printing occurs, a value of 1 means that initial and final details are printed, and a value of 2 means that full tracing information is printed at every iteration.
tol
stopping condition. Stop if the absolute difference between successive iterations is less than tol, return code=2.
reltol
Relative convergence tolerance. The algorithm stops if it is unable to increase the value by a factor of 'reltol * (abs(val) + reltol)' at a step. Defaults to 'sqrt(.Machine$double.eps)', typically about '1e-8'.
gradtol
stopping condition. Stop if the norm of the gradient is less than gradtol, return code=1.
steptol
stopping/error condition. If the quadratic approximation leads to a lower function value instead of a higher one, or to NA, the step length is halved and a new attempt is made. This procedure is repeated until step < steptol, in which case return code=3 is given.
lambdatol
controls whether the Hessian is treated as negative definite. If the largest of the eigenvalues of the Hessian is larger than -lambdatol (i.e. the Hessian is not sufficiently negative definite), a suitable diagonal matrix is subtracted from the Hessian (quadratic hill-climbing, Goldfeld and Quandt 1972) in order to enforce negative definiteness.
qrtol
QR-decomposition tolerance
iterlim
stopping condition. Stop if more than iterlim iterations are performed, return code=4.
constraints
either NULL for unconstrained optimization or a list with two components, eqA and eqB, for equality-constrained optimization of the form $A \theta + B = 0$. The constrained problem is forwarded to sumt.
finalHessian
how (and if) to calculate the final Hessian. Either FALSE (do not calculate), TRUE (use the analytic/finite-difference Hessian), or "bhhh"/"BHHH" for the information-equality approach. The latter approach is only suitable when maximizing a log-likelihood function and requires the gradient/log-likelihood to be supplied by individual observations.
bhhhHessian
logical, indicating whether the approximation of the Hessian suggested by Berndt, Hall, Hall, and Hausman (1974) should be used.
fixed
parameters that should be fixed at their starting values: either a logical vector of the same length as argument start, a numeric (index) vector indicating the positions of the fixed parameters, or a vector of character strings indicating the names of the fixed parameters (parameter names are taken from argument start).
activePar
this argument is retained for backward compatibility only; please use argument fixed instead.
...
further arguments to fn, grad and hess. Further arguments to maxBHHH are also passed to maxNR.

Value

list of class "maxim" with the following components:

  • maximum: fn value at the maximum (the last calculated value if not converged).
  • estimate: estimated parameter value.
  • gradient: vector, the last gradient value which was calculated. Should be close to 0 if normal convergence.
  • gradientObs: matrix of gradients at parameter value estimate, evaluated at each observation (only if grad returns a matrix, or grad is not specified and fn returns a vector).
  • hessian: Hessian at the maximum (the last calculated value if not converged).
  • code: return code:
    • 1: gradient close to zero (normal convergence).
    • 2: successive function values within tolerance limit (normal convergence).
    • 3: last step could not find a higher value (probably not converged).
    • 4: iteration limit exceeded.
    • 5: infinite function value.
    • 6: infinite gradient.
    • 7: infinite Hessian.
    • 100: initial value out of range.

  • message: a short message describing the return code.
  • last.step: a list describing the last unsuccessful step if code=3, with components theta0 (the previous parameter value), f0 (the fn value at theta0) and climb (the attempted step).
  • fixed: logical vector indicating which parameters are treated as constants.
  • iterations: number of iterations.
  • type: character string, the type of maximization.
  • constraints: a list describing the constrained optimization (NULL if unconstrained), including the components type (type of constrained optimization), outer.iterations (number of iterations in the constraints step) and barrier.value (value of the barrier function).

Warning

No attempt is made to ensure that the user-provided analytic gradient or Hessian is correct. However, users are recommended to use the compareDerivatives function, designed for this purpose. If the analytic gradient or Hessian is wrong, the algorithm may not converge, or may converge to a wrong point.
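For illustration, a minimal sketch of such a check, using the exponential duration model from the Examples section below and assuming the compareDerivatives interface compareDerivatives(f, grad, t0=...):

## check an analytic gradient numerically before handing it to maxNR
library(maxLik)
set.seed(1)
t <- rexp(100, 2)
loglik  <- function(theta) sum(log(theta) - theta*t)
gradlik <- function(theta) sum(1/theta - t)
compareDerivatives(loglik, gradlik, t0=1)   # prints analytic vs. numeric gradient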

As the BHHH method (maxNR with argument bhhhHessian = TRUE, or maxBHHH) uses the likelihood-specific information equality, it is only suitable for maximizing log-likelihood functions!

Details

The idea of the Newton method is to approximate the function at a given location by a multidimensional quadratic function, and to use the maximum of that approximation as the starting value for the next iteration. Such an approximation requires knowledge of both the gradient and the Hessian, the latter of which can be quite costly to compute. Several methods for approximating the Hessian exist, including BFGS and BHHH.
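The core update can be sketched in a few lines of plain R; the helper below is purely illustrative and is not the actual maxNR implementation:

newtonStep <- function(fn, grad, hess, start, iterlim=100, gradtol=1e-6) {
  ## illustrative Newton-Raphson loop: step to the maximum of the local
  ## quadratic approximation until the gradient is (nearly) zero
  theta <- start
  for(i in seq_len(iterlim)) {
    g <- grad(theta)
    if(sqrt(sum(g^2)) < gradtol) break      # gradient close to zero: stop
    H <- hess(theta)
    theta <- theta - drop(solve(H, g))      # Newton step: theta - H^{-1} g
  }
  list(estimate=theta, maximum=fn(theta), iterations=i)
}
## e.g. with the exponential model from the Examples section:
## newtonStep(loglik, gradlik, hesslik, start=1)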

The BHHH method (maxNR with argument bhhhHessian = TRUE, or maxBHHH) is suitable only for maximizing log-likelihood functions. It uses the information equality in order to approximate the Hessian of the log-likelihood function. Hence, the log-likelihood values and its gradients must be calculated by individual observations. The Hessian is approximated as the negative of the sum of the outer products of the gradients of individual observations, or, in matrix form, -t(gradient) %*% gradient = -crossprod(gradient).
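As a sketch of this approximation, using the observation-level exponential model from the Examples section below:

set.seed(2)
t <- rexp(100, 2)
gradlikInd <- function(theta) 1/theta - t   # gradient by observation (length 100)
g <- as.matrix(gradlikInd(2))               # n x 1 gradient matrix at the true theta = 2
H.bhhh <- -crossprod(g)                     # BHHH approximation of the Hessian
H.analytic <- -100/2^2                      # analytic Hessian -n/theta^2
## H.bhhh and H.analytic should be of similar magnitude: the information
## equality holds in expectation at the true parameter value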

The functions maxNR, maxBFGSR, and maxBHHH can work with constant parameters and related changes of parameter values. Constant parameters are useful if a parameter value converges toward the boundary of the support, or for testing. One way is to set fixed to a non-NULL value, specifying which parameters should be treated as constants (see the sketch below).
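For instance, a sketch based on the two-parameter normal example from the Examples section, holding sigma at its start value (parameter names are taken from start, so fixed can be given as a name):

library(maxLik)
x <- rnorm(1000, 1, 2)
N <- length(x)
loglik <- function(param) {
  mu <- param[1]; sigma <- param[2]
  -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2)
}
## estimate mu only, keeping sigma constant at its start value 2
res <- maxNR(loglik, start=c(mu=0, sigma=2), fixed="sigma")
summary(res)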

However, when using maxNR or maxBHHH, parameters can also be fixed at runtime by signaling through the value of fn. This may be useful if an estimation converges toward the edge of the parameter space, possibly causing problems. The value of fn may have the following attributes (only used by maxNR); a sketch follows the list:

  • constPar: index vector indicating which parameters are redefined as constants.
  • newVal: a list with the following components:
    • index: which parameters will have a new value.
    • val: the new value of those parameters.
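A hypothetical sketch of this mechanism, based on the attribute description above (whether and when the attributes are set is entirely up to the user-supplied fn):

library(maxLik)
x <- rnorm(100, 1, 2)
loglikFlag <- function(param) {
  mu <- param[1]; sigma <- param[2]
  if(sigma <= 0) {
    ## sigma left its support: ask maxNR to reset it to an admissible
    ## value and treat it as a constant from now on
    val <- NA
    attr(val, "constPar") <- 2                   # parameter 2 becomes constant
    attr(val, "newVal") <- list(index=2, val=1)  # and is reset to 1
    return(val)
  }
  sum(dnorm(x, mean=mu, sd=sigma, log=TRUE))
}
res <- maxNR(loglikFlag, start=c(0, 0.5))
summary(res)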

References

Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Economic and Social Measurement 3, p. 653-665.

Broyden, C.G. (1970): The Convergence of a Class of Double-rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications 6, p. 76-90.

Fletcher, R. (1970): A New Approach to Variable Metric Algorithms, Computer Journal 13, p. 317-322.

Goldfeld, S.M. and Quandt, R.E. (1972): Nonlinear Methods in Econometrics. Amsterdam: North-Holland.

Goldfarb, D. (1970): A Family of Variable Metric Updates Derived by Variational Means, Mathematics of Computation 24, p. 23-26.

Greene, W.H., 2008, Econometric Analysis, 6th edition, Prentice Hall.

Shanno, D.F. (1970): Conditioning of Quasi-Newton Methods for Function Minimization, Mathematics of Computation 24, p. 647-656.

See Also

maxLik for a general framework for maximum likelihood estimation (MLE), maxBHHH for maximizations using the Berndt, Hall, Hall, Hausman (1974) algorithm (which is a wrapper function to maxNR), maxBFGS for maximization using the BFGS, Nelder-Mead (NM), and Simulated Annealing (SANN) method (based optim), nlm for Newton-Raphson optimization, and optim for different gradient-based optimization methods.

Examples

## ML estimation of exponential duration model:
t <- rexp(100, 2)
loglik <- function(theta) sum(log(theta) - theta*t)
## Note the log-likelihood and gradient are summed over observations
gradlik <- function(theta) sum(1/theta - t)
hesslik <- function(theta) -100/theta^2
## Estimate with finite-difference gradient and Hessian
a <- maxNR(loglik, start=1, print.level=2)
summary(a)
## You would probably prefer 1/mean(t) instead ;-)
## Estimate with analytic gradient and Hessian
a <- maxNR(loglik, gradlik, hesslik, start=1)
summary(a)

## BFGS estimation with finite-difference gradient
a <- maxBFGSR( loglik, start=1 )
summary(a)
## BFGS estimation with analytic gradient
a <- maxBFGSR( loglik, gradlik, start=1 )
summary(a)

## For the BHHH method we need likelihood values and gradients
## of individual observations
loglikInd <- function(theta) log(theta) - theta*t
gradlikInd <- function(theta) 1/theta - t
## Estimate with finite-difference gradient
a <- maxBHHH(loglikInd, start=1, print.level=2)
summary(a)
## Estimate with analytic gradient
a <- maxBHHH(loglikInd, gradlikInd, start=1)
summary(a)

##
## Next, we give an example with vector argument:  Estimate the mean and
## variance of a random normal sample by maximum likelihood
## Note: you might want to use maxLik instead
##
loglik <- function(param) {
  mu <- param[1]
  sigma <- param[2]
  ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2)
  ll
}
x <- rnorm(1000, 1, 2) # use mean=1, sd=2
N <- length(x)
res <- maxNR(loglik, start=c(0,1)) # use 'wrong' start values
summary(res)
###
### Now an example of constrained optimization
###
## We maximize exp(-x^2 - y^2) where x+y = 1
f <- function(theta) {
  x <- theta[1]
  y <- theta[2]
  exp(-(x^2 + y^2))
  ## Note: you may want to use exp(- theta %*% theta) instead ;-)
}
## use constraints: x + y = 1
A <- matrix(c(1, 1), 1, 2)
B <- -1
res <- maxNR(f, start=c(0,0), constraints=list(eqA=A, eqB=B), print.level=1)
print(summary(res))
