Unconstrained and equality-constrained maximization based on the quadratic approximation (Newton) method. The Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970), and BHHH (Berndt, Hall, Hall, Hausman 1974) methods are available.
maxNR(fn, grad = NULL, hess = NULL, start,
constraints = NULL, finalHessian = TRUE, bhhhHessian=FALSE,
fixed = NULL, activePar = NULL, control=NULL, ... )
maxBFGSR(fn, grad = NULL, hess = NULL, start,
constraints = NULL, finalHessian = TRUE,
fixed = NULL, activePar = NULL, control=NULL, ... )
maxBHHH(fn, grad = NULL, hess = NULL, start,
finalHessian = "BHHH", ... )
the function to be maximized.
It must have the parameter vector as the first argument and
it must return either a single number, or a numeric vector (this is
is summed internally).
If the BHHH method is used and argument gradient
is not given,
fn
must return a numeric vector of observation-specific
log-likelihood values.
If the parameters are out of range, fn
should
return NA
. See details for constant parameters.
fn
may also return attributes "gradient" and/or "hessian".
If these attributes are set, the algorithm uses the corresponding
values as
gradient and Hessian.
gradient of the objective function.
It must have the parameter vector as the first argument and
it must return either a gradient vector of the objective function,
or a matrix, where columns correspond to individual parameters.
The column sums are treated as gradient components.
If NULL
, finite-difference gradients are computed.
If BHHH method is used, grad
must return a matrix,
where rows corresponds to the gradient vectors for individual
observations and the columns to the individual parameters.
If fn
returns an object with attribute gradient
,
this argument is ignored.
Hessian matrix of the function.
It must have the parameter vector as the first argument and
it must return the Hessian matrix of the objective function.
If missing, finite-difference Hessian, based on gradient
,
is computed.
Hessian is used by the Newton-Raphson method only, and eventually by
the other methods if finalHessian
is requested.
initial parameter values. If start values are named, those names are also carried over to the results.
either NULL
for unconstrained optimization
or a list with two components. The components may be either
eqA
and eqB
for equality-constrained optimization
\(A \theta + B = 0\); or ineqA
and
ineqB
for inequality constraints \(A \theta + B > 0\). More
than one
row in ineqA
and ineqB
corresponds to more than
one linear constraint, in that case all these must be zero
(equality) or positive (inequality constraints).
The equality-constrained problem is forwarded
to sumt
, the inequality-constrained case to
constrOptim2
.
how (and if) to calculate the final Hessian. Either
FALSE
(do not calculate), TRUE
(use analytic/finite-difference
Hessian) or "bhhh"
/"BHHH"
for the information equality
approach. The latter approach is only suitable for maximizing
log-likelihood functions. It requires the gradient/log-likelihood to
be supplied by individual observations.
Note that computing the (actual, not BHHH) final Hessian
does not carry any extra penalty for the NR method,
but does for the other methods.
logical. Indicating whether to use the information
equality approximation (Bernd, Hall, Hall, and Hausman, 1974) for
the Hessian. This effectively transforms maxNR
into
maxBHHH
and is mainly designed for internal use.
parameters to be treated as constants at their
start
values. If present, it is treated as an index vector of
start
parameters.
this argument is retained for backward compatibility only;
please use argument fixed
instead.
list of control parameters. The control parameters used by these optimizers are
\(10^{-8}\),
stopping condition. Stop if the absolute difference
between successive iterations is less than tol
. Return
code=2
.
sqrt(.Machine$double.eps), stopping condition. Relative convergence tolerance: the algorithm stops if the relative improvement between iterations is less than ‘reltol’. Return code 2.
stopping condition. Stop if norm of the gradient is
less than gradtol
. Return code 1.
1e-10, stopping/error condition.
If qac == "stephalving"
and the quadratic
approximation leads to a worse, instead of a better value, or to
NA
, the step length
is halved and a new attempt is made. If necessary, this procedure is repeated
until step < steptol
, thereafter code 3 is returned.
\(10^{-6}\),
controls whether Hessian is treated as negative
definite. If the
largest of the eigenvalues of the Hessian is larger than
-lambdatol
(Hessian is not negative definite),
a suitable diagonal matrix is subtracted from the
Hessian (quadratic hill-climbing) in order to enforce negative
definiteness.
\(10^{-10}\), QR-decomposition tolerance for the Hessian inversion.
"stephalving", Quadratic Approximation Correction. When the new
guess is worse than the initial one, the algorithm attemts to correct it:
"stephalving" decreases the
step but keeps the direction,
"marquardt" uses
Marquardt (1963) method by decreasing the step length while also
moving closer to the pure gradient direction. It may be faster and
more robust choice in areas where quadratic approximation
behaves poorly. maxNR
and maxBHHH
only.
\(10^{-2}\), positive numeric, initial correction term for Marquardt (1963) correction.
2, how much the Marquardt
(1963)
correction term is
decreased/increased at each
successful/unsuccesful step.
maxNR
and maxBHHH
only.
\(10^{12}\),
maximum allowed Marquardt (1963) correction term. If exceeded, the
algorithm exits with return code 3.
maxNR
and maxBHHH
only.
stopping condition. Stop if more than iterlim
iterations, return code=4
.
this argument determines the level of printing which is done during the optimization process. The default value 0 means that no printing occurs, 1 prints the initial and final details, 2 prints all the main tracing information for every iteration. Higher values will result in even more output.
further arguments to fn
, grad
and
hess
.
Further arguments to maxBHHH
are also passed to
maxNR
.
To maintain compatibility with the earlier versions, … also passes a
number of control options (tol
, reltol
,
gradtol
, steptol
,
lambdatol
, qrtol
, iterlim
) to the optimizers.
object of class "maxim". Data can be extracted through the following functions:
maxValue
fn
value at maximum (the last calculated value
if not converged.)
estimated parameter value.
vector, last calculated gradient value. Should be close to 0 in case of normal convergence.
matrix of gradients at parameter value estimate
evaluated at each observation (only if grad
returns a matrix
or grad
is not specified and fn
returns a vector).
Hessian at the maximum (the last calculated value if not converged).
return code:
1 gradient close to zero (normal convergence).
2 successive function values within tolerance limit (normal convergence).
3 last step could not find higher value (probably not
converged). This is related to line search step getting too
small, usually because hitting the boundary of the parameter
space. It may also be related to attempts to move to a wrong
direction because of numerical errors. In some cases it can be
helped by changing steptol
.
4 iteration limit exceeded.
5 Infinite value.
6 Infinite gradient.
7 Infinite Hessian.
8 Successive function values withing relative tolerance limit (normal convergence).
9 (BFGS) Hessian approximation cannot be improved because of gradient did not change. May be related to numerical approximation problems or wrong analytic gradient.
100 Initial value out of range.
a short message, describing the return code.
logical vector, which parameters are optimized over.
Contains only TRUE
-s if no parameters are fixed.
number of iterations.
character string, type of maximization.
the optimization control parameters in the form of a
MaxControl
object.
The following components can only be extracted directly (with \$):
a list describing the last unsuccessful step if
code=3
with following components:
theta0 previous parameter value
f0 fn
value at theta0
climb the movement vector to the maximum of the quadratic approximation
A list, describing the constrained optimization
(NULL
if unconstrained). Includes the following components:
type type of constrained optimization
outer.iterations number of iterations in the constraints step
barrier.value value of the barrier function
No attempt is made to ensure that user-provided analytic
gradient/Hessian is correct. The users are
encouraged to use compareDerivatives
function,
designed for this purpose. If analytic gradient/Hessian are wrong,
the algorithm may not converge, or may converge to a wrong point.
As the BHHH method uses the likelihood-specific information equality, it is only suitable for maximizing log-likelihood functions!
Quasi-Newton methods, including those mentioned above, do not work
well in non-concave regions. This is especially the case with the
implementation in maxBFGSR
. The user is advised to
experiment with various tolerance options to achieve convergence.
The idea of the Newton method is to approximate the function at a given location by a multidimensional quadratic function, and use the estimated maximum as the start value for the next iteration. Such an approximation requires knowledge of both gradient and Hessian, the latter of which can be quite costly to compute. Several methods for approximating Hessian exist, including BFGS and BHHH.
The BHHH (information equality) approximation is only valid for
log-likelihood functions.
It requires the score (gradient) values by individual observations and hence
those must be returned
by individual observations by grad
or fn
.
The Hessian is approximated as the negative of the sum of the outer products
of the gradients of individual observations, or, in the matrix form,
$$
\mathsf{H}^{BHHH}
=
-\frac{1}{N} \sum_{i=1}^N
\left[
\frac{\partial \ell(\boldsymbol{\vartheta})}
{\boldsymbol{\vartheta}}
\frac{\partial \ell(\boldsymbol{\vartheta})}
{\boldsymbol{\vartheta}'}
\right]
$$
The functions maxNR
, maxBFGSR
, and maxBHHH
can work with constant parameters, useful if a parameter value
converges to the boundary of support, or for testing.
One way is to put
fixed
to non-NULL, specifying which parameters should be treated as
constants. The
parameters can also be fixed in runtime (only for maxNR
and maxBHHH
) by
signaling it with the
fn
return value. See Henningsen & Toomet (2011) for details.
Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Social Measurement 3, 653--665.
Broyden, C.G. (1970): The Convergence of a Class of Double-rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications 6, 76--90.
Fletcher, R. (1970): A New Approach to Variable Metric Algorithms, Computer Journal 13, 317--322.
Goldfeld, S.M. and Quandt, R.E. (1972): Nonlinear Methods in Econometrics. Amsterdam: North-Holland.
Goldfarb, D. (1970): A Family of Variable Metric Updates Derived by Variational Means, Mathematics of Computation 24, 23--26.
Greene, W.H., (2008), Econometric Analysis, 6th edition, Prentice Hall.
Henningsen, A. and Toomet, O. (2011): maxLik: A package for maximum likelihood estimation in R Computational Statistics 26, 443--458
Marquardt, D.W., (1963) An Algorithm for Least-Squares Estimation of Nonlinear Parameters, Journal of the Society for Industrial & Applied Mathematics 11, 2, 431--441
Shanno, D.F. (1970): Conditioning of Quasi-Newton Methods for Function Minimization, Mathematics of Computation 24, 647--656.
maxLik
for a general framework for maximum likelihood
estimation (MLE);
maxBHHH
for maximizations using the Berndt, Hall, Hall,
Hausman (1974) algorithm (which is a wrapper function to maxNR
);
maxBFGS
for maximization using the BFGS, Nelder-Mead (NM),
and Simulated Annealing (SANN) method (based on optim
),
also supporting inequality constraints;
nlm
for Newton-Raphson optimization; and
optim
for different gradient-based optimization
methods.
# NOT RUN {
## estimate the exponential distribution parameter by ML
t <- rexp(100, 2)
loglik <- function(theta) sum(log(theta) - theta*t)
## Note the log-likelihood and gradient are summed over observations
gradlik <- function(theta) sum(1/theta - t)
hesslik <- function(theta) -100/theta^2
## Estimate with finite-difference gradient and Hessian
a <- maxNR(loglik, start=1, control=list(printLevel=2))
summary(a)
## You would probably prefer 1/mean(t) instead ;-)
## Estimate with analytic gradient and Hessian
a <- maxNR(loglik, gradlik, hesslik, start=1)
summary(a)
## BFGS estimation with finite-difference gradient
a <- maxBFGSR( loglik, start=1 )
summary(a)
## For the BHHH method we need likelihood values and gradients
## of individual observations
loglikInd <- function(theta) log(theta) - theta*t
gradlikInd <- function(theta) 1/theta - t
## Estimate with analytic gradient
a <- maxBHHH(loglikInd, gradlikInd, start=1)
summary(a)
##
## Example with a vector argument: Estimate the mean and
## variance of a random normal sample by maximum likelihood
## Note: you might want to use maxLik instead
##
loglik <- function(param) {
mu <- param[1]
sigma <- param[2]
ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2)
ll
}
x <- rnorm(100, 1, 2) # use mean=1, stdd=2
N <- length(x)
res <- maxNR(loglik, start=c(0,1)) # use 'wrong' start values
summary(res)
##
## The previous example with named parameters and fixed values
##
resFix <- maxNR(loglik, start=c(mu=0, sigma=1), fixed="sigma")
summary(resFix) # 'sigma' is exactly 1.000 now.
###
### Constrained optimization
###
## We maximize exp(-x^2 - y^2) where x+y = 1
hatf <- function(theta) {
x <- theta[1]
y <- theta[2]
exp(-(x^2 + y^2))
## Note: you may prefer exp(- theta %*% theta) instead
}
## use constraints: x + y = 1
A <- matrix(c(1, 1), 1, 2)
B <- -1
res <- maxNR(hatf, start=c(0,0), constraints=list(eqA=A, eqB=B),
control=list(printLevel=1))
print(summary(res))
# }
Run the code above in your browser using DataCamp Workspace