turboem: A suite of acceleration schemes for fixed-point iterations

Description

Globally-convergent, partially monotone, acceleration schemes for accelerating the convergence of any smooth, monotone, slowly-converging contraction mapping. It can be used to accelerate the convergence of a wide variety of iterations including the expectation-maximization (EM) algorithms and its variants, majorization-minimization (MM) algorithm, power method for dominant eigenvalue-eigenvector, Google's page-rank algorithm, and multi-dimensional scaling.

Usage

turboem(par, fixptfn, objfn, method = c("em","squarem","pem","decme","qn"), 
        boundary, pconstr = NULL, project = NULL, parallel = FALSE, ..., 
        control.method = replicate(length(method),list()), control.run = list())

Arguments

par

A vector of parameters denoting the initial guess for the fixed point.

fixptfn

A vector function, \(F\) that denotes the fixed-point mapping. This function is the most essential input in the package. It should accept a parameter vector as input and should return a parameter vector of same length. This function defines the fixed-point iteration: \(x_{k+1} = F(x_k)\). In the case of EM algorithm, \(F\) defines a single E and M step.

objfn

This is a scalar function, \(L\), that denotes a ``merit'' function which attains its local minimum at the fixed-point of \(F\). This function should accept a parameter vector as input and should return a scalar value. In the EM algorithm, the merit function \(L\) is the negative log-likelihood. In some problems, a natural merit function may not exist. However, this argument is required for all of the algorithms *except* Squarem (which defaults to Squarem-2 if objfn not provided) and EM.

method

Specifies which algorithm(s) will be applied. Must be a vector containing one or more of c("em", "squarem", "pem", "decme", "qn").

boundary

Argument required for Dynamic ECME (decme) only. Function to define the subspaces over which the line search is conducted.

pconstr

Optional function for defining boundary constraints on parameter values. Function maps a vector of parameter values to TRUE if constraints are satisfied. Note that this argument is only used for the Squarem (squarem), Parabolic EM (pem), and quasi-Newton (qn) algorithms, and it has no effect on the other algorithms.

project

Optional function for defining a projection that maps an out-of-bound parameter value into the constrained parameter space. Requires the pconstr argument to be specified in order for the project to be applied.

parallel

Logical indicating whether the acceleration schemes will be run in parallel. Note that the parallel implementation is based on the foreach package, which depends on a parallel backend being registered prior to running turboem(). See *Details* of foreach.

control.method

If method = c(method1, method2, ...), then control.method = list(list1, list2, ...) where list1 is the list of control parameters for method1, list2 is the list of control parameters for method2, and so on. If length(method) == 1, then control.method is the list of control parameters for the acceleration scheme. See *Details*.

control.run

List of control parameters for convergence and stopping the algorithms. See *Details*.

...

Arguments passed to fixptfn and objfn.

Value

turboem returns an object of class turbo. An object of class turbo is a list containing at least the following components:

fail

Vector of logical values whose \(j\)th element indicates whether algorithm \(j\) failed (produced an error)

value.objfn

Vector of the value of the objective function \(L\) at termination for each algorithm.

itr

Vector of the number of iterations completed for each algorithm.

fpeval

Vector of the number of fixed-point evaluations completed for each algorithm.

objfeval

Vector of the number of objective function evaluations completed for each algorithm.

convergence

Vector of logical values whose \(j\)th element indicates whether algorithm \(j\) satisfied the convergence criterion before termination

runtime

Matrix whose \(j\)th row contains the ``user'', ``system'', and ``elapsed'' time for running the \(j\)th algorithm.

errors

Vector whose \(j\)th element is either NA or contains the error message from running the \(j\)th algorithm

pars

Matrix whose \(j\)th row contains the fixed-point parameter values at termination for the \(j\)th algorithm.

trace.objfval

If control.run[["keep.objfval"]]=TRUE, contains a list whose \(j\)th component is a vector of objective function values across iterations for the \(j\)th algorithm.

trace.paramval

If control.run[["keep.paramval"]]=TRUE, contains a list whose \(j\)th component is a matrix of parameter estimates across iterations for the \(j\)th algorithm.

Details

The function turboem is a general-purpose algorithm for accelerating the convergence of any slowly-convergent (smooth) fixed-point iteration.

The component lists of the control.method are used to specify any changes to default values of algorithm control parameters. Full names of control list elements must be specified, otherwise, user specifications are ignored. Default control parameters for method="squarem" are K=1, square=TRUE, version=3, step.min0=1, step.max0=1, mstep=4, kr=1, objfn.inc=1. Default control parameters for method="pem" are l=10, h=0.1, a=1.5, and version="geometric". Default control parameters for method="decme" are version="v2" and tol_op=0.01. Default control parameters for method="qn" are qn=5.

Default values of control.run are: convtype = "parameter", tol = 1.0e-07, stoptype = "maxiter", maxiter = 1500, maxtime = 60, convfn.user = NULL, stopfn.user = NULL, trace = FALSE, keep.objfval = FALSE, keep.paramval = FALSE.

There are two ways the algorithm will terminate. Either the algorithm will terminate if convergence has been achieved, or the algorithm will terminate if convergence has not been achieved within a pre-specified maximum number of iterations or maximum running time. The arguments convtype, tol, and convfn.user control the convergence criterion. The arguments stoptype, maxiter, maxtime, and stopfn.user control the alternative stopping criterion.

Two types of convergence criteria have been implemented, with an option for the user to define his/her own convergence criterion. If convtype = "parameter", then the default convergence criterion is to terminate if sqrt(crossprod(new - old)) < tol, where new denotes the current value of the fixed point and old denotes the previous fixed-point value. If convtype = "objfn", then the default convergence criterion is to terminate if abs(new - old) < tol, where new denotes the current value of the objective function and old denotes the previous value of the objective function. If the user desires alternate convergence criteria, convfn.user may be specified as a function with inputs new and old that maps to a logical taking the value TRUE if convergence is achieved and the value FALSE if convergence is not achieved.

Two types of alternative stopping criteria have been implemented, with the option for the user to define his/her own stopping criterion. If stoptype = "maxiter", then the algorithm will terminate if convergence has not been achieved within maxiter iterations of the acceleration scheme. If stoptype = "maxtime", then the algorithm will terminate if convergence has not been achieved within maxtime seconds of running the acceleration scheme. Note: the running time of the acceleration scheme is calculated once every iteration. If the user desires different alternate stopping criteria than those implemented, stopfn.user may be specified as a function with no inputs that maps to a logical taking the value TRUE which leads to the algorithm being terminated or the value FALSE which leads to the algorithm proceeding as usual.

convtype: A character string equal to "parameter" or "objfn". "parameter" indicates that the convergence criterion is a function of the current and previous value of the fixed point. objfn indicates that the convergence criterion is a function of the current and previous value of the objective function.
tol: A small, positive scalar that determines when convergence is achieved. See details above for convergence criteria currently implemented. Default is 1.e-07.
stoptype: A character string equal to "maxiter" or "maxtime" that determines an alternative stopping rule for the algorithm. See details above for stopping rules currently implemented. Default is "maxiter".
maxiter: If stoptype = "maxiter", specifies the number of iterations after which the algorithm will be terminated if convergence has not been achieved. Default is 1500.
maxtime: If stoptype = "maxtime", specifies the running time (in seconds) after which the algorithm will be terminated if convergence has not been achieved. Default is 60.
convfn.user: Optional, user-specified function for determining whether convergence has been achieved. Function should take as inputs new and old, where new is the current value (of the fixed point if convtype = "parameter" and of the objective function value if convtype = "objfn") and old is the previous value. Function should map to a logical taking the value TRUE if convergence is achieved (and hence the algorithm is terminated) and the value FALSE if convergence is not achieved. Default is NULL.
stopfn.user: Optional, user-specified function for determining whether to terminate the algorithm if convergence has not been achieved. See details above for how to specify. Default is NULL.
trace: A logical variable denoting whether some of the intermediate results of iterations should be displayed to the user. Default is FALSE.
keep.objfval: A logical variable denoting whether the objective function value at each iteration should be stored. Default is FALSE.
keep.paramval: A logical variable denoting whether the parameter estimates at each iteration should be stored. Default is FALSE.

References

R Varadhan and C Roland (2008). Simple and globally convergent numerical schemes for accelerating the convergence of any EM algorithm. Scandinavian Journal of Statistics, 35:335-353.

A Berlinet and C Roland (2009). Parabolic acceleration of the EM algorithm. Stat Comput. 19 (1) 35-47.

Y He and C Liu (2010) The Dynamic ECME Algorithm. Technical Report. arXiv:1004.0524v1.

H Zhou, DH Alexander, and KL Lange (2011). A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat Comput. 21 (2) 261-273.

Examples

Run this code

# NOT RUN {
###########################################################################
# Also see the vignette by typing:
#  vignette("turboEM")
#
# EM algorithm for Poisson mixture estimation 

fixptfn <- function(p,y) {
# The fixed point mapping giving a single E and M step of the EM algorithm
# 
pnew <- rep(NA,3)
i <- 0:(length(y)-1)
zi <- p[1]*exp(-p[2])*p[2]^i / (p[1]*exp(-p[2])*p[2]^i + (1 - p[1])*exp(-p[3])*p[3]^i)
pnew[1] <- sum(y*zi)/sum(y)
pnew[2] <- sum(y*i*zi)/sum(y*zi)
pnew[3] <- sum(y*i*(1-zi))/sum(y*(1-zi))
p <- pnew
return(pnew)
}

objfn <- function(p,y) {
# Objective function whose local minimum is a fixed point 
# negative log-likelihood of binary poisson mixture
i <- 0:(length(y)-1)
loglik <- y*log(p[1]*exp(-p[2])*p[2]^i/exp(lgamma(i+1)) + 
		(1 - p[1])*exp(-p[3])*p[3]^i/exp(lgamma(i+1)))
return ( -sum(loglik) )
}

# Real data from Hasselblad (JASA 1969)
poissmix.dat <- data.frame(death = 0:9, 
	freq = c(162,267,271,185,111,61,27,8,3,1))
y <- poissmix.dat$freq

# Use a preset seed so the example is reproducable. 
require("setRNG")
old.seed <- setRNG(list(kind = "Mersenne-Twister", normal.kind = "Inversion",
    seed = 54321))

p0 <- c(runif(1),runif(2,0,4))  # random starting value

# Basic EM algorithm, SQUAREM, and parabolic EM, with default settings
res1 <- turboem(par = p0, y = y, fixptfn = fixptfn, objfn = objfn, 
	method = c("EM", "squarem", "pem"))

# To apply the dynamic ECME (decme) acceleration scheme, 
# we need to include a boundary function
boundary <- function(par, dr) {
	lower <- c(0, 0, 0)
	upper <- c(1, 10000, 10000)
	low1 <- max(pmin((lower-par)/dr, (upper-par)/dr))
	upp1 <- min(pmax((lower-par)/dr, (upper-par)/dr))
	return(c(low1, upp1))
}
res2 <- turboem(par = p0, y = y, fixptfn = fixptfn, objfn = objfn, 
	boundary = boundary, method = c("EM", "squarem", "pem", "decme"))

# change some of the algorithm-specific default specifications (control.method), 
# as well as the global control parameters (control.run)
res3 <- turboem(par = p0, y = y, fixptfn = fixptfn, objfn = objfn, 
  boundary = boundary, method = c("em", "squarem", "squarem", "decme", "qn", "qn"), 
	control.method = list(list(), list(K = 2), list(K = 3), 
		list(version = "v3"), list(qn = 1), list(qn = 2)),
	control.run = list(tol = 1e-12, stoptype = "maxtime", maxtime = 1))

# Only the standard EM algorithm and SQUAREM *do not* require 
# providing the objective function. 
res4 <- turboem(par = p0, y = y, fixptfn = fixptfn, 
	method = c("em", "squarem", "squarem"), 
	control.method = list(list(), list(K = 1), list(K = 2)))
# If no objective function is provided, the "squarem" method defaults to Squarem-2 
# Or, if control parameter K > 1, it defaults to Cyclem-2. 
# Compare Squarem with and without objective function provided:
res5 <- turboem(par = p0, y = y, fixptfn = fixptfn, method = "squarem")
res5
res6 <- turboem(par = p0, y = y, fixptfn = fixptfn, objfn = objfn, method = "squarem")
res6

# }

Run the code above in your browser using DataLab