turboem: A suite of acceleration schemes for fixed-point iterations

Description

Globally-convergent, partially monotone, acceleration schemes for accelerating the convergence of any smooth, monotone, slowly-converging contraction mapping. It can be used to accelerate the convergence of a wide variety of iterations including the expectation-maximization (EM) algorithms and its variants, majorization-minimization (MM) algorithm, power method for dominant eigenvalue-eigenvector, Google's page-rank algorithm, and multi-dimensional scaling.

Usage

turboem(par, fixptfn, objfn, method = c("em", "squarem", "pem", "decme", "qn"), 
	boundary, pconstr = NULL, project = NULL, parallel = FALSE, ..., control.method = replicate(length(method),list()), 
	control.run = list())

Arguments

par

A vector of parameters denoting the initial guess for the fixed point.

fixptfn

A vector function, $F$ that denotes the fixed-point mapping. This function is the most essential input in the package. It should accept a parameter vector as input and should return a parameter vector of same length. This function defines the fixed-point

objfn

This is a scalar function, $L$, that denotes a ``merit'' function which attains its local minimum at the fixed-point of $F$. This function should accept a parameter vector as input and should return a scalar value. In the EM algorithm, the merit functio

method

Specifies which algorithm(s) will be applied. Must be a vector containing one or more of c("em", "squarem", "pem", "decme", "qn").

boundary

Argument required for Dynamic ECME (decme) only. Function to define the subspaces over which the line search is conducted.

pconstr

Optional function for defining boundary constraints on parameter values. Function maps a vector of parameter values to TRUE if constraints are satisfied. Note that this argument is only used for the Squarem (squarem), Parabolic EM (pem<

project

Optional function for defining a projection that maps an out-of-bound parameter value into the constrained parameter space. Requires the pconstr argument to be specified in order for the project to be applied.

parallel

Logical indicating whether the acceleration schemes will be run in parallel. Note that the parallel implementation is based on the foreach package, which depends on a parallel backend being registered prior to running turboem()

control.method

If method = c(method1, method2, ...), then control.method = list(list1, list2, ...) where list1 is the list of control parameters for method1, list2 is the list of control parameters for

control.run

List of control parameters for convergence and stopping the algorithms. See *Details*.

...

Arguments passed to fixptfn and objfn.

Value

turboem returns an object of class turbo. An object of class turbo is a list containing at least the following components:
failVector of logical values whose $j$th element indicates whether algorithm $j$ failed (produced an error)
value.objfnVector of the value of the objective function $L$ at termination for each algorithm.
itrVector of the number of iterations completed for each algorithm.
fpevalVector of the number of fixed-point evaluations completed for each algorithm.
objfevalVector of the number of objective function evaluations completed for each algorithm.
convergenceVector of logical values whose $j$th element indicates whether algorithm $j$ satisfied the convergence criterion before termination
runtimeMatrix whose $j$th row contains the ``user'', ``system'', and ``elapsed'' time for running the $j$th algorithm.
errorsVector whose $j$th element is either NA or contains the error message from running the $j$th algorithm
parsMatrix whose $j$th row contains the fixed-point parameter values at termination for the $j$th algorithm.
trace.objfvalIf control.run[["keep.objfval"]]=TRUE, contains a list whose $j$th component is a vector of objective function values across iterations for the $j$th algorithm

Details

The function turboem is a general-purpose algorithm for accelerating the convergence of any slowly-convergent (smooth) fixed-point iteration.

The component lists of the control.method are used to specify any changes to default values of algorithm control parameters. Full names of control list elements must be specified, otherwise, user specifications are ignored. Default control parameters for method="squarem" are K=1, square=TRUE, version=3, step.min0=1, step.max0=1, mstep=4, kr=1, objfn.inc=1. Default control parameters for method="pem" are l=10, h=0.1, a=1.5, and version="geometric". Default control parameters for method="decme" are version="v2" and tol_op=0.01. Default control parameters for method="qn" are qn=5.

Default values of control.run are: convtype = "parameter", tol = 1.0e-07, stoptype = "maxiter", maxiter = 1500, maxtime = 60, convfn.user = NULL, stopfn.user = NULL, trace = FALSE, keep.objfval = FALSE. There are two ways the algorithm will terminate. Either the algorithm will terminate if convergence has been achieved, or the algorithm will terminate if convergence has not been achieved within a pre-specified maximum number of iterations or maximum running time. The arguments convtype, tol, and convfn.user control the convergence criterion. The arguments stoptype, maxiter, maxtime, and stopfn.user control the alternative stopping criterion.

Two types of convergence criteria have been implemented, with an option for the user to define his/her own convergence criterion. If convtype = "parameter", then the default convergence criterion is to terminate if sqrt(crossprod(new - old)) < tol, where new denotes the current value of the fixed point and old denotes the previous fixed-point value. If convtype = "objfn", then the default convergence criterion is to terminate if abs(new - old) < tol, where new denotes the current value of the objective function and old denotes the previous value of the objective function. If the user desires alternate convergence criteria, convfn.user may be specified as a function with inputs new and old that maps to a logical taking the value TRUE if convergence is achieved and the value FALSE if convergence is not achieved.

Two types of alternative stopping criteria have been implemented, with the option for the user to define his/her own stopping criterion. If stoptype = "maxiter", then the algorithm will terminate if convergence has not been achieved within maxiter iterations of the acceleration scheme. If stoptype = "maxtime", then the algorithm will terminate if convergence has not been achieved within maxtime seconds of running the acceleration scheme. Note: the running time of the acceleration scheme is calculated once every iteration. If the user desires different alternate stopping criteria than those implemented, stopfn.user may be specified as a function with no inputs that maps to a logical taking the value TRUE which leads to the algorithm being terminated or the value FALSE which leads to the algorithm proceeding as usual.

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

References

R Varadhan and C Roland (2008). Simple and globally convergent numerical schemes for accelerating the convergence of any EM algorithm. Scandinavian Journal of Statistics, 35:335-353.

A Berlinet and C Roland (2009). Parabolic acceleration of the EM algorithm. Stat Comput. 19 (1) 35-47.

Y He and C Liu (2010) The Dynamic ECME Algorithm. Technical Report. arXiv:1004.0524v1.

H Zhou, DH Alexander, and KL Lange (2011). A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat Comput. 21 (2) 261-273.

Examples

Run this code

###########################################################################
# Also see the vignette by typing:
#  vignette("turboEM")
#
# EM algorithm for Poisson mixture estimation 

fixptfn <- function(p,y) {
# The fixed point mapping giving a single E and M step of the EM algorithm
# 
pnew <- rep(NA,3)
i <- 0:(length(y)-1)
zi <- p[1]*exp(-p[2])*p[2]^i / (p[1]*exp(-p[2])*p[2]^i + (1 - p[1])*exp(-p[3])*p[3]^i)
pnew[1] <- sum(y*zi)/sum(y)
pnew[2] <- sum(y*i*zi)/sum(y*zi)
pnew[3] <- sum(y*i*(1-zi))/sum(y*(1-zi))
p <- pnew
return(pnew)
}

objfn <- function(p,y) {
# Objective function whose local minimum is a fixed point 
# negative log-likelihood of binary poisson mixture
i <- 0:(length(y)-1)
loglik <- y*log(p[1]*exp(-p[2])*p[2]^i/exp(lgamma(i+1)) + 
		(1 - p[1])*exp(-p[3])*p[3]^i/exp(lgamma(i+1)))
return ( -sum(loglik) )
}

# Real data from Hasselblad (JASA 1969)
poissmix.dat <- data.frame(death = 0:9, 
	freq = c(162,267,271,185,111,61,27,8,3,1))
y <- poissmix.dat$freq

# Use a preset seed so the example is reproducable. 
require("setRNG")
old.seed <- setRNG(list(kind = "Mersenne-Twister", normal.kind = "Inversion",
    seed = 54321))

p0 <- c(runif(1),runif(2,0,4))  # random starting value

# Basic EM algorithm, SQUAREM, and parabolic EM, with default settings
res1 <- turboem(par = p0, y = y, fixptfn = fixptfn, objfn = objfn, 
	method = c("EM", "squarem", "pem"))

# To apply the dynamic ECME (decme) acceleration scheme, 
# we need to include a boundary function
boundary <- function(par, dr) {
	lower <- c(0, 0, 0)
	upper <- c(1, 10000, 10000)
	low1 <- max(pmin((lower-par)/dr, (upper-par)/dr))
	upp1 <- min(pmax((lower-par)/dr, (upper-par)/dr))
	return(c(low1, upp1))
}
res2 <- turboem(par = p0, y = y, fixptfn = fixptfn, objfn = objfn, 
	boundary = boundary, method = c("EM", "squarem", "pem", "decme"))

# change some of the algorithm-specific default specifications (control.method), 
# as well as the global control parameters (control.run)
res3 <- turboem(par = p0, y = y, fixptfn = fixptfn, objfn = objfn, 
	boundary = boundary, method = c("em", "squarem", "squarem", "decme", "qn", "qn"), 
	control.method = list(list(), list(K = 2), list(K = 3), 
		list(version = "v3"), list(qn = 1), list(qn = 2)),
	control.run = list(tol = 1e-12, stoptype = "maxtime", maxtime = 1))

# Only the standard EM algorithm and SQUAREM *do not* require 
# providing the objective function. 
res4 <- turboem(par = p0, y = y, fixptfn = fixptfn, 
	method = c("em", "squarem", "squarem"), 
	control.method = list(list(), list(K = 1), list(K = 2)))
# If no objective function is provided, the "squarem" method defaults to Squarem-2 (or if control parameter K > 1, to Cyclem-2). Compare Squarem with and without objective function provided:
res5 <- turboem(par = p0, y = y, fixptfn = fixptfn, method = "squarem")
res5
res6 <- turboem(par = p0, y = y, fixptfn = fixptfn, objfn = objfn, method = "squarem")
res6

Run the code above in your browser using DataLab