
bmrm (version 1.7)

bmrm: Bundle Methods for Regularized Risk Minimization

Description

Implements Bundle Methods for Regularized Risk Minimization, as described in Teo et al. (2007).
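
In the formulation of Teo et al. (2007), bmrm minimizes a regularized empirical risk of the form

  J(w) = LAMBDA * Omega(w) + Remp(w)

where Omega(w) is the regularizer selected by regfun (the L1-norm, or for "l2" typically half the squared L2-norm) and Remp(w) is the empirical risk evaluated by lossfun on the training data; whether Remp sums or averages the per-example losses depends on the chosen loss function.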

Usage

bmrm(..., LAMBDA = 1, MAX_ITER = 100, EPSILON_TOL = 0.01,
  lossfun = hingeLoss, regfun = c("l2", "l1"), w0 = 0, verbose = FALSE)

Arguments

lossfun
the loss function to use in the optimization (e.g. hingeLoss, softMarginVectorLoss). The function must evaluate the loss value and its gradient for a given point vector (w). It must be of the form lossfun(w, ..., cache=NULL), i.e. accept the weight vector w as its first argument, followed by any additional arguments forwarded from bmrm() (see the sketch after this argument list).
LAMBDA
controls the regularization strength in the optimization process; this value is used as the coefficient of the regularization term.
MAX_ITER
the maximum number of iterations to perform. The function stops with a warning message if the number of iterations exceeds this value.
EPSILON_TOL
controls the stopping criterion of the optimization: the optimization ends when the optimization gap falls below this threshold.
regfun
type of regularization to consider in the optimization. It can either be the character string "l1" for L1-norm regularization, or "l2" (default) for L2-norm regularization.
w0
a numeric vector used to initialize the minimization process
verbose
a length-one logical; if TRUE, the progression of the convergence is shown on stdout.
...
additional arguments passed to the loss function
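
As an illustration of the lossfun contract, here is a minimal sketch of a custom loss. The return convention shown (the loss value with its gradient attached as a "gradient" attribute) is an assumption made for illustration; consult the source of hingeLoss for the exact contract expected by bmrm().

  # -- Hypothetical squared loss of the form lossfun(w, ..., cache=NULL).
  # -- Assumption: the loss value is returned with its gradient in w
  # -- attached as the "gradient" attribute (mirror hingeLoss if it differs).
  squaredLoss <- function(w, x, y, cache=NULL) {
    f <- x %*% w                    # linear predictions at the current w
    val <- mean((f - y)^2)          # empirical risk value
    attr(val, "gradient") <- 2 * crossprod(x, f - y) / nrow(x)  # gradient in w
    val
  }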

Value

  • a list with 2 fields: "w", the optimized weight vector; "log", a data.frame tracing the important values of the optimization process.
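
For instance (a minimal sketch, assuming x and y as constructed in the Examples below):

  m <- bmrm(x, y, LAMBDA=0.1, lossfun=hingeLoss)
  m$w           # the optimized weight vector
  tail(m$log)   # trace of the last iterations, including the epsilon gap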

References

Teo, C.H., Smola, A., Vishwanathan, S.V.N. and Le, Q.V. (2007). A Scalable Modular Convex Solver for Regularized Risk Minimization. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007).

See Also

hingeLoss, softMarginVectorLoss

Examples

  # -- Create a 2D dataset with the first 2 features of iris, with binary labels
  x <- data.matrix(iris[1:2])
  y <- c(-1,1,1)[iris$Species]

  # -- Add a constant dimension to the dataset to learn the intercept
  x <- cbind(x,1)

  train.prediction.model <- function(x,y,lossfun=hingeLoss,...) {
    m <- bmrm(x,y,lossfun=lossfun,...)
    m$f <- x %*% m$w
    m$y <- sign(m$f)
    m$contingencyTable <- table(y,m$y)
    print(m$contingencyTable)
    return(m)
  }

  # -- train scalar prediction models with hingeLoss and fbetaLoss
  models <- list(
    svm_L1 = train.prediction.model(x,y,lossfun=hingeLoss,LAMBDA=0.01,regfun='l1'),
    svm_L2 = train.prediction.model(x,y,lossfun=hingeLoss,LAMBDA=0.1,regfun='l2'),
    f1_L1 = train.prediction.model(x,y,lossfun=fbetaLoss,LAMBDA=0.01,regfun='l1')
  )

  # -- Plot the dataset and the predictions
  layout(matrix(1:2,1,2))
  plot(x,pch=20+y,main="dataset & hyperplanes")
  legend('bottomright',legend=names(models),col=seq_along(models),lty=1,cex=0.75,lwd=3)
  for(i in seq_along(models)) {
    m <- models[[i]]
    if (m$w[2]!=0) abline(-m$w[3]/m$w[2],-m$w[1]/m$w[2],col=i,lwd=3)
  }

  rx <- range(na.rm=TRUE,1,unlist(lapply(models,function(e) nrow(e$log))))
  ry <- range(na.rm=TRUE,0,unlist(lapply(models,function(e) e$log$epsilon)))
  plot(rx,ry,type="n",ylab="epsilon gap",xlab="iteration",main="evolution of the epsilon gap")
  for(i in seq_along(models)) {
    m <- models[[i]]
    lines(m$log$epsilon,type="o",col=i,lwd=3)
  }
