Implements the Bundle Methods for Regularized Risk Minimization described in Teo et al. (2007). It finds the weight vector w that minimizes LAMBDA * regularization_norm(w) + lossfun(w), where regularization_norm is either the L1- or the L2-norm.

Usage
bmrm(riskFun, LAMBDA = 1, MAX_ITER = 100, EPSILON_TOL = 0.01,
     regfun = c("l1", "l2"), w0 = 0, verbose = TRUE)
Arguments

riskFun: the loss function to use in the optimization (e.g. hingeLoss, softMarginVectorLoss). The function must evaluate the loss value and its gradient at a given point vector (w); see the sketch after this argument list.

LAMBDA: controls the regularization strength. This is the value used as the coefficient of the regularization term.

MAX_ITER: the maximum number of iterations to perform. The function stops with a warning message if the number of iterations exceeds this value.

EPSILON_TOL: controls the optimization stopping criterion: the optimization ends when the optimization gap drops below this threshold.

regfun: the type of regularization to use in the optimization. It must be either the character string "l1" for L1-norm regularization, or "l2" (default) for L2-norm regularization.

w0: the initial weight vector where the optimization starts.

verbose: a length-one logical. If TRUE, the progression of the convergence is printed on stdout.
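A custom risk function can be plugged in as long as it follows the same contract as the bundled losses. The sketch below shows a hypothetical squared-error risk; the return convention assumed here (the loss value with its gradient attached as a "gradient" attribute) is an assumption, so check it against the source of hingeLoss before relying on it.

# Hypothetical custom risk: mean squared error on a fixed dataset (X, Y).
# ASSUMPTION: bmrm reads the gradient from attr(value, "gradient");
# verify this convention against the package's own loss functions.
squaredLoss <- function(X, Y) {
  function(w) {
    r <- as.vector(X %*% w) - Y          # residuals at the current point w
    value <- 0.5 * mean(r^2)             # loss value
    attr(value, "gradient") <- as.vector(t(X) %*% r) / nrow(X)
    value
  }
}
# w <- bmrm(squaredLoss(X, Y), LAMBDA = 1, regfun = "l2")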
Value

the optimized weight vector, with attribute "log" being a data.frame storing a trace of important values of the optimization process.
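For example, the convergence trace can be read back from the returned vector (hingeLoss, x and y as defined in the examples below):

w <- bmrm(hingeLoss(x, y), LAMBDA = 0.1)
head(attr(w, "log"))   # per-iteration trace, including the epsilon gap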
References

Teo et al. (2007). A Scalable Modular Convex Solver for Regularized Risk Minimization. KDD 2007.
Examples
# -- Create a 2D dataset with the first 2 features of iris, with binary labels
x <- data.matrix(iris[1:2])
y <- c(-1,1,1)[iris$Species]
# -- Add a constant dimension to the dataset to learn the intercept
x <- cbind(x,1)
# -- train scalar prediction models with hingeLoss and fbetaLoss
models <- list(
  svm_L1 = bmrm(hingeLoss(x, y), LAMBDA = 0.1, regfun = 'l1', verbose = TRUE),
  svm_L2 = bmrm(hingeLoss(x, y), LAMBDA = 0.1, regfun = 'l2', verbose = TRUE),
  f1_L1  = bmrm(fbetaLoss(x, y), LAMBDA = 0.01, regfun = 'l1', verbose = TRUE)
)
# -- Plot the dataset and the predictions
layout(matrix(1:2,1,2))
plot(x,pch=20+y,main="dataset & hyperplanes")
legend('bottomright',legend=names(models),col=seq_along(models),lty=1,cex=0.75,lwd=3)
for(i in seq_along(models)) {
w <- models[[i]]
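# the decision boundary satisfies w[1]*x1 + w[2]*x2 + w[3] = 0,
# i.e. x2 = -w[3]/w[2] - (w[1]/w[2])*x1, hence abline(intercept, slope)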
if (w[2]!=0) abline(-w[3]/w[2],-w[1]/w[2],col=i,lwd=3)
}
rx <- range(na.rm=TRUE,1,unlist(lapply(models,function(e) nrow(attr(e,"log")))))
ry <- range(na.rm=TRUE,0,unlist(lapply(models,function(e) attr(e,"log")$epsilon)))
plot(rx,ry,type="n",ylab="epsilon gap",xlab="iteration",main="evolution of the epsilon gap")
for(i in seq_along(models)) {
log <- attr(models[[i]],"log")
lines(log$epsilon,type="o",col=i,lwd=3)
}
# -- fit a least absolute deviation linear model on a synthetic dataset
# -- containing 196 meaningful features and 4 pure-noise features, then
# -- check whether the model has assigned near-zero weights to the noise
set.seed(123)
X <- matrix(rnorm(4000*200), 4000, 200)
beta <- c(rep(1,ncol(X)-4),0,0,0,0)
Y <- X%*%beta + rnorm(nrow(X))
w <- bmrm(ladRegressionLoss(X,Y),regfun="l2",LAMBDA=100,MAX_ITER=150)
layout(1)
barplot(as.vector(w))
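# quick check: the last 4 coefficients correspond to the pure-noise
# features and should be close to zero
tail(as.vector(w), 4)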