Implements the Bundle Methods for Regularized Risk Minimization described in Teo et al. (2007). It finds the weight vector w that minimizes LAMBDA * regularization_norm(w) + lossfun(w), where regularization_norm is either the L1- or the L2-norm.

Usage
bmrm(riskFun, LAMBDA = 1, MAX_ITER = 100, EPSILON_TOL = 0.01,
     regfun = c("l1", "l2"), w0 = 0, verbose = TRUE)
Arguments

riskFun: the loss function to use in the optimization (e.g. hingeLoss, softMarginVectorLoss). The function must evaluate the loss value and its gradient at a given point vector (w); see the sketch after this argument list.

LAMBDA: controls the regularization strength. This is the value used as the coefficient of the regularization term.

MAX_ITER: the maximum number of iterations to perform. The function stops with a warning message if the number of iterations exceeds this value.

EPSILON_TOL: controls the optimization stopping criterion: the optimization ends when the optimization gap drops below this threshold.

regfun: the type of regularization to use in the optimization. It must be either the character string "l1" for L1-norm regularization, or "l2" (default) for L2-norm regularization.

w0: the initial weight vector where the optimization starts.

verbose: a length-one logical. If TRUE, the progression of the convergence is printed on stdout.
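A custom risk function can be plugged in as long as it follows the same contract as the bundled losses. The sketch below shows a hypothetical squared-error risk; the return convention assumed here (the loss value with its gradient attached as a "gradient" attribute) is an assumption, so check it against the source of hingeLoss before relying on it.

# Hypothetical custom risk: mean squared error on a fixed dataset (X, Y).
# ASSUMPTION: bmrm reads the gradient from attr(value, "gradient");
# verify this convention against the package's own loss functions.
squaredLoss <- function(X, Y) {
  function(w) {
    r <- as.vector(X %*% w) - Y          # residuals at the current point w
    value <- 0.5 * mean(r^2)             # loss value
    attr(value, "gradient") <- as.vector(t(X) %*% r) / nrow(X)
    value
  }
}
# w <- bmrm(squaredLoss(X, Y), LAMBDA = 1, regfun = "l2")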
Value

the optimized weight vector, with attribute "log" being a data.frame storing a trace of important values of the optimization process.
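For example, the convergence trace can be read back from the returned vector (hingeLoss, x and y as defined in the examples below):

w <- bmrm(hingeLoss(x, y), LAMBDA = 0.1)
head(attr(w, "log"))   # per-iteration trace, including the epsilon gap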
References

Teo et al. (2007). A Scalable Modular Convex Solver for Regularized Risk Minimization. KDD 2007.
Examples
# -- Create a 2D dataset with the first 2 features of iris, with binary labels
x <- data.matrix(iris[1:2])
y <- c(-1,1,1)[iris$Species]
# -- Add a constant dimension to the dataset to learn the intercept
x <- cbind(x,1)
# -- train scalar prediction models with hingeLoss and fbetaLoss
models <- list(
  svm_L1 = bmrm(hingeLoss(x, y), LAMBDA = 0.1, regfun = 'l1', verbose = TRUE),
  svm_L2 = bmrm(hingeLoss(x, y), LAMBDA = 0.1, regfun = 'l2', verbose = TRUE),
  f1_L1  = bmrm(fbetaLoss(x, y), LAMBDA = 0.01, regfun = 'l1', verbose = TRUE)
)
# -- Plot the dataset and the predictions
layout(matrix(1:2,1,2))
plot(x,pch=20+y,main="dataset & hyperplanes")
legend('bottomright',legend=names(models),col=seq_along(models),lty=1,cex=0.75,lwd=3)
for(i in seq_along(models)) {
w <- models[[i]]
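# the decision boundary satisfies w[1]*x1 + w[2]*x2 + w[3] = 0,
# i.e. x2 = -w[3]/w[2] - (w[1]/w[2])*x1, hence abline(intercept, slope)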
if (w[2]!=0) abline(-w[3]/w[2],-w[1]/w[2],col=i,lwd=3)
}
rx <- range(na.rm=TRUE,1,unlist(lapply(models,function(e) nrow(attr(e,"log")))))
ry <- range(na.rm=TRUE,0,unlist(lapply(models,function(e) attr(e,"log")$epsilon)))
plot(rx,ry,type="n",ylab="epsilon gap",xlab="iteration",main="evolution of the epsilon gap")
for(i in seq_along(models)) {
log <- attr(models[[i]],"log")
lines(log$epsilon,type="o",col=i,lwd=3)
}
# -- fit a least absolute deviation linear model on a synthetic dataset
# -- containing 196 meaningful features and 4 pure-noise features, then
# -- check whether the model has assigned near-zero weights to the noise
set.seed(123)
X <- matrix(rnorm(4000*200), 4000, 200)
beta <- c(rep(1,ncol(X)-4),0,0,0,0)
Y <- X%*%beta + rnorm(nrow(X))
w <- bmrm(ladRegressionLoss(X,Y),regfun="l2",LAMBDA=100,MAX_ITER=150)
layout(1)
barplot(as.vector(w))
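# quick check: the last 4 coefficients correspond to the pure-noise
# features and should be close to zero
tail(as.vector(w), 4)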