liquidSVM (version 1.2.2)

liquidSVM-package: liquidSVM for R

Description

Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are:

  • fully integrated hyper-parameter selection,

  • extreme speed on both small and large data sets,

  • full flexibility for experts, and

  • inclusion of a variety of different learning scenarios:

    • multi-class classification, ROC, and Neyman-Pearson learning, and

    • least-squares, quantile, and expectile regression.

Further information is available in the following vignettes:

  • demo: liquidSVM Demo
  • documentation: liquidSVM Documentation

Known issues

Interruption (Ctrl-C) of running train/select/test phases is honored, but it can leave the C++ library in an inconsistent state; in that case it is best to save your work and restart your R session.

liquidSVM is itself multi-threaded, which makes it hard to parallelize externally; see the documentation vignette for details.

Details

In liquidSVM an application cycle is divided into a training phase, in which various SVM models are created and validated, a selection phase, in which the SVM models that best satisfy a certain criterion are selected, and a test phase, in which the selected models are applied to test data. These three phases are built from several components that can be freely combined: solvers, hyper-parameter selection, and working sets. All of these can be configured (see Configuration).
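
For users who want to drive the three phases explicitly, the expert interface can be used step by step. The following is a minimal sketch built on the functions listed under See Also; the name selectSVMs for the selection step is an assumption here, and the defaults used for training are illustrative only.

library(liquidSVM)

model <- init.liquidSVM(Species ~ ., iris)  # set up working sets and tasks
trainSVMs(model)                            # training phase: fit SVMs over a hyper-parameter grid
selectSVMs(model)                           # selection phase: pick the best models (assumed function name)
result <- test(model, iris)                 # test phase: apply the selected models to data
clean.liquidSVM(model)                      # release the underlying C++ objects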

For instance, multi-class classification with \(k\) labels has to be delegated to several binary classification problems, called tasks, using either all-vs-all (\(k(k-1)/2\) tasks on the corresponding subsets of the data) or one-vs-all (\(k\) tasks on the full data set). Every task can be split into cells in order to handle larger data sets (for example \(>10000\) samples). For every task and every cell, several folds are then created to enable cross-validated hyper-parameter selection.
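
To illustrate, a hedged sketch of how these choices might be specified for multi-class classification follows; the option names mc_type, folds, and useCells are assumed configuration parameters here, so check Configuration for the authoritative list and values.

model <- mcSVM(Species ~ ., iris,
               mc_type = "AvA_hinge",  # all-vs-all: k(k-1)/2 binary tasks (assumed option value)
               folds = 5,              # folds per task/cell for cross-validated selection (assumed)
               useCells = TRUE)        # split tasks into cells for larger data sets (assumed)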

The following learning scenarios can be used out of the box:

mcSVM: binary and multi-class classification

lsSVM: least squares regression

nplSVM: Neyman-Pearson learning to classify with a specified rate on one type of error

rocSVM: Receiver Operating Characteristic (ROC) curve to solve multiple weighted binary classification problems

qtSVM: quantile regression

exSVM: expectile regression

bsSVM: bootstrapping
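
All of these follow the same formula/data calling convention used in the Examples below. As a short sketch, expectile regression with default settings might look like this (default expectile weights assumed):

modelEx <- exSVM(Height ~ Girth + Volume, trees, scale = TRUE)  # expectile regression with defaults
y <- predict(modelEx, trees)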

To calculate kernel matrices as used by the SVM we also provide for convenience the function kern.
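
A minimal sketch, assuming kern accepts a numeric data matrix and a gamma bandwidth for the Gaussian RBF kernel:

X <- as.matrix(iris[, 1:4])
K <- kern(X, gamma = 1)  # kernel matrix of X against itself (gamma argument assumed)
dim(K)                   # expected: 150 x 150 if the full Gram matrix is returned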

liquidSVM can benefit heavily from native compilation, hence we recommend to (re-)install it using the information provided in the installation section of the documentation vignette.
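
On platforms where the package is installed as a pre-built binary, a source installation with native compilation can be requested roughly as follows; the configure.args value is an assumption here, so see the installation section of the documentation vignette for the exact flags.

# sketch: reinstall from source with native optimizations (flag assumed, see vignette)
install.packages("liquidSVM", configure.args = "native")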

References

http://www.isa.uni-stuttgart.de

See Also

init.liquidSVM, trainSVMs, predict.liquidSVM, clean.liquidSVM, test.liquidSVM, and Configuration.

Examples

library(liquidSVM)
set.seed(123)
## Multiclass classification
modelIris <- svm(Species ~ ., iris)
y <- predict(modelIris, iris)

## Least Squares
modelTrees <- svm(Height ~ Girth + Volume, trees)
y <- predict(modelTrees, trees)
plot(trees$Height, y)
test(modelTrees, trees)

## Quantile regression
modelTrees <- qtSVM(Height ~ Girth + Volume, trees, scale=TRUE)
y <- predict(modelTrees, trees)

## ROC curve
modelWarpbreaks <- rocSVM(wool ~ ., warpbreaks, scale=TRUE)
y <- test(modelWarpbreaks, warpbreaks)
plotROC(y,warpbreaks$wool)