liquidSVM (version 1.2.2)

trainSVMs: Trains an SVM object.

Description

Should only be used by experts! This uses the liquidSVM C++ implementation to solve all SVMs on the hyper-parameter grid.

Usage

trainSVMs(model, ..., solver = c("kernel.rule", "ls", "hinge", "quantile"),
  command.args = NULL, do.select = FALSE, useCells = FALSE, d = NULL)

Arguments

model

the svm-model

...

configuration parameters set before training

solver

solver to use: one of "kernel.rule","ls","hinge","quantile","expectile"

command.args

further arguments arranged in a list, corresponding to the arguments of the command-line interface to svm-train, e.g. list(d=2,W=2) is equivalent to svm-train -d 2 -W 2. See command-args for details.

do.select

if not FALSE, the model is also selected after training. This parameter can alternatively be given as a list of named arguments that are passed on to the select phase

useCells

if TRUE, partitions the problem (equivalent to partition_choice=6)

d

level of display information

Value

a table giving training and validation errors and further internal statistics for every SVM that was trained. This table is also recorded in model$train_errors.
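
A minimal sketch of the intended workflow (the data frame train with factor response Y is a placeholder, and the formula interface of init.liquidSVM is assumed to work as for svm):

  library(liquidSVM)
  model <- init.liquidSVM(Y ~ ., train)   # prepare the model without training
  errs <- trainSVMs(model,                # solve all SVMs on the grid
                    solver = "hinge",
                    do.select = TRUE,     # also run the select phase
                    d = 1)                # print progress information
  head(model$train_errors)                # same table as returned in errs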

Documentation for command-line parameters of svm-train

The following parameters can be used as well:

  • f=c(<kind>,<number>,[<train_fraction>],[<neg_fraction>])

    Selects the fold generation method and the number of folds. If <train_fraction> < 1.0, then the folds for training are generated from a subset of the specified size and the remaining samples are used for validation.

    Meaning of specific values:
    <kind> = 1 => each fold is a contiguous block
    <kind> = 2 => alternating fold assignment
    <kind> = 3 => random
    <kind> = 4 => stratified random
    <kind> = 5 => random respecting group information of samples
    <kind> = 6 => random subset (<train_fraction> and <neg_fraction> required)

    Allowed values: <kind>: integer between 1 and 6; <number>: integer >= 1; <train_fraction>: float > 0.0 and <= 1.0; <neg_fraction>: float > 0.0 and < 1.0

    Default values: <kind> = 3; <number> = 5; <train_fraction> = 1.00
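
    For instance, reusing the model from the sketch in the Value section, ten stratified random folds (<kind> = 4) could be requested as follows:

      # f = c(<kind>, <number>); same as: svm-train -f 4 10
      trainSVMs(model, command.args = list(f = c(4, 10)))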

  • g=c(<size>,<min_gamma>,<max_gamma>,[<scale>])

  • g=<gamma_list>

    The first variant sets the size <size> of the gamma grid and its endpoints <min_gamma> and <max_gamma>. The second variant uses <gamma_list> for the gamma grid.

    Meaning of specific values: <scale> is a flag indicating whether <min_gamma> and <max_gamma> are scaled based on the sample size, the dimension, and the diameter.

    Allowed values: <size>: integer >= 1; <min_gamma>: float > 0.0; <max_gamma>: float > 0.0; <scale>: bool

    Default values: <size> = 10; <min_gamma> = 0.200; <max_gamma> = 5.000; <scale> = 1
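
    Reusing the model from the sketch above, a smaller scaled gamma grid could be requested like this:

      # g = c(<size>, <min_gamma>, <max_gamma>, <scale>); same as: svm-train -g 5 0.1 25 1
      trainSVMs(model, command.args = list(g = c(5, 0.1, 25, 1)))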

  • GPU=c(<use_gpus>,[<GPU_offset>])

    Flag controlling whether GPU support is used. If <use_gpus> = 1, then each CPU thread gets a thread on a GPU. In the case of multiple GPUs, these threads are uniformly distributed among the available GPUs. The optional <GPU_offset> is added to the CPU thread number before the threads are distributed to the GPUs. This makes it possible to avoid that two or more independent processes use the same GPU, if more than one GPU is available.

    Allowed values: <use_gpus>: bool; <GPU_offset>: non-negative integer

    Default values: <use_gpus> = 0; <GPU_offset> = 0

    Unfortunately, this option is not activated for the binaries you are currently using. Install CUDA and recompile to activate this option.
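
    On a CUDA-enabled build (see the note above; binaries compiled without CUDA cannot use this), GPU support could be switched on as follows:

      # GPU = c(<use_gpus>, <GPU_offset>); one GPU thread per CPU thread
      trainSVMs(model, command.args = list(GPU = c(1, 0)))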

  • h=[<level>]

    Displays all help messages.

    Meaning of specific values: <level> = 0 => short help messages <level> = 1 => detailed help messages

    Allowed values: <level>: 0 or 1

    Default values: <level> = 0

  • i=c(<cold>,<warm>)

    Selects the cold and warm start initialization methods of the solver. In general, this option should only be used in particular situations such as the implementation and testing of a new solver or when using the kernel cache.

    Meaning of specific values: For values between 0 and 6, both <cold> and <warm> have the same meaning as in Steinwart et al., 'Training SVMs without offset', JMLR 2011. These are:
    0 => sets all coefficients to zero
    1 => sets all coefficients to C
    2 => uses the coefficients of the previous solution
    3 => multiplies all coefficients by C_new/C_old
    4 => multiplies all unbounded SVs by C_new/C_old
    5 => multiplies all coefficients by C_old/C_new
    6 => multiplies all unbounded SVs by C_old/C_new

    Allowed values: Depends on the solver, but the range of <cold> is always a subset of the range of <warm>.

    Default values: Depending on the solver, the (hopefully) most efficient method is chosen.

  • k=c(<type>,[<aux_file>],[<Tr_mm_Pr>,[<size_P>],<Tr_mm>,[<size>],<Va_mm_Pr>,<Va_mm>])

    Selects the type of kernel and optionally the memory model for the kernel matrices.

    Meaning of specific values:
    <type> = 0 => Gaussian RBF
    <type> = 1 => Poisson
    <type> = 3 => experimental hierarchical Gauss kernel
    <aux_file> => name of the file that contains additional information for the hierarchical Gauss kernel; only this kernel type requires this option
    <X_mm_Y> = 0 => not contiguously stored matrix
    <X_mm_Y> = 1 => contiguously stored matrix
    <X_mm_Y> = 2 => cached matrix
    <X_mm_Y> = 3 => no matrix stored
    <size_Y> => size of the kernel cache in MB
    Here, X=Tr stands for the training matrix and X=Va for the validation matrix. In both cases, Y=Pr stands for the pre-kernel matrix, which stores the distances between the samples. If <Tr_mm_Pr> is set, then the other three flags <X_mm_Y> need to be set, too. The values <size_Y> must only be set if a cache is chosen. NOTICE: Not all possible combinations are allowed.

    Allowed values: <type>: integer between 0 and 3; <X_mm_Y>: integer between 0 and 3; <size_Y>: integer not smaller than 1

    Default values: <type> = 0; <X_mm_Y> = 1; <size_Y> = 1024; <size> = 512
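
    For example, the Poisson kernel can be selected instead of the default Gaussian RBF (model as in the sketch above):

      # k = <type>; <type> = 1 selects the Poisson kernel
      trainSVMs(model, command.args = list(k = 1))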

  • l=c(<size>,<min_lambda>,<max_lambda>,[<scale>])

  • l=c(<lambda_list>,[<interpret_as_C>])

    The first variant sets the size <size> of the lambda grid and its endpoints <min_lambda> and <max_lambda>. The second variant uses <lambda_list>, after ordering, for the lambda grid.

    Meaning of specific values: <scale> is a flag indicating whether <min_lambda> is internally divided by the average number of samples per fold. <interpret_as_C> is a flag indicating whether the lambda list should be interpreted as a list of C values.

    Allowed values: <size>: integer >= 1; <min_lambda>: float > 0.0; <max_lambda>: float > 0.0; <scale>: bool; <interpret_as_C>: bool

    Default values: <size> = 10; <min_lambda> = 0.001; <max_lambda> = 0.100; <scale> = 1; <interpret_as_C> = 0
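
    A finer lambda grid could, for instance, be set up as follows (model as in the sketch above):

      # l = c(<size>, <min_lambda>, <max_lambda>); same as: svm-train -l 20 0.0001 0.1
      trainSVMs(model, command.args = list(l = c(20, 0.0001, 0.1)))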

  • L=c(<loss>,[<clipp>],[<neg_weight>,<pos_weight>])

    Sets the loss that is used to compute empirical errors. The optional <clipp> value specifies where the predictions are clipped during validation. The optional weights can only be set if <loss> specifies a loss that has weights.

    Meaning of specific values:
    <loss> = 0 => binary classification loss
    <loss> = 2 => least squares loss
    <loss> = 3 => weighted least squares loss
    <loss> = 4 => pinball loss
    <loss> = 5 => hinge loss
    <loss> = 6 => your own template loss
    <clipp> = -1.0 => clip at the smallest possible value (depends on labels)
    <clipp> = 0.0 => no clipping is applied

    Allowed values: <loss>: values listed above; <clipp>: float >= -1.0; <neg_weight>: float > 0.0; <pos_weight>: float > 0.0

    Default values: <loss> = native loss of solver chosen by option -S; <clipp> = -1.000; <neg_weight> = <weight1> set by option -w; <pos_weight> = <weight2> set by option -w
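
    For example, to validate with the least squares loss and clipping disabled (model as in the sketch above):

      # L = c(<loss>, <clipp>); least squares loss, no clipping
      trainSVMs(model, command.args = list(L = c(2, 0)))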

  • P=c(1,[<size>])

  • P=c(2,[<number>])

  • P=c(3,[<radius>],[<subset_size>])

  • P=c(4,[<size>],[<reduce>],[<subset_size>])

  • P=c(5,[<size>],[<ignore_fraction>],[<subset_size>],[<covers>])

  • P=c(6,[<size>],[<reduce>],[<subset_size>],[<covers>],[<shrink_factor>],[<max_width>],[<max_depth>])

    Selects the working set partition method.

    Meaning of specific values:
    <type> = 0 => do not split the working sets.
    <type> = 1 => split the working sets into random chunks using the maximal <size> of each chunk. Default values: <size> = 2000
    <type> = 2 => split the working sets into random chunks using <number> of chunks. Default values: <number> = 10
    <type> = 3 => split the working sets into Voronoi subsets of radius <radius>. If <subset_size> is set, a subset of this size is used to create the Voronoi partition faster. If <subset_size> == 0, the entire data set is used; otherwise, the radius is only approximately ensured. Default values: <radius> = 1.000; <subset_size> = 0
    <type> = 4 => split the working sets into Voronoi subsets of maximal size <size>. The optional flag <reduce> controls whether a heuristic to reduce the number of cells is used. If <subset_size> is set, a subset of this size is used to create the Voronoi partition faster. If <subset_size> == 0, the entire data set is used; otherwise, the maximal size is only approximately ensured. Default values: <size> = 2000; <reduce> = 1; <subset_size> = 50000
    <type> = 5 => divide the working sets into overlapping regions of maximal size <size>. The process of creating regions is stopped when <size> * <ignore_fraction> samples have not been assigned to a region. These samples will then be assigned to the closest region. If <subset_size> is set, a subset of this size is used to find the regions. If <subset_size> == 0, the entire data set is used. Finally, <covers> controls the number of times the process of finding regions is repeated. Default values: <size> = 2000; <ignore_fraction> = 0.5; <subset_size> = 50000; <covers> = 1
    <type> = 6 => split the working sets into Voronoi subsets of maximal size <size>, as for <type> = 4, but with the centers for the Voronoi partition found by a recursive tree approach, which in many cases may be faster. The optional flag <reduce> controls whether a heuristic to reduce the number of cells is used. If <subset_size> is set, a subset of this size is used to create the Voronoi partition faster. If <subset_size> == 0, the entire data set is used; otherwise, the maximal size is only approximately ensured. <shrink_factor> describes the factor by which the number of samples should at least be decreased. The recursion is stopped when either <max_width> * <size> is greater than the current working subset or <max_depth> is reached. For both parameters, a value of 0 means that the corresponding condition is ignored. Default values (so far, they are only a brave guess): <size> = 2000; <reduce> = 1; <subset_size> = 50000; <shrink_factor> = 2.0000; <max_width> = 20; <max_depth> = 4

    Allowed values: <type>: integer between 0 and 6; <size>: positive integer; <number>: positive integer; <radius>: positive real; <subset_size>: non-negative integer; <reduce>: bool; <covers>: positive integer; <shrink_factor>: real > 1.0; <max_width>: non-negative integer; <max_depth>: non-negative integer

    Default values: <type> = 0
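
    For example, Voronoi cells of at most 2000 samples (<type> = 4) can be requested as below; useCells=TRUE is the shortcut for <type> = 6 with its default values:

      # P = c(<type>, <size>); same as: svm-train -P 4 2000
      trainSVMs(model, command.args = list(P = c(4, 2000)))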

  • r=<seed>

    Initializes the random number generator with <seed>.

    Meaning of specific values: <seed> = -1 => a random seed based on the internal timer is used

    Allowed values: <seed>: integer between -1 and 2147483647

    Default values: <seed> = -1
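
    To make the fold assignment reproducible, fix the seed (model as in the sketch above):

      # r = <seed>; deterministic random number generation
      trainSVMs(model, command.args = list(r = 42))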

  • s=c(<clipp>,[<stop_eps>])

    Sets the value at which the loss is clipped in the solver to <clipp>. The optional parameter <stop_eps> sets the threshold in the stopping criterion of the solver.

    Meaning of specific values:
    <clipp> = -1.0 => depending on the solver type, clip either at the smallest possible value (depends on labels) or not at all
    <clipp> = 0.0 => no clipping is applied

    Allowed values: <clipp>: -1.0 or float >= 0.0. In addition, if <clipp> > 0.0, then <clipp> must not be smaller than the largest absolute value of the samples. <stop_eps>: float > 0.0

    Default values: <clipp> = -1.0 <stop_eps> = 0.0010
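
    For instance, default clipping combined with a tighter stopping criterion (model as in the sketch above):

      # s = c(<clipp>, <stop_eps>); same as: svm-train -s -1 0.0001
      trainSVMs(model, command.args = list(s = c(-1, 0.0001)))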

  • S=c(<solver>,[<NNs>])

    Selects the SVM solver <solver> and the number <NNs> of nearest neighbors used in the working set selection strategy (2D-solvers only).

    Meaning of specific values:
    <solver> = 0 => kernel rule for classification
    <solver> = 1 => LS-SVM with 2D-solver
    <solver> = 2 => HINGE-SVM with 2D-solver
    <solver> = 3 => QUANTILE-SVM with 2D-solver
    <solver> = 4 => EXPECTILE-SVM with 2D-solver
    <solver> = 5 => your own SVM solver implemented in template_svm.*

    Allowed values: <solver>: integer between 0 and 5 <NNs>: integer between 0 and 100

    Default values: <solver> = 2 <NNs> = depends on the solver
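
    The solver argument of trainSVMs and this option address the same choice, so the following two calls both request the least squares solver (model as in the sketch above):

      trainSVMs(model, solver = "ls")               # by name
      trainSVMs(model, command.args = list(S = 1))  # by number, as in svm-train -S 1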

  • T=c(<threads>,[<thread_id_offset>])

    Sets the number of threads that are going to be used. Each thread is assigned to a logical processor on the system, so the number of allowed threads is bounded by the number of logical processors. On systems with activated hyperthreading, each physical core runs one thread if <threads> does not exceed the number of physical cores. Since hyperthreads on the same core share resources, using more threads than cores usually does not increase the performance significantly, and may even decrease it. The optional <thread_id_offset> is added before distributing the threads to the cores. This makes it possible to avoid that two or more independent processes use the same physical cores. Example: to run 2 processes with 3 threads each on a 6-core system, call the first process with -T 3 0 and the second one with -T 3 3.

    Meaning of specific values: <threads> = 0 => 4 threads are used (all physical cores run one thread) <threads> = -1 => 3 threads are used (all but one of the physical cores run one thread)

    Allowed values: <threads>: integer between -1 and 4 <thread_id_offset>: integer between 0 and 4

    Default values: <threads> = 0 <thread_id_offset> = 0
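
    Following the example in the text, two independent R processes on a 6-core machine could be kept on disjoint cores as follows (model2 is a hypothetical second model):

      trainSVMs(model,  command.args = list(T = c(3, 0)))  # first process, cores 0-2
      trainSVMs(model2, command.args = list(T = c(3, 3)))  # second process, cores 3-5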

  • w=c(<neg_weight>,<pos_weight>)

  • w=c(<min_weight>,<max_weight>,<size>,[<geometric>,<swap>])

  • w=c(<weight_list>,[<swap>])

    Sets the weights with which the solvers are trained. For solvers that do not have weights, this option is ignored. The first variant sets a pair of values. The second variant computes a sequence of weights of length <size>. The third variant takes the given list of weights.

    Meaning of specific values:
    <size> = 1 => <weight1> is the negative weight and <weight2> is the positive weight.
    <size> > 1 => <size> many pairs are computed, where the positive weights are between <min_weight> and <max_weight> and the negative weights are 1 - <pos_weight>.
    <geometric> => flag indicating whether the intermediate positive weights are geometrically or arithmetically distributed
    <swap> => flag indicating whether the roles of the positive and negative weights are interchanged

    Allowed values: <... weight ...>: float > 0.0 and < 1.0; <size>: integer > 0; <geometric>: bool; <swap>: bool

    Default values: <weight1> = 1.0; <weight2> = 1.0; <size> = 1; <geometric> = 0; <swap> = 0
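
    For a weighted binary classification, a geometric sequence of five weight pairs could be requested as follows (model as in the sketch above):

      # w = c(<min_weight>, <max_weight>, <size>, <geometric>)
      trainSVMs(model, command.args = list(w = c(0.1, 0.9, 5, 1)))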

  • W=<type>

    Selects the working set selection method.

    Meaning of specific values:
    <type> = 0 => take the entire data set
    <type> = 1 => multiclass 'all versus all'
    <type> = 2 => multiclass 'one versus all'
    <type> = 3 => bootstrap with <number> resamples of size <size>

    Allowed values: <type>: integer between 0 and 3

    Default values: <type> = 0
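
    For a multiclass model, 'one versus all' working sets are selected as below; this mirrors the list(d=2,W=2) example given for command.args above:

      # W = <type>; same as: svm-train -W 2
      trainSVMs(model, command.args = list(W = 2))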

Details

SVMs are solved for all tasks/cells/folds and all entries of the hyper-parameter grid; the resulting solutions can afterwards be selected using selectSVMs. A model can even be retrained with other parameters, reusing the training data. The training phase is usually the most time-consuming one, so for bigger problems it is recommended to use display=1 to get some progress information.
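
A minimal sketch of this train/select split, reusing the model from the Value section:

  trainSVMs(model, solver = "ls", d = 1)  # retrain on the same data with another solver
  selectSVMs(model)                       # select hyper-parameters from the new grid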

See command-args for details.

See Also

command-args, svm, init.liquidSVM, selectSVMs, predict.liquidSVM, test.liquidSVM and clean.liquidSVM