h2o.psvm: Trains a Support Vector Machine model on an H2O dataset

Description

Alpha version. Supports only binomial classification problems.

Usage

h2o.psvm(
  x,
  y,
  training_frame,
  model_id = NULL,
  validation_frame = NULL,
  ignore_const_cols = TRUE,
  hyper_param = 1,
  kernel_type = c("gaussian"),
  gamma = -1,
  rank_ratio = -1,
  positive_weight = 1,
  negative_weight = 1,
  disable_training_metrics = TRUE,
  sv_threshold = 1e-04,
  fact_threshold = 1e-05,
  feasible_threshold = 0.001,
  surrogate_gap_threshold = 0.001,
  mu_factor = 10,
  max_iterations = 200,
  seed = -1
)

Arguments

x

(Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.

y

The name or column index of the response variable in the data. The response must be either a binary categorical/factor variable or a numeric variable with values -1/1 (for compatibility with SVMlight format).

training_frame

Id of the training data frame.

model_id

Destination id for this model; auto-generated if not specified.

validation_frame

Id of the validation data frame.

ignore_const_cols

Logical. Ignore constant columns. Defaults to TRUE.

hyper_param

Penalty parameter C of the error term. Defaults to 1.

kernel_type

Type of kernel to use. Must be one of: "gaussian". Defaults to gaussian.

gamma

Coefficient of the kernel (currently the RBF gamma for the gaussian kernel; -1 means 1/#features). Defaults to -1.
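To make the role of gamma concrete, the following base R sketch evaluates a Gaussian (RBF) kernel in its usual form, K(a, b) = exp(-gamma * ||a - b||^2), and resolves gamma = -1 to 1/#features as described above. The function gaussian_kernel is a hypothetical helper written for illustration; it is not part of the h2o package, and the kernel form is assumed rather than taken from this page.

```r
# Hypothetical helper (not part of h2o): Gaussian (RBF) kernel between two
# numeric feature vectors, assuming K(a, b) = exp(-gamma * ||a - b||^2).
gaussian_kernel <- function(a, b, gamma = -1) {
  stopifnot(length(a) == length(b))
  # Documented default: gamma = -1 means 1 / (number of features)
  if (gamma == -1) gamma <- 1 / length(a)
  exp(-gamma * sum((a - b)^2))
}

a <- c(1, 0, 2, 1)
b <- c(0, 0, 2, 3)

k_default <- gaussian_kernel(a, b)                # gamma resolved to 1/4
k_custom  <- gaussian_kernel(a, b, gamma = 0.01)  # explicit gamma
```

A smaller gamma makes the kernel decay more slowly with distance, so k_custom is closer to 1 than k_default for the same pair of points.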

rank_ratio

Desired rank of the ICF matrix expressed as a ratio of the number of input rows (-1 means use sqrt(#rows)). Defaults to -1.
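The relationship between rank_ratio and the resulting ICF rank can be sketched in base R. The helper resolve_icf_rank below is hypothetical and only mirrors the description above; how the actual implementation rounds the result to an integer is not stated here, so no rounding is applied.

```r
# Hypothetical helper (not part of h2o): target rank of the ICF matrix
# implied by rank_ratio, following the documented description.
resolve_icf_rank <- function(n_rows, rank_ratio = -1) {
  if (rank_ratio == -1) {
    sqrt(n_rows)           # documented default: use sqrt(#rows)
  } else {
    rank_ratio * n_rows    # rank as a fraction of the number of input rows
  }
}

rank_default <- resolve_icf_rank(10000)                     # default rule
rank_custom  <- resolve_icf_rank(10000, rank_ratio = 0.1)   # 10% of the rows
```

For a frame with 10000 rows, the default rule gives a much smaller rank than rank_ratio = 0.1, which illustrates why the default is the cheaper choice on large datasets.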

positive_weight

Weight of the positive (+1) class of observations. Defaults to 1.

negative_weight

Weight of the negative (-1) class of observations. Defaults to 1.

disable_training_metrics

Logical. Disable calculating training metrics (expensive on large datasets). Defaults to TRUE.

sv_threshold

Threshold for accepting a candidate observation into the set of support vectors. Defaults to 0.0001.

fact_threshold

Convergence threshold of the Incomplete Cholesky Factorization (ICF). Defaults to 1e-05.

feasible_threshold

Convergence threshold for primal-dual residuals in the IPM iteration. Defaults to 0.001.

surrogate_gap_threshold

Feasibility criterion of the surrogate duality gap (eta). Defaults to 0.001.

mu_factor

Increasing factor mu. Defaults to 10.

max_iterations

Maximum number of iterations of the algorithm. Defaults to 200.

seed

Seed for random numbers (affects certain parts of the algorithm that are stochastic; these might or might not be enabled by default). Defaults to -1 (time-based random number).

Examples


# NOT RUN {
library(h2o)
h2o.init()

# Import the splice dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/splice/splice.svm"
splice <- h2o.importFile(f)

# Train the Support Vector Machine model
svm_model <- h2o.psvm(gamma = 0.01, rank_ratio = 0.1,
                      y = "C1", training_frame = splice,
                      disable_training_metrics = FALSE)
# }
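As a possible follow-up to the example above, the trained model can be inspected and used for scoring. This is a minimal sketch that assumes the svm_model and splice objects created above and a running H2O cluster; it scores the training frame only because no separate test frame is defined in this example.

```r
# Assumes svm_model and splice from the example above and a running H2O cluster.
library(h2o)

# Training metrics are available here because the model was trained with
# disable_training_metrics = FALSE.
perf <- h2o.performance(svm_model)
print(perf)

# Score a frame with the trained model (the training frame, for illustration).
pred <- h2o.predict(svm_model, splice)
head(pred)
```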
