h2o (version 3.10.3.6)

h2o.deepwater: Build a Deep Learning model using multiple native GPU backends

Description

Builds a deep neural network on an H2OFrame containing various data sources, using one of several native GPU backends.

Usage

h2o.deepwater(x, y, training_frame, model_id = NULL, checkpoint = NULL,
  autoencoder = FALSE, validation_frame = NULL, nfolds = 0,
  balance_classes = FALSE, max_after_balance_size = 5,
  class_sampling_factors = NULL, keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE, fold_assignment = c("AUTO",
  "Random", "Modulo", "Stratified"), fold_column = NULL,
  offset_column = NULL, weights_column = NULL,
  score_each_iteration = FALSE, categorical_encoding = c("AUTO", "Enum",
  "OneHotInternal", "OneHotExplicit", "Binary", "Eigen"),
  overwrite_with_best_model = TRUE, epochs = 10,
  train_samples_per_iteration = -2, target_ratio_comm_to_comp = 0.05,
  seed = -1, standardize = TRUE, learning_rate = 0.005,
  learning_rate_annealing = 1e-06, momentum_start = 0.9,
  momentum_ramp = 10000, momentum_stable = 0.99, distribution = c("AUTO",
  "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie",
  "laplace", "quantile", "huber"), score_interval = 5,
  score_training_samples = 10000, score_validation_samples = 0,
  score_duty_cycle = 0.1, classification_stop = 0, regression_stop = 0,
  stopping_rounds = 5, stopping_metric = c("AUTO", "deviance", "logloss",
  "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification",
  "mean_per_class_error"), stopping_tolerance = 0, max_runtime_secs = 0,
  ignore_const_cols = TRUE, shuffle_training_data = TRUE,
  mini_batch_size = 32, clip_gradient = 10, network = c("auto", "user",
  "lenet", "alexnet", "vgg", "googlenet", "inception_bn", "resnet"),
  backend = c("mxnet", "caffe", "tensorflow"), image_shape = c(0, 0),
  channels = 3, sparse = FALSE, gpu = TRUE, device_id = c(0),
  network_definition_file = NULL, network_parameters_file = NULL,
  mean_image_file = NULL, export_native_parameters_prefix = NULL,
  activation = c("Rectifier", "Tanh"), hidden = NULL,
  input_dropout_ratio = 0, hidden_dropout_ratios = NULL,
  problem_type = c("auto", "image", "dataset"))

Arguments

x
A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y
The name of the response variable in the model. If the data does not contain a header, this is the column index number, starting at 0 and increasing from left to right. The response must be either an integer or a categorical variable.
training_frame
Id of the training data frame (Not required, to allow initial validation of model parameters).
model_id
Destination id for this model; auto-generated if not specified.
checkpoint
Model checkpoint to resume training with.
autoencoder
Logical. Auto-Encoder. Defaults to FALSE.
validation_frame
Id of the validation data frame.
nfolds
Number of folds for N-fold cross-validation (0 to disable or >= 2). Defaults to 0.
balance_classes
Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.
max_after_balance_size
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.
class_sampling_factors
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
keep_cross_validation_predictions
Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment
Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
fold_assignment
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.
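As a rough illustration of what a "Modulo" scheme can look like, the sketch below assigns each row to a fold by taking its position modulo nfolds. This is an assumption based on the scheme's name, not a copy of H2O's implementation, and `assign_folds_modulo` is a hypothetical helper.

```r
# Hypothetical helper: deterministic fold assignment by row position.
assign_folds_modulo <- function(n_rows, nfolds) {
  stopifnot(nfolds >= 2, n_rows >= nfolds)
  (seq_len(n_rows) - 1L) %% nfolds   # fold ids 0 .. nfolds - 1
}

folds <- assign_folds_modulo(n_rows = 10, nfolds = 3)
print(folds)
print(table(folds))   # fold sizes differ by at most one row
```

Unlike "Random", an assignment of this kind is fully determined by row order, so it does not depend on seed.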
fold_column
Column with cross-validation fold index assignment per observation.
offset_column
Offset column. This will be added to the combination of columns before applying the link function.
weights_column
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
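The equivalence between weights and repeated rows can be checked on a simple statistic. The base R sketch below uses made-up values and compares a weighted mean against the mean of the physically expanded data; it does not call H2O.

```r
# Weighted mean with integer weights vs. the mean of the expanded data.
x <- c(1.0, 4.0, 10.0)
w <- c(1, 2, 0)          # second row counted twice, third row excluded

weighted <- weighted.mean(x, w)
expanded <- mean(rep(x, times = w))   # rows physically repeated or dropped

print(c(weighted = weighted, expanded = expanded))
```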
score_each_iteration
Logical. Whether to score during each iteration of model training. Defaults to FALSE.
categorical_encoding
Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen". Defaults to AUTO.
overwrite_with_best_model
Logical. If enabled, override the final model with the best model found during training. Defaults to TRUE.
epochs
Number of times to iterate over (stream) the dataset; can be fractional. Defaults to 10.
train_samples_per_iteration
Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic. Defaults to -2.
target_ratio_comm_to_comp
Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning). Defaults to 0.05.
seed
Seed for random numbers (affects the stochastic parts of the algorithm, which may or may not be enabled by default). Note: results are only reproducible when running single-threaded. Defaults to -1 (time-based random number).
standardize
Logical. If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data. Defaults to TRUE.
learning_rate
Learning rate (higher => less stable, lower => slower convergence). Defaults to 0.005.
learning_rate_annealing
Learning rate annealing: rate / (1 + rate_annealing * samples). Defaults to 1e-06.
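To get a feel for the annealing formula, the sketch below evaluates it in base R with the default learning_rate and learning_rate_annealing. `annealed_rate` is a hypothetical helper, and the sample counts are arbitrary.

```r
# Effective learning rate after a given number of training samples.
annealed_rate <- function(rate, rate_annealing, samples) {
  rate / (1 + rate_annealing * samples)
}

samples <- c(0, 1e5, 1e6, 1e7)
rates <- annealed_rate(rate = 0.005, rate_annealing = 1e-06, samples = samples)
print(data.frame(samples = samples, effective_rate = rates))
```

With the defaults, the effective rate halves after one million samples, since 1 + 1e-06 * 1e6 = 2.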
momentum_start
Initial momentum at the beginning of training (try 0.5). Defaults to 0.9.
momentum_ramp
Number of training samples for which momentum increases. Defaults to 10000.
momentum_stable
Final momentum after the ramp is over (try 0.99). Defaults to 0.99.
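The three momentum arguments can be read together as a schedule. The sketch below assumes the ramp is linear in the number of training samples; the shape of the ramp is not stated on this page, so treat that as an assumption. `momentum_at` is a hypothetical helper.

```r
# Momentum after `samples` training samples, assuming a linear ramp
# from momentum_start to momentum_stable over momentum_ramp samples.
momentum_at <- function(samples, momentum_start = 0.9,
                        momentum_ramp = 10000, momentum_stable = 0.99) {
  frac <- pmin(samples / momentum_ramp, 1)   # fraction of the ramp completed
  momentum_start + frac * (momentum_stable - momentum_start)
}

schedule <- momentum_at(c(0, 5000, 10000, 50000))
print(schedule)
```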
distribution
Distribution function. Must be one of: "AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber". Defaults to AUTO.
score_interval
Shortest time interval (in seconds) between model scoring. Defaults to 5.
score_training_samples
Number of training set samples for scoring (0 for all). Defaults to 10000.
score_validation_samples
Number of validation set samples for scoring (0 for all). Defaults to 0.
score_duty_cycle
Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring). Defaults to 0.1.
classification_stop
Stopping criterion for classification error fraction on training data (-1 to disable). Defaults to 0.
regression_stop
Stopping criterion for regression error (MSE) on training data (-1 to disable). Defaults to 0.
stopping_rounds
Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 5.
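The moving-average rule can be sketched in base R. This is a simplified reading of the description, not H2O's exact logic: it assumes a positive metric where lower is better, requires at least 2k scoring events, and compares the best of the last k moving averages against the moving average just before them. `moving_average` and `should_stop` are hypothetical helpers.

```r
# Simple moving average of length k over a metric history.
moving_average <- function(values, k) {
  n <- length(values)
  if (n < k) return(numeric(0))
  vapply(seq_len(n - k + 1),
         function(i) mean(values[i:(i + k - 1)]),
         numeric(1))
}

# TRUE if the last k moving averages fail to beat the one preceding them
# by at least the relative tolerance (assumed rule; lower metric is better).
should_stop <- function(history, stopping_rounds, stopping_tolerance = 0) {
  k <- stopping_rounds
  if (k == 0 || length(history) < 2 * k) return(FALSE)
  ma <- moving_average(history, k)
  m <- length(ma)
  reference <- ma[m - k]
  best_recent <- min(ma[(m - k + 1):m])
  best_recent >= reference * (1 - stopping_tolerance)
}

improving <- c(1.0, 0.9, 0.8, 0.7, 0.6, 0.5)
plateau   <- c(1.0, 0.8, 0.6, 0.6, 0.6, 0.6)
print(should_stop(improving, stopping_rounds = 2))
print(should_stop(plateau, stopping_rounds = 2))
```

Averaging over k scoring events smooths out noise in individual scores, so a single bad scoring event does not end training on its own.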
stopping_metric
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Must be one of: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Defaults to AUTO.
stopping_tolerance
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.
max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
ignore_const_cols
Logical. Ignore constant columns. Defaults to TRUE.
shuffle_training_data
Logical. Enable global shuffling of training data. Defaults to TRUE.
mini_batch_size
Mini-batch size (smaller leads to better fit, larger can speed up and generalize better). Defaults to 32.
clip_gradient
Clip gradients once their absolute value is larger than this value. Defaults to 10.
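One common way to implement this kind of clipping is an element-wise clamp to the interval [-clip_gradient, clip_gradient]. The sketch below assumes that interpretation; the backends may clip differently (for example by norm). `clip_gradient_values` is a hypothetical helper.

```r
# Element-wise clamp of gradient values to [-limit, limit] (assumed behavior).
clip_gradient_values <- function(gradient, limit = 10) {
  pmax(pmin(gradient, limit), -limit)
}

g <- c(-25, -3, 0, 7.5, 42)
clipped <- clip_gradient_values(g)
print(clipped)
```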
network
Network architecture. Must be one of: "auto", "user", "lenet", "alexnet", "vgg", "googlenet", "inception_bn", "resnet". Defaults to auto.
backend
Deep Learning Backend. Must be one of: "mxnet", "caffe", "tensorflow". Defaults to mxnet.
image_shape
Width and height of image. Defaults to [0, 0].
channels
Number of (color) channels. Defaults to 3.
sparse
Logical. Sparse data handling (more efficient for data with lots of 0 values). Defaults to FALSE.
gpu
Logical. Whether to use a GPU (if available). Defaults to TRUE.
device_id
Device IDs (which GPUs to use). Defaults to [0].
network_definition_file
Path of file containing network definition (graph, architecture).
network_parameters_file
Path of file containing network (initial) parameters (weights, biases).
mean_image_file
Path of file containing the mean image data for data normalization.
export_native_parameters_prefix
Path (prefix) where to export the native model parameters after every iteration.
activation
Activation function. Only used if no user-defined network architecture file is provided, and only for problem_type=dataset. Must be one of: "Rectifier", "Tanh".
hidden
Hidden layer sizes (e.g. [200, 200]). Only used if no user-defined network architecture file is provided, and only for problem_type=dataset.
input_dropout_ratio
Input layer dropout ratio (can improve generalization, try 0.1 or 0.2). Defaults to 0.
hidden_dropout_ratios
Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer. Defaults to 0.5.
problem_type
Problem type, auto-detected by default. If set to image, the H2OFrame must contain a string column containing the path (URI or URL) to the images in the first column. If set to text, the H2OFrame must contain a string column containing the text in the first column. If set to dataset, Deep Water behaves just like any other H2O Model and builds a model on the provided H2OFrame (non-String columns). Must be one of: "auto", "image", "dataset". Defaults to auto.
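A call might be assembled along the following lines. This is a minimal sketch, not a tested recipe: it assumes a Deep Water-enabled H2O cluster, and the file path and the "label" column name are hypothetical. The cluster-dependent part is guarded by a flag so the argument list can be inspected without H2O.

```r
# Set to TRUE only on a machine with a Deep Water-enabled H2O cluster.
run_on_cluster <- FALSE

# Hypothetical settings for a tabular classification problem.
params <- list(
  y = "label",                 # hypothetical response column
  epochs = 10,
  hidden = c(200, 200),        # only used with problem_type = "dataset"
  activation = "Rectifier",
  backend = "mxnet",
  problem_type = "dataset",
  stopping_rounds = 5,
  gpu = TRUE
)
str(params)

if (run_on_cluster) {
  library(h2o)
  h2o.init()
  train <- h2o.importFile("path/to/train.csv")            # hypothetical path
  train[, params$y] <- as.factor(train[, params$y])       # categorical response
  params$training_frame <- train
  params$x <- setdiff(colnames(train), params$y)
  model <- do.call(h2o.deepwater, params)
  print(model)
}
```

Passing the arguments through a list and `do.call` keeps the settings in one place, which makes it easier to log or reuse them across runs.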