# h2o.deeplearning

##### Build a Deep Neural Network model using CPUs

Builds a feed-forward multilayer artificial neural network on an H2OFrame.

##### Usage

```
h2o.deeplearning(x, y, training_frame, model_id = NULL,
validation_frame = NULL, nfolds = 0,
keep_cross_validation_predictions = FALSE,
keep_cross_validation_fold_assignment = FALSE, fold_assignment = c("AUTO",
"Random", "Modulo", "Stratified"), fold_column = NULL,
ignore_const_cols = TRUE, score_each_iteration = FALSE,
weights_column = NULL, offset_column = NULL, balance_classes = FALSE,
class_sampling_factors = NULL, max_after_balance_size = 5,
max_hit_ratio_k = 0, checkpoint = NULL, pretrained_autoencoder = NULL,
overwrite_with_best_model = TRUE, use_all_factor_levels = TRUE,
standardize = TRUE, activation = c("Tanh", "TanhWithDropout", "Rectifier",
"RectifierWithDropout", "Maxout", "MaxoutWithDropout"), hidden = c(200,
200), epochs = 10, train_samples_per_iteration = -2,
target_ratio_comm_to_comp = 0.05, seed = -1, adaptive_rate = TRUE,
rho = 0.99, epsilon = 1e-08, rate = 0.005, rate_annealing = 1e-06,
rate_decay = 1, momentum_start = 0, momentum_ramp = 1e+06,
momentum_stable = 0, nesterov_accelerated_gradient = TRUE,
input_dropout_ratio = 0, hidden_dropout_ratios = NULL, l1 = 0, l2 = 0,
max_w2 = 3.4028235e+38, initial_weight_distribution = c("UniformAdaptive",
"Uniform", "Normal"), initial_weight_scale = 1, initial_weights = NULL,
initial_biases = NULL, loss = c("Automatic", "CrossEntropy", "Quadratic",
"Huber", "Absolute", "Quantile"), distribution = c("AUTO", "bernoulli",
"multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace",
"quantile", "huber"), quantile_alpha = 0.5, tweedie_power = 1.5,
huber_alpha = 0.9, score_interval = 5, score_training_samples = 10000,
score_validation_samples = 0, score_duty_cycle = 0.1,
classification_stop = 0, regression_stop = 1e-06, stopping_rounds = 5,
stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE",
"RMSLE", "AUC", "lift_top_group", "misclassification",
"mean_per_class_error"), stopping_tolerance = 0, max_runtime_secs = 0,
score_validation_sampling = c("Uniform", "Stratified"),
diagnostics = TRUE, fast_mode = TRUE, force_load_balance = TRUE,
variable_importances = TRUE, replicate_training_data = TRUE,
single_node_mode = FALSE, shuffle_training_data = FALSE,
missing_values_handling = c("MeanImputation", "Skip"), quiet_mode = FALSE,
autoencoder = FALSE, sparse = FALSE, col_major = FALSE,
average_activation = 0, sparsity_beta = 0,
max_categorical_features = 2147483647, reproducible = FALSE,
export_weights_and_biases = FALSE, mini_batch_size = 1,
categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
"Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
elastic_averaging = FALSE, elastic_averaging_moving_rate = 0.9,
elastic_averaging_regularization = 0.001)
```

##### Arguments

- x
A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.

- y
The name of the response variable in the model. If the data does not contain a header, this is the column index, counted from the first column and increasing from left to right. (The response must be either an integer or a categorical variable.)

- training_frame
Id of the training data frame (Not required, to allow initial validation of model parameters).

- model_id
Destination id for this model; auto-generated if not specified.

- validation_frame
Id of the validation data frame.

- nfolds
Number of folds for N-fold cross-validation (0 to disable or >= 2). Defaults to 0.

- keep_cross_validation_predictions
`Logical`. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.

- keep_cross_validation_fold_assignment
`Logical`. Whether to keep the cross-validation fold assignment. Defaults to FALSE.

- fold_assignment
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.

- fold_column
Column with cross-validation fold index assignment per observation.

- ignore_const_cols
`Logical`. Ignore constant columns. Defaults to TRUE.

- score_each_iteration
`Logical`. Whether to score during each iteration of model training. Defaults to FALSE.

- weights_column
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.

- offset_column
Offset column. This will be added to the combination of columns before applying the link function.

- balance_classes
`Logical`. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.

- class_sampling_factors
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

- max_after_balance_size
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.

- max_hit_ratio_k
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable). Defaults to 0.

- checkpoint
Model checkpoint to resume training with.

- pretrained_autoencoder
Pretrained autoencoder model to initialize this model with.

- overwrite_with_best_model
`Logical`. If enabled, override the final model with the best model found during training. Defaults to TRUE.

- use_all_factor_levels
`Logical`. Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder. Defaults to TRUE.

- standardize
`Logical`. If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data. Defaults to TRUE.

- activation
Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to Rectifier.

- hidden
Hidden layer sizes (e.g. [100, 100]). Defaults to [200, 200].

- epochs
How many times the dataset should be iterated (streamed), can be fractional. Defaults to 10.

- train_samples_per_iteration
Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic. Defaults to -2.

- target_ratio_comm_to_comp
Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning). Defaults to 0.05.

- seed
Seed for random numbers (affects certain parts of the algorithm that are stochastic, which may or may not be enabled by default). Note: results are only reproducible when running single-threaded. Defaults to -1 (time-based random number).

- adaptive_rate
`Logical`. Adaptive learning rate. Defaults to TRUE.

- rho
Adaptive learning rate time decay factor (similarity to prior updates). Defaults to 0.99.

- epsilon
Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress). Defaults to 1e-08.

- rate
Learning rate (higher => less stable, lower => slower convergence). Defaults to 0.005.

- rate_annealing
Learning rate annealing: rate / (1 + rate_annealing * samples). Defaults to 1e-06.

- rate_decay
Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (N - 1)). Defaults to 1.

- momentum_start
Initial momentum at the beginning of training (try 0.5). Defaults to 0.

- momentum_ramp
Number of training samples for which momentum increases. Defaults to 1000000.

- momentum_stable
Final momentum after the ramp is over (try 0.99). Defaults to 0.

- nesterov_accelerated_gradient
`Logical`. Use Nesterov accelerated gradient (recommended). Defaults to TRUE.

- input_dropout_ratio
Input layer dropout ratio (can improve generalization, try 0.1 or 0.2). Defaults to 0.

- hidden_dropout_ratios
Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer. Defaults to 0.5.

- l1
L1 regularization (can add stability and improve generalization, causes many weights to become 0). Defaults to 0.

- l2
L2 regularization (can add stability and improve generalization, causes many weights to be small). Defaults to 0.

- max_w2
Constraint for squared sum of incoming weights per unit (e.g. for Rectifier). Defaults to 3.4028235e+38.

- initial_weight_distribution
Initial weight distribution. Must be one of: "UniformAdaptive", "Uniform", "Normal". Defaults to UniformAdaptive.

- initial_weight_scale
Uniform: -value...value, Normal: stddev. Defaults to 1.

- initial_weights
A list of H2OFrame ids to initialize the weight matrices of this model with.

- initial_biases
A list of H2OFrame ids to initialize the bias vectors of this model with.

- loss
Loss function. Must be one of: "Automatic", "CrossEntropy", "Quadratic", "Huber", "Absolute", "Quantile". Defaults to Automatic.

- distribution
Distribution function. Must be one of: "AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber". Defaults to AUTO.

- quantile_alpha
Desired quantile for Quantile regression, must be between 0 and 1. Defaults to 0.5.

- tweedie_power
Tweedie power for Tweedie regression, must be between 1 and 2. Defaults to 1.5.

- huber_alpha
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1). Defaults to 0.9.

- score_interval
Shortest time interval (in seconds) between model scoring. Defaults to 5.

- score_training_samples
Number of training set samples for scoring (0 for all). Defaults to 10000.

- score_validation_samples
Number of validation set samples for scoring (0 for all). Defaults to 0.

- score_duty_cycle
Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring). Defaults to 0.1.

- classification_stop
Stopping criterion for classification error fraction on training data (-1 to disable). Defaults to 0.

- regression_stop
Stopping criterion for regression error (MSE) on training data (-1 to disable). Defaults to 1e-06.

- stopping_rounds
Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 5.

- stopping_metric
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Must be one of: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Defaults to AUTO.

- stopping_tolerance
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.

- max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.

- score_validation_sampling
Method used to sample validation dataset for scoring. Must be one of: "Uniform", "Stratified". Defaults to Uniform.

- diagnostics
`Logical`. Enable diagnostics for hidden layers. Defaults to TRUE.

- fast_mode
`Logical`. Enable fast mode (minor approximation in back-propagation). Defaults to TRUE.

- force_load_balance
`Logical`. Force extra load balancing to increase training speed for small datasets (to keep all cores busy). Defaults to TRUE.

- variable_importances
`Logical`. Compute variable importances for input features (Gedeon method). Can be slow for large networks. Defaults to TRUE.

- replicate_training_data
`Logical`. Replicate the entire training dataset onto every node for faster training on small datasets. Defaults to TRUE.

- single_node_mode
`Logical`. Run on a single node for fine-tuning of model parameters. Defaults to FALSE.

- shuffle_training_data
`Logical`. Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes). Defaults to FALSE.

- missing_values_handling
Handling of missing values. Must be one of: "MeanImputation", "Skip". Defaults to MeanImputation.

- quiet_mode
`Logical`. Enable quiet mode for less output to standard output. Defaults to FALSE.

- autoencoder
`Logical`. Auto-Encoder. Defaults to FALSE.

- sparse
`Logical`. Sparse data handling (more efficient for data with lots of 0 values). Defaults to FALSE.

- col_major
`Logical`. Deprecated. Use a column major weight matrix for input layer. Can speed up forward propagation, but might slow down backpropagation. Defaults to FALSE.

- average_activation
Average activation for sparse auto-encoder. Experimental. Defaults to 0.

- sparsity_beta
Sparsity regularization. Experimental. Defaults to 0.

- max_categorical_features
Max. number of categorical features, enforced via hashing. Experimental. Defaults to 2147483647.

- reproducible
`Logical`. Force reproducibility on small data (will be slow, only uses 1 thread). Defaults to FALSE.

- export_weights_and_biases
`Logical`. Whether to export Neural Network weights and biases to H2O Frames. Defaults to FALSE.

- mini_batch_size
Mini-batch size (smaller leads to better fit, larger can speed up and generalize better). Defaults to 1.

- categorical_encoding
Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.

- elastic_averaging
`Logical`. Elastic averaging between compute nodes can improve distributed model convergence. Experimental. Defaults to FALSE.

- elastic_averaging_moving_rate
Elastic averaging moving rate (only if elastic averaging is enabled). Defaults to 0.9.

- elastic_averaging_regularization
Elastic averaging regularization strength (only if elastic averaging is enabled). Defaults to 0.001.
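
The `rate_annealing` and `rate_decay` descriptions above quote closed-form schedules, which the following base R sketch evaluates directly. The helper names are hypothetical and not part of the h2o package. The momentum helper additionally assumes a linear ramp from `momentum_start` to `momentum_stable` over `momentum_ramp` samples, which the argument descriptions do not spell out.

```
# Hypothetical helpers for illustration only; they are not h2o functions.

# Annealed learning rate after `samples` training samples:
#   rate / (1 + rate_annealing * samples)
annealed_rate <- function(rate, rate_annealing, samples) {
  rate / (1 + rate_annealing * samples)
}

# Learning rate of the N-th layer (1-based): rate * rate_decay ^ (N - 1)
layer_rate <- function(rate, rate_decay, n) {
  rate * rate_decay^(n - 1)
}

# Momentum after `samples` training samples.
# ASSUMPTION: the ramp is linear and momentum stays at momentum_stable afterwards.
momentum_at <- function(samples, momentum_start, momentum_ramp, momentum_stable) {
  frac <- pmin(samples / momentum_ramp, 1)
  momentum_start + frac * (momentum_stable - momentum_start)
}

# Default rate and rate_annealing: the rate halves after 1e6 samples
annealed_rate(0.005, 1e-06, c(0, 1e6, 3e6))       # approx. 0.005, 0.0025, 0.00125

# Illustrative rate_decay of 0.5 across three layers (the default of 1 keeps the rate constant)
layer_rate(0.005, 0.5, 1:3)                       # approx. 0.005, 0.0025, 0.00125

# The "try 0.5" and "try 0.99" momentum suggestions with the default ramp length
momentum_at(c(0, 5e5, 1e6, 2e6), 0.5, 1e6, 0.99)  # approx. 0.5, 0.745, 0.99, 0.99
```

In H2O these manual rate and momentum settings are used when `adaptive_rate` is disabled; with the default `adaptive_rate = TRUE`, `rho` and `epsilon` control the learning rate instead.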

##### See Also

`predict.H2OModel` for prediction

##### Examples

```
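library(h2o)
# start (or connect to) a local H2O cluster
h2o.init()
# upload the built-in iris data set as an H2OFrame
iris.hex <- as.h2o(iris)
# columns 1-4 are the predictors, column 5 (Species) is the response
iris.dl <- h2o.deeplearning(x = 1:4, y = 5, training_frame = iris.hex)
# now make a prediction
predictions <- h2o.predict(iris.dl, iris.hex)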
```
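
A slightly fuller sketch, assuming a running local H2O cluster: it holds out a validation frame and enables hidden-layer dropout and metric-based early stopping through the arguments documented above. The split ratio, layer sizes, dropout ratios, tolerance, and seed are illustrative choices, not recommendations from the package.

```
library(h2o)
h2o.init()
iris.hex <- as.h2o(iris)

# Split into training and validation frames (the 80/20 ratio is an arbitrary choice;
# h2o.splitFrame splits approximately, not exactly)
iris.split <- h2o.splitFrame(iris.hex, ratios = 0.8, seed = 1234)
iris.train <- iris.split[[1]]
iris.valid <- iris.split[[2]]

iris.dl <- h2o.deeplearning(
  x = 1:4, y = 5,
  training_frame = iris.train,
  validation_frame = iris.valid,
  hidden = c(32, 32),                    # two small hidden layers
  activation = "RectifierWithDropout",   # a dropout activation, so dropout ratios apply
  hidden_dropout_ratios = c(0.2, 0.2),   # one value per hidden layer
  epochs = 50,
  stopping_rounds = 3,                   # early stopping on the moving average of the metric
  stopping_metric = "logloss",
  stopping_tolerance = 0.01,
  seed = 1234,
  reproducible = TRUE                    # single-threaded, so the seed takes effect
)

# Score the model on the held-out frame
iris.perf <- h2o.performance(iris.dl, newdata = iris.valid)
h2o.logloss(iris.perf)
```

Setting `reproducible = TRUE` is slow on larger data because it uses a single thread; drop it when repeatable results are not needed.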

*Documentation reproduced from package h2o, version 3.10.5.3, License: Apache License (== 2.0)*