h2o.naiveBayes: Naive Bayes Model in H2O

Description

Compute naive Bayes probabilities on an H2O dataset.

Usage

h2o.naiveBayes(x, y, training_frame, model_id, laplace = 0,
  threshold = 0.001, eps = 0, compute_metrics = TRUE)

Arguments

x

A vector containing the names or indices of the predictor variables to use in building the model.

y

The name or index of the response variable. If the data does not contain a header, this is the column index number starting at 0, and increasing from left to right. The response must be a categorical variable with at least two levels.

training_frame

An H2OFrame object containing the variables in the model.

model_id

(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

laplace

A non-negative number controlling Laplace smoothing. The default, 0, disables smoothing.

threshold

The minimum standard deviation to use for observations without enough data. Must be at least 1e-10.

eps

A threshold cutoff to deal with numeric instability. Must be non-negative; the default is 0.

compute_metrics

A logical value indicating whether model metrics should be computed. Set to FALSE to reduce the runtime of the algorithm.
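To illustrate what the laplace argument controls, here is a minimal base R sketch of additive (Laplace) smoothing for a single categorical predictor. It assumes the textbook formula (count + laplace) / (class total + laplace * number of levels); the helper smoothed_cond_prob is hypothetical, is not part of the h2o package, and may differ from what H2O computes internally.

```r
# Additive (Laplace) smoothing of P(x = level | y = class) for one
# categorical predictor. Hypothetical helper, not part of the h2o package.
smoothed_cond_prob <- function(x, y, laplace = 0) {
  counts <- table(y, x)   # rows: response classes, columns: predictor levels
  k <- ncol(counts)       # number of predictor levels
  # rowSums(counts) is recycled down the columns, so each row is divided
  # by its own class total plus the smoothing mass
  (counts + laplace) / (rowSums(counts) + laplace * k)
}

# Toy data: class "b" never occurs together with level "no"
x <- factor(c("yes", "yes", "no", "yes"), levels = c("yes", "no"))
y <- factor(c("a", "a", "a", "b"))

p0 <- smoothed_cond_prob(x, y, laplace = 0)  # P(x = "no" | y = "b") is 0
p1 <- smoothed_cond_prob(x, y, laplace = 1)  # every cell is strictly positive
```

Without smoothing, a level that never occurs with a class gets probability zero, which zeroes out the whole product for that class at prediction time. Any positive laplace value avoids this.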

Value

Returns an object of class H2OBinomialModel if the response has two categorical levels, and H2OMultinomialModel otherwise.

Details

The naive Bayes classifier assumes independence between predictor variables conditional on the response, and a Gaussian distribution of numeric predictors with mean and standard deviation computed from the training dataset. When building a naive Bayes classifier, every row in the training dataset that contains at least one NA will be skipped completely. If the test dataset has missing values, then those predictors are omitted in the probability calculation during prediction.
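The behavior described above can be sketched in base R for numeric predictors. This is an illustration under stated assumptions, not H2O's implementation: fit_gnb, predict_gnb, and the min_sd parameter are hypothetical names (min_sd plays the role of threshold), and the built-in iris data stands in for an H2OFrame.

```r
# Fit class priors plus the per-class mean and standard deviation of each
# numeric predictor. Rows with any NA are dropped before fitting.
fit_gnb <- function(X, y, min_sd = 0.001) {
  keep <- complete.cases(X) & !is.na(y)
  X <- X[keep, , drop = FALSE]
  y <- droplevels(factor(y[keep]))
  stats <- lapply(split(as.data.frame(X), y), function(d) {
    s <- apply(d, 2, sd)
    s[is.na(s) | s < min_sd] <- min_sd   # floor missing or tiny deviations
    list(mean = colMeans(d), sd = s)
  })
  prior <- setNames(as.numeric(table(y)) / length(y), levels(y))
  list(prior = prior, stats = stats)
}

# Posterior class probabilities for one named numeric row. Predictors that
# are NA in the row are left out of the product of Gaussian densities.
predict_gnb <- function(model, newrow) {
  used <- !is.na(newrow)
  log_post <- sapply(names(model$stats), function(cl) {
    s <- model$stats[[cl]]
    log(model$prior[[cl]]) +
      sum(dnorm(newrow[used], s$mean[used], s$sd[used], log = TRUE))
  })
  post <- exp(log_post - max(log_post))  # subtract the max for stability
  post / sum(post)
}

model <- fit_gnb(iris[, 1:4], iris$Species)
newrow <- c(Sepal.Length = 5.1, Sepal.Width = NA,
            Petal.Length = 1.4, Petal.Width = 0.2)
p <- predict_gnb(model, newrow)
```

Working in log space and normalizing at the end keeps the product of many small densities from underflowing. The missing Sepal.Width simply contributes no term to the sum.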

Examples

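library(h2o)
# Start (or connect to) a local H2O instance
localH2O <- h2o.init()
# Locate and upload the house votes dataset bundled with the h2o package
votesPath <- system.file("extdata", "housevotes.csv", package = "h2o")
votes.hex <- h2o.uploadFile(localH2O, path = votesPath, header = TRUE)
# Column 1 is the response; columns 2 to 17 are the predictors
h2o.naiveBayes(x = 2:17, y = 1, training_frame = votes.hex, laplace = 3)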