```
## Default method:
h2o.randomForest(x, y, data, classification = TRUE, ntree = 50, depth = 20,
sample.rate = 2/3, classwt = NULL, nbins = 100, seed = -1, importance = FALSE,
validation, nodesize = 1, balance.classes = FALSE, max.after.balance.size = 5,
use_non_local = TRUE, version = 2)
## Import to a ValueArray object:
h2o.randomForest.VA(x, y, data, ntree = 50, depth = 20, sample.rate = 2/3,
classwt = NULL, nbins = 100, seed = -1, use_non_local = TRUE)
## Import to a FluidVecs object:
h2o.randomForest.FV(x, y, data, classification = TRUE, ntree = 50, depth = 20,
sample.rate = 2/3, nbins = 100, seed = -1, importance = FALSE, validation,
nodesize = 1, balance.classes = FALSE, max.after.balance.size = 5)
```

x

A vector containing the names or indices of the predictor variables to use in building the random forest model.

y

The name or index of the response variable. If the data does not contain a header, this is the column index, designated by increasing numbers from left to right. (The response must be either an integer or a categorical variable).

data

An H2OParsedDataVA (`version = 1`) or H2OParsedData (`version = 2`) object containing the variables in the model.

classification

(Optional) A logical value indicating whether a classification model should be built (as opposed to regression).

ntree

(Optional) Number of trees to grow. (Must be a nonnegative integer).

depth

(Optional) Maximum depth to grow the tree.

sample.rate

(Optional) Sampling rate for constructing data from which individual trees are grown.

classwt

(Optional) Numeric vector of class weights for a categorical response.

nbins

(Optional) Build a histogram of this many bins, then split at best point.

seed

(Optional) Seed for building the random forest. If `seed = -1`, one will automatically be generated by H2O.

importance

(Optional) A logical value indicating whether to calculate variable importance. Set to `FALSE` to speed up computations.

validation

(Optional) An H2OParsedDataVA (`version = 1`) or H2OParsedData (`version = 2`) object indicating the validation dataset used to construct the confusion matrix.

nodesize

(Optional) Minimum number of observations in a terminal node.

balance.classes

(Optional) Balance training data class counts via over/under-sampling (for imbalanced data).

max.after.balance.size

(Optional) Maximum relative size of the training data after balancing class counts (can be less than 1.0).

use_non_local

(Optional) A logical value indicating whether to use non-local data in building the random forest model.

version

(Optional) The version of random forest to run. If `version = 1`, this will run the single-node ValueArray implementation, while `version = 2` selects the distributed, but still beta-stage, FluidVecs implementation.

An object of class H2ORFModelVA (`version = 1`) or H2ODRFModel (`version = 2`) with slots key, data, and model, where the last is a list of the following components:

ntree

Number of trees grown.

mse

Mean-squared error for each tree.

forest

A matrix giving the minimum, mean, and maximum of the tree depth and number of leaves.

confusion

Confusion matrix of the prediction.

To run with `version = 1`, you must import data to a ValueArray object using `h2o.importFile.VA`, `h2o.importFolder.VA` or one of its variants. To run with `version = 2`, you must import data to a FluidVecs object using `h2o.importFile.FV`, `h2o.importFolder.FV` or one of its variants.

```
# Run an RF model on iris data
library(h2o)
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
h2o.randomForest(y = 5, x = c(2,3,4), data = iris.hex, ntree = 50, depth = 100)
```
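
The version-specific import requirement and the components of the returned model can be sketched as follows. This is a minimal sketch rather than a definitive recipe: `importer_for_version` and `run_rf` are hypothetical helpers that are not part of the h2o package, and the sketch assumes that the `.VA` and `.FV` importers accept the same arguments as the `h2o.importFile` call in the example above.

```r
# Hypothetical helper (not part of h2o): returns the name of the import
# function that matches the `version` argument of h2o.randomForest.
importer_for_version <- function(version) {
  switch(as.character(version),
         "1" = "h2o.importFile.VA",  # ValueArray import
         "2" = "h2o.importFile.FV",  # FluidVecs import
         stop("version must be 1 or 2"))
}

# Hypothetical wrapper: imports iris with the matching importer, fits a
# forest, and returns the documented components of the `model` slot.
# Assumes a local H2O instance can be started on port 54321.
run_rf <- function(version = 2) {
  library(h2o)
  localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
  irisPath <- system.file("extdata", "iris.csv", package = "h2o")
  # Assumption: the .VA/.FV importers take the same arguments as h2o.importFile.
  iris.hex <- do.call(importer_for_version(version),
                      list(localH2O, path = irisPath, key = "iris.hex"))
  rf <- h2o.randomForest(y = 5, x = c(2, 3, 4), data = iris.hex,
                         ntree = 50, depth = 20, version = version)
  # `model` is a list with components ntree, mse, forest, and confusion.
  rf@model[c("ntree", "mse", "forest", "confusion")]
}
```

Calling `run_rf(version = 1)` would exercise the ValueArray path, while `run_rf()` uses the FluidVecs default. Neither helper is invoked when the block is sourced, so no H2O instance is started until `run_rf` is called.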
