
Split an existing H2O data set according to user-specified ratios. The number of subsets is always one more than the number of ratios given. Note that the split is not exact: to stay efficient on big data, H2O uses a probabilistic splitting method rather than an exact one. For example, when specifying a split of 0.75/0.25, H2O will produce two subsets whose sizes have an expected proportion of 0.75/0.25 rather than exactly 0.75/0.25. On small datasets, the sizes of the resulting splits will deviate from the expected value more than on big data, where they will be very close to exact.
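The effect of probabilistic splitting on small versus large data can be illustrated without H2O at all. The base R sketch below is not H2O's implementation; it only assumes that each row is assigned to a split by an independent uniform draw, and the function name `probabilistic_split` is hypothetical.

```r
# Illustrative only: assign each row to a split by an independent uniform draw.
# This is an assumption made for the sketch, not H2O's internal algorithm.
probabilistic_split <- function(n_rows, ratios, seed = 42) {
  stopifnot(all(ratios > 0), sum(ratios) < 1)
  set.seed(seed)
  # Cumulative boundaries; the last split takes the remaining probability.
  breaks <- c(0, cumsum(ratios), 1)
  draws <- runif(n_rows)
  # For each draw, return the index of the interval (split) it falls into.
  findInterval(draws, breaks, rightmost.closed = TRUE)
}

# Observed proportions for a small and a large number of rows.
small <- table(probabilistic_split(150, 0.75)) / 150
large <- table(probabilistic_split(1e6, 0.75)) / 1e6
print(small)
print(large)
```

With only 150 rows the observed proportions can be a few percentage points away from 0.75/0.25, while with a million rows they are typically within a fraction of a percentage point.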
h2o.splitFrame(data, ratios = 0.75, destination_frames, seed = -1)
data: An H2OFrame object representing the dataset to split.
ratios: A numeric value or array indicating the ratio of total rows contained in each split. The values must sum to less than 1.
destination_frames: An array of frame IDs whose length equals the number of ratios specified plus one.
seed: Random seed used for the split.
Returns a list of split H2OFrames.
# NOT RUN {
library(h2o)
h2o.init()
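# Upload the built-in iris data set to the H2O cluster
iris_hf <- as.h2o(iris)
# Two ratios yield three frames with roughly 20%, 50%, and 30% of the rows
iris_split <- h2o.splitFrame(iris_hf, ratios = c(0.2, 0.5))
head(iris_split[[1]])
summary(iris_split[[1]])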
# }
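A second sketch shows the `destination_frames` and `seed` arguments in a two-way split. It assumes a running H2O cluster; the frame IDs `iris_train` and `iris_test` and the seed value are arbitrary choices made for illustration.

```r
library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)

# One ratio gives two frames; a fixed seed makes the split repeatable.
# The frame IDs below are hypothetical names chosen for this example.
parts <- h2o.splitFrame(iris_hf, ratios = 0.75,
                        destination_frames = c("iris_train", "iris_test"),
                        seed = 1234)
train <- parts[[1]]
test <- parts[[2]]

# Row counts are near, but not necessarily equal to, 75% and 25% of 150.
print(c(train = nrow(train), test = nrow(test)))
```

Every row lands in exactly one frame, so the two row counts always add up to the size of the original frame even though the individual sizes vary.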