h2o (version 3.2.0.3)

h2o.createFrame: Data Frame Creation in H2O

Description

Creates a data frame in H2O with real-valued, categorical, integer, and binary columns specified by the user.

Usage

h2o.createFrame(conn = h2o.getConnection(), key = "", rows = 10000,
  cols = 10, randomize = TRUE, value = 0, real_range = 100,
  categorical_fraction = 0.2, factors = 100, integer_fraction = 0.2,
  integer_range = 100, binary_fraction = 0.1, binary_ones_fraction = 0.02,
  missing_fraction = 0.01, response_factors = 2, has_response = FALSE,
  seed)

Arguments

conn
An H2OConnection object.
key
A string indicating the destination key. If empty, this will be auto-generated by H2O.
rows
The number of rows of data to generate.
cols
The number of columns of data to generate. Excludes the response column if has_response = TRUE.
randomize
A logical value indicating whether data values should be randomly generated. This must be TRUE if either categorical_fraction or integer_fraction is non-zero.
value
If randomize = FALSE, then all real-valued entries will be set to this value.
real_range
The range of randomly generated real values. Values are drawn from -real_range to real_range.
categorical_fraction
The fraction of total columns that are categorical.
factors
The number of (unique) factor levels in each categorical column.
integer_fraction
The fraction of total columns that are integer-valued.
integer_range
The range of randomly generated integer values. Values are drawn from -integer_range to integer_range.
binary_fraction
The fraction of total columns that are binary-valued.
binary_ones_fraction
The fraction of values in a binary column that are set to 1.
missing_fraction
The fraction of total entries in the data frame that are set to NA.
response_factors
If has_response = TRUE, then this is the number of factor levels in the response column.
has_response
A logical value indicating whether an additional response column should be prepended to the final H2O data frame. If set to TRUE, the total number of columns will be cols + 1.
seed
A seed used to generate random values when randomize = TRUE.
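
The three fraction arguments split cols among the column types, and whatever is left over is real-valued. The sketch below works through that arithmetic in plain R without contacting H2O. The helper expected_column_counts is hypothetical (it is not part of the h2o package), and rounding to the nearest integer is an assumption; H2O's own rounding may differ.

```r
# Hypothetical helper, not part of the h2o package: estimates how `cols`
# columns are divided among column types from the fraction arguments.
# Assumption: counts are rounded to the nearest integer.
expected_column_counts <- function(cols = 10, categorical_fraction = 0.2,
                                   integer_fraction = 0.2,
                                   binary_fraction = 0.1) {
  total_fraction <- categorical_fraction + integer_fraction + binary_fraction
  # The typed columns cannot make up more than the whole frame
  # (small tolerance for floating-point error).
  stopifnot(total_fraction <= 1 + 1e-8)

  categorical <- round(cols * categorical_fraction)
  integer     <- round(cols * integer_fraction)
  binary      <- round(cols * binary_fraction)

  # Columns not claimed by another type are real-valued.
  c(categorical = categorical, integer = integer, binary = binary,
    real = cols - categorical - integer - binary)
}

# Defaults from the Usage section.
expected_column_counts()

# 100 columns with 10% categorical and 50% integer-valued.
expected_column_counts(cols = 100, categorical_fraction = 0.1,
                       integer_fraction = 0.5)
```

The response column added by has_response = TRUE is not included in these counts, since cols excludes it.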

Value

Returns an H2OFrame object.

Examples

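library(h2o)
# Start (or connect to) a local H2O instance.
localH2O <- h2o.init()

# 1000 rows and 100 predictor columns plus a response column. About 10% of the
# columns are categorical with 5 levels each, and about 50% are integer-valued.
hex <- h2o.createFrame(localH2O, rows = 1000, cols = 100, categorical_fraction = 0.1,
                       factors = 5, integer_fraction = 0.5, integer_range = 1,
                       has_response = TRUE)
head(hex)
summary(hex)

# With randomize = FALSE, every real-valued entry is set to `value` (here 5).
# categorical_fraction and integer_fraction must both be 0 in this case.
hex2 <- h2o.createFrame(localH2O, rows = 100, cols = 10, randomize = FALSE, value = 5,
                        categorical_fraction = 0, integer_fraction = 0)
summary(hex2)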
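
The seed argument is described above as the seed used to generate random values when randomize = TRUE. A minimal sketch of how one might check reproducibility follows; it assumes a running local H2O instance, the seed value 1234 is arbitrary, and whether the two frames match exactly on every cluster configuration is an assumption rather than something this page states.

```r
library(h2o)
localH2O <- h2o.init()

# Two frames generated with identical arguments and the same seed
# (1234 is an arbitrary value chosen for illustration).
hexA <- h2o.createFrame(localH2O, rows = 100, cols = 5, seed = 1234)
hexB <- h2o.createFrame(localH2O, rows = 100, cols = 5, seed = 1234)

# Pull both frames into R and compare them. The comparison is expected to
# be TRUE if the seed fully determines the generated values (an assumption).
isTRUE(all.equal(as.data.frame(hexA), as.data.frame(hexB)))
```

Copying the frames into R with as.data.frame is only practical for small frames like these.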
