h2o (version 3.10.3.6)

h2o.createFrame: Data H2OFrame Creation in H2O

Description

Creates a data frame in H2O with real-valued, categorical, integer, and binary columns specified by the user.

Usage

h2o.createFrame(rows = 10000, cols = 10, randomize = TRUE, value = 0,
  real_range = 100, categorical_fraction = 0.2, factors = 100,
  integer_fraction = 0.2, integer_range = 100, binary_fraction = 0.1,
  binary_ones_fraction = 0.02, time_fraction = 0, string_fraction = 0,
  missing_fraction = 0.01, response_factors = 2, has_response = FALSE,
  seed, seed_for_column_types)

Arguments

rows
The number of rows of data to generate.
cols
The number of columns of data to generate. Excludes the response column if has_response = TRUE.
randomize
A logical value indicating whether data values should be randomly generated. This must be TRUE if either categorical_fraction or integer_fraction is non-zero.
value
If randomize = FALSE, then all real-valued entries will be set to this value.
real_range
The range of randomly generated real values.
categorical_fraction
The fraction of total columns that are categorical.
factors
The number of (unique) factor levels in each categorical column.
integer_fraction
The fraction of total columns that are integer-valued.
integer_range
The range of randomly generated integer values.
binary_fraction
The fraction of total columns that are binary-valued.
binary_ones_fraction
The fraction of values in a binary column that are set to 1.
time_fraction
The fraction of total columns that are date/time-valued.
string_fraction
The fraction of total columns that are string-valued.
missing_fraction
The fraction of total entries in the data frame that are set to NA.
response_factors
If has_response = TRUE, then this is the number of factor levels in the response column.
has_response
A logical value indicating whether an additional response column should be prepended to the final H2O data frame. If set to TRUE, the total number of columns will be cols + 1.
seed
A seed used to generate random values when randomize = TRUE.
seed_for_column_types
A seed used to generate random column types when randomize = TRUE.
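
The fraction arguments above divide cols among the column types. The sketch below works through that arithmetic in plain R. It is only an illustration: expected_column_counts is a hypothetical helper, not part of h2o, and it assumes that each count is roughly fraction * cols and that real-valued columns take whatever share the other fractions leave over. The counts H2O actually produces may differ by rounding.

```r
# Hypothetical helper (not part of the h2o package): estimate how many columns
# of each type a call would request. Assumes count = fraction * cols and that
# real-valued columns fill the remaining share.
expected_column_counts <- function(cols,
                                   categorical_fraction = 0.2,
                                   integer_fraction = 0.2,
                                   binary_fraction = 0.1,
                                   time_fraction = 0,
                                   string_fraction = 0) {
  fractions <- c(categorical = categorical_fraction,
                 integer = integer_fraction,
                 binary = binary_fraction,
                 time = time_fraction,
                 string = string_fraction)
  # The typed fractions cannot be negative or exceed the whole frame
  # (small tolerance for floating-point error).
  stopifnot(all(fractions >= 0), sum(fractions) <= 1 + 1e-8)
  fractions <- c(fractions, real = 1 - sum(fractions))
  round(fractions * cols)
}

counts <- expected_column_counts(cols = 100,
                                 categorical_fraction = 0.1,
                                 integer_fraction = 0.5)
print(counts)
```

Under these assumptions, the call above (which mirrors the first example on this page) yields 10 categorical, 50 integer, 10 binary, and 30 real-valued columns. A response column requested with has_response = TRUE would be in addition to these.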

Value

Returns an H2OFrame object.

Examples

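library(h2o)
# Start (or connect to) a local H2O cluster
h2o.init()

# Random frame: 1000 rows, 100 generated columns plus a response column
hex <- h2o.createFrame(rows = 1000, cols = 100, categorical_fraction = 0.1,
                       factors = 5, integer_fraction = 0.5, integer_range = 1,
                       has_response = TRUE)
head(hex)
summary(hex)

# Non-random frame: real-valued entries are set to value (5) instead of random draws
hex2 <- h2o.createFrame(rows = 100, cols = 10, randomize = FALSE, value = 5,
                        categorical_fraction = 0, integer_fraction = 0)
summary(hex2)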
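
The seed and seed_for_column_types arguments are not exercised above. A minimal sketch of how they might be used to make a generated frame repeatable follows. It assumes a running H2O cluster and that fixing both seeds is enough for two calls to produce the same data; the seed values themselves are arbitrary.

```r
library(h2o)
h2o.init()

# Two calls with the same seeds (arbitrary values chosen for illustration)
a <- h2o.createFrame(rows = 50, cols = 5, seed = 42, seed_for_column_types = 42)
b <- h2o.createFrame(rows = 50, cols = 5, seed = 42, seed_for_column_types = 42)

# Both frames have the requested shape
dim(a)
dim(b)

# Pull the frames into R and compare them; all.equal() reports TRUE or a
# description of the differences, so this is a check rather than a guarantee
all.equal(as.data.frame(a), as.data.frame(b))
```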
