h2o (version 3.44.0.3)

h2o.createFrame: Data H2OFrame Creation in H2O

Description

Creates a data frame in H2O with real-valued, categorical, integer, and binary columns specified by the user.

Usage

h2o.createFrame(
  rows = 10000,
  cols = 10,
  randomize = TRUE,
  value = 0,
  real_range = 100,
  categorical_fraction = 0.2,
  factors = 100,
  integer_fraction = 0.2,
  integer_range = 100,
  binary_fraction = 0.1,
  binary_ones_fraction = 0.02,
  time_fraction = 0,
  string_fraction = 0,
  missing_fraction = 0.01,
  response_factors = 2,
  has_response = FALSE,
  seed,
  seed_for_column_types
)

Value

Returns an H2OFrame object.

Arguments

rows

The number of rows of data to generate.

cols

The number of columns of data to generate. Excludes the response column if has_response = TRUE.

randomize

A logical value indicating whether data values should be randomly generated. This must be TRUE if either categorical_fraction or integer_fraction is non-zero.

value

If randomize = FALSE, then all real-valued entries will be set to this value.

real_range

The range of randomly generated real values (from -real_range to real_range).

categorical_fraction

The fraction of total columns that are categorical.

factors

The number of (unique) factor levels in each categorical column.

integer_fraction

The fraction of total columns that are integer-valued.

integer_range

The range of randomly generated integer values (from -integer_range to integer_range).

binary_fraction

The fraction of total columns that are binary-valued.

binary_ones_fraction

The fraction of values in a binary column that are set to 1.

time_fraction

The fraction of total columns that are randomly generated date/time columns.

string_fraction

The fraction of total columns that are randomly generated string columns.

missing_fraction

The fraction of total entries in the data frame that are set to NA.

response_factors

If has_response = TRUE, then this is the number of factor levels in the response column.

has_response

A logical value indicating whether an additional response column should be prepended to the final H2O data frame. If set to TRUE, the total number of columns will be cols + 1.

seed

A seed used to generate random values when randomize = TRUE.

seed_for_column_types

A seed used to generate random column types when randomize = TRUE.
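
The fraction arguments together determine the mix of column types. The following is a minimal sketch in plain R, not part of the h2o package: expected_column_mix is a hypothetical helper, and it assumes that whatever share the categorical, integer, binary, time, and string fractions leave over becomes real-valued columns. The exact per-type counts H2O produces may differ because of rounding and random column-type assignment.

```r
# Hypothetical helper (not an h2o function): estimates the column mix implied
# by the fraction arguments of h2o.createFrame.
# Assumption: the real-valued share is 1 minus the sum of the other fractions.
expected_column_mix <- function(cols = 10,
                                categorical_fraction = 0.2,
                                integer_fraction = 0.2,
                                binary_fraction = 0.1,
                                time_fraction = 0,
                                string_fraction = 0,
                                has_response = FALSE) {
  fractions <- c(categorical = categorical_fraction,
                 integer = integer_fraction,
                 binary = binary_fraction,
                 time = time_fraction,
                 string = string_fraction)

  # The fractions must be non-negative and cannot exceed the whole frame.
  stopifnot(all(fractions >= 0), sum(fractions) <= 1 + 1e-9)

  # Whatever is left over is treated as real-valued columns.
  fractions["real"] <- 1 - sum(fractions)

  list(
    approx_columns = fractions * cols,                 # approximate count per type
    total_columns = cols + as.integer(has_response)    # response adds one column
  )
}

# Column mix implied by the default arguments.
mix <- expected_column_mix()
print(mix$approx_columns)
print(mix$total_columns)
```

Calling expected_column_mix(has_response = TRUE) reports one more total column, matching the cols + 1 rule described for has_response.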

Examples

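if (FALSE) {
# Requires a running H2O cluster; h2o.init() starts or connects to one.
library(h2o)
h2o.init()

# Random frame: 1000 rows and 100 columns, plus a response column.
hf <- h2o.createFrame(rows = 1000, cols = 100, categorical_fraction = 0.1,
                      factors = 5, integer_fraction = 0.5, integer_range = 1,
                      has_response = TRUE)
head(hf)
summary(hf)

# Non-random frame: with randomize = FALSE, real-valued entries are set to value (5 here).
hf <- h2o.createFrame(rows = 100, cols = 10, randomize = FALSE, value = 5,
                      categorical_fraction = 0, integer_fraction = 0)
summary(hf)
}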
