h2o.importFile: Import Files into H2O

Description

Imports files into an H2O cloud. By default, imported files are passed through to the parse phase automatically.

Usage

h2o.importFolder(path, conn = h2o.getConnection(), pattern = "",
  destination_frame = "", parse = TRUE, header = NA, sep = "",
  col.names = NULL, na.strings = NULL, parse_type = NULL)
h2o.importURL(path, conn = h2o.getConnection(), destination_frame = "",
  parse = TRUE, header = NA, sep = "", col.names = NULL,
  na.strings = NULL, parse_type = NULL)
h2o.importHDFS(path, conn = h2o.getConnection(), pattern = "",
  destination_frame = "", parse = TRUE, header = NA, sep = "",
  col.names = NULL, na.strings = NULL, parse_type = NULL)
h2o.uploadFile(path, conn = h2o.getConnection(), destination_frame = "",
  parse = TRUE, header = NA, sep = "", col.names = NULL,
  col.types = NULL, na.strings = NULL, progressBar = FALSE,
  parse_type = NULL)

Arguments

path

The complete URL or normalized file path of the file to be imported. Each row of data appears as one line of the file.

conn

An H2OConnection class object.

pattern

(Optional) Character string containing a regular expression to match file(s) in the folder.

destination_frame

(Optional) The unique hex key assigned to the imported file. If none is given, a key will automatically be generated based on the URL path.

parse

(Optional) A logical value indicating whether the file should be parsed after import.

header

(Optional) A logical value indicating whether the first line of the file contains column headers. If left empty, the parser will try to automatically detect this.

sep

(Optional) The field separator character. Values on each line of the file are separated by this character. If sep = "", the parser will automatically detect the separator.

col.names

(Optional) An H2ORawData or H2OFrame (version = 2) object containing a single delimited line with the column names for the file.

na.strings

(Optional) Strings that H2O will interpret as missing values.

parse_type

(Optional) Specify which parser type H2O will use. Valid types are "ARFF", "XLS", "CSV", and "SVMLight".

col.types

(Optional) A vector to specify whether columns should be forced to a certain type upon import parsing.

progressBar

(Optional) When FALSE, tells the H2O parse call to block synchronously instead of polling. This can be faster for small datasets, but no progress bar is shown.
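As a minimal sketch of how several of these arguments combine, the call below sets header and sep explicitly rather than relying on auto-detection. It assumes the h2o package is installed and that a local H2O instance can be started with default settings; prostate.csv is the file bundled with the package that the Examples section also uses.

```r
library(h2o)

# Start (or connect to) a local H2O instance with default settings.
h2o.init()

# prostate.csv ships with the h2o package, as in the Examples section.
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")

# State the parse settings explicitly instead of relying on auto-detection.
prostate.hex <- h2o.uploadFile(
  path = prosPath,
  destination_frame = "prostate.hex",
  header = TRUE,  # first line holds the column names
  sep = ","       # fields are comma-separated
)

dim(prostate.hex)
```

Leaving header = NA and sep = "" would let the parser detect both; setting them is mainly useful when detection guesses wrong.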

Details

For every function except h2o.uploadFile, a relative path is resolved relative to the start location of the H2O instance, and the file must be on the same machine as the H2O cloud. For h2o.uploadFile, a relative path is resolved relative to the working directory of the current R session.
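The h2o.uploadFile case can be checked from the R side before any data is sent. The sketch below uses only base R; the relative path data/prostate.csv is hypothetical and stands in for whatever path would be passed.

```r
# Hypothetical relative path, used only for illustration.
relPath <- file.path("data", "prostate.csv")

# h2o.uploadFile resolves a relative path against the R working directory,
# so this is the location it would read from.
absPath <- file.path(getwd(), relPath)
print(absPath)

# Check that the file exists before handing the path to H2O.
file.exists(absPath)
```

For the other functions the path is resolved on the H2O side, so the R working directory is not the reference point; passing an absolute path is one way to sidestep the difference.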

h2o.importFolder imports an entire directory of files. As with the other import functions, a relative path is resolved relative to the start location of the H2O instance, and by default the imported files are passed through to the parse phase automatically.
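Before passing a regular expression as pattern, it can help to preview what it selects. The sketch below does this with base R only, on a temporary folder with made-up file names. It assumes that H2O's matching behaves like R's for a simple expression such as this one, which may not hold for more elaborate patterns because H2O evaluates the pattern on its own side.

```r
# Build a throwaway folder with hypothetical file names.
folder <- file.path(tempdir(), "import_demo")
dir.create(folder, showWarnings = FALSE)
file.create(file.path(folder, c("part1.csv", "part2.csv", "notes.txt")))

# Candidate value for the pattern argument: names ending in ".csv".
csvPattern <- "\\.csv$"

# Preview which files the expression selects, using R's own regex engine.
matched <- list.files(folder, pattern = csvPattern)
print(matched)
```

The same string could then be supplied as the pattern argument of h2o.importFolder, with path pointing at a folder that the H2O instance can reach.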

h2o.importURL and h2o.importHDFS are both deprecated. Use h2o.importFile instead.
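A minimal sketch of the recommended call follows. The signature of h2o.importFile is not listed under Usage, so the argument names path and destination_frame are an assumption carried over from the functions that are listed. The example also assumes a local H2O instance running on the same machine as the R session, so that the absolute path returned by system.file is visible to H2O.

```r
library(h2o)

# Start (or connect to) a local H2O instance with default settings.
h2o.init()

# Absolute path to the sample file bundled with the h2o package.
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")

# Assumption: h2o.importFile accepts path and destination_frame
# in the same way as the functions shown under Usage.
prostate.hex <- h2o.importFile(path = prosPath, destination_frame = "prostate.hex")

class(prostate.hex)
```

Code that previously called h2o.importURL or h2o.importHDFS would pass its URL or HDFS location as path in the same position.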

Examples


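library(h2o)
# Start a local H2O instance and keep the connection object.
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
# Locate the sample file that ships with the h2o package.
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
# Upload the file from the R session, naming every argument explicitly.
prostate.hex = h2o.uploadFile(path = prosPath, conn = localH2O, destination_frame = "prostate.hex")
class(prostate.hex)
summary(prostate.hex)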
