h2o (version 2.4.3.11)

h2o.importHDFS: Import from HDFS

Description

Imports an HDFS file, or a set of files in a directory, and parses them, returning an object containing the identifying hex key.

Usage

## Default method:
h2o.importHDFS(object, path, pattern = "", key = "", parse = TRUE, header, 
  sep = "", col.names, version = 2)

## Import to a ValueArray object:
h2o.importHDFS.VA(object, path, pattern = "", key = "", parse = TRUE, header, 
  sep = "", col.names)

## Import to a FluidVecs object:
h2o.importHDFS.FV(object, path, pattern = "", key = "", parse = TRUE, header, 
  sep = "", col.names)

Arguments

object
An H2OClient object containing the IP address and port of the server running H2O.
path
The path of the file or directory to be imported. If the path is not absolute, it is interpreted relative to the current working directory.
pattern
(Optional) Character string containing a regular expression to match file(s) in the folder.
key
(Optional) The unique hex key assigned to the imported file. If none is given, a key will automatically be generated based on the file path.
parse
(Optional) A logical value indicating whether the file should be parsed after import.
header
(Optional) A logical value indicating whether the first row is the column header. If missing, H2O will automatically try to detect the presence of a header.
sep
(Optional) The field separator character. Values on each line of the file are separated by this character. If sep = "", the parser will automatically detect the separator.
col.names
(Optional) An H2OParsedDataVA (version = 1) or H2OParsedData (version = 2) object containing a single delimited line with the column names for the file.
version
(Optional) If version = 1, the file will be imported to a ValueArray object. Otherwise, if version = 2, the file will be imported as a FluidVecs object.
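
As a rough illustration of how a regular expression passed as pattern selects files, the sketch below filters a vector of file names in base R. The file names are hypothetical, and the matching here is done locally with grepl; H2O performs the actual matching on the server, and its regular expression dialect may differ from R's.

```r
# Hypothetical file names in an HDFS directory (illustration only).
files <- c("iris_train.csv", "iris_test.csv", "README.txt", "iris_train.csv.bak")

# A regular expression matching names that end in ".csv".
pattern <- "\\.csv$"

# grepl() returns TRUE for each name the expression matches.
matched <- files[grepl(pattern, files)]
print(matched)
```

Anchoring the expression with $ keeps backup files such as iris_train.csv.bak out of the selection.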

Value

  • If parse = TRUE, the function returns an object of class H2OParsedDataVA when version = 1 and an object of class H2OParsedData when version = 2. Otherwise, when parse = FALSE, it returns an object of class H2ORawDataVA when version = 1 and an object of class H2ORawData when version = 2.
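
The four combinations of parse and version above can be summarized with a small lookup function. This is only a sketch: expected_import_class is a hypothetical helper written for this page, not part of the h2o package, and it encodes nothing beyond the class names listed in this section.

```r
# Hypothetical helper (not part of h2o): returns the class name that the
# Value section lists for a given combination of parse and version.
expected_import_class <- function(parse = TRUE, version = 2) {
  stopifnot(is.logical(parse), length(parse) == 1, version %in% c(1, 2))
  # Parsed imports yield H2OParsedData*, raw imports yield H2ORawData*.
  base <- if (parse) "H2OParsedData" else "H2ORawData"
  # ValueArray (version = 1) classes carry the "VA" suffix.
  if (version == 1) paste0(base, "VA") else base
}

print(expected_import_class(parse = TRUE,  version = 1))
print(expected_import_class(parse = FALSE, version = 2))
```

A check such as inherits(x, expected_import_class(TRUE, 2)) could then be used to confirm that an import returned the class you expected.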

Details

Calling the method with version = 1 is equivalent to h2o.importHDFS.VA, and version = 2 is equivalent to h2o.importHDFS.FV.

When path is a directory, this method acts like h2o.importFolder and concatenates all data files in the folder into a single ValueArray object.

WARNING: In H2O, import is lazy! Do not modify the data files on hard disk until after parsing is complete.

See Also

h2o.importFile, h2o.importFolder, h2o.importURL, h2o.uploadFile

Examples

# This is an example of how to import files from HDFS.
# Replace the HDFS address and paths below with those of your own cluster before running.
library(h2o)
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
iris.hex = h2o.importHDFS(localH2O, path = paste("hdfs://192.168.1.161", 
  "datasets/runit/iris_wheader.csv", sep = "/"), parse = TRUE)
class(iris.hex)
summary(iris.hex)
iris.fv = h2o.importHDFS(localH2O, path = paste("hdfs://192.168.1.161", 
  "datasets/runit/iris_wheader.csv", sep = "/"), parse = TRUE, version = 2)
class(iris.fv)

iris_folder.hex = h2o.importHDFS(localH2O, path = paste("hdfs://192.168.1.161", 
  "datasets/runit/iris_test_train", sep = "/"))
summary(iris_folder.hex)
