h2o (version 3.10.3.6)

h2o.tokenize: Tokenize String

Description

h2o.tokenize is similar to h2o.strsplit, the difference between them is that h2o.tokenize will store the tokenized text into a single column making it easier for additional processing (filtering stop words, word2vec algo, ...).

Usage

h2o.tokenize(x, split)

Arguments

x
The column or columns whose strings to tokenize.
split
The regular expression to split on.

Value

An H2OFrame with a single column representing the tokenized Strings. Original rows of the input DF are separated by NA.