hive (version 0.1-8)

hive_stream: Hadoop Streaming with hive

Description

High-level functions for using Hadoop Streaming.

Usage

hive_stream( mapper, reducer, input, output, henv = hive(),
             mapper_args = NULL, reducer_args = NULL, cmdenv_arg = NULL )

Arguments

mapper
a function executed on each worker node. The mapper typically maps input key/value pairs to a set of intermediate key/value pairs.
reducer
a function executed on each worker node. The reducer reduces a set of intermediate values that share a key to a smaller set of values. If no reducer is used, leave this argument empty.
input
the directory in the DFS that holds the input data.
output
the output directory in the DFS that contains the results after the streaming job has finished.
henv
the local Hadoop environment; defaults to the one returned by hive().
mapper_args
additional arguments to the mapper.
reducer_args
additional arguments to the reducer.
cmdenv_arg
additional arguments passed as environment variables to distributed tasks.
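
The mapper and reducer roles described above can be illustrated without Hadoop. The following is a minimal local sketch in plain R, assuming a word count task; map_words and reduce_counts are hypothetical helper names and are not part of hive. It only mimics the map, sort and reduce sequence on an in-memory character vector.

```r
# Hypothetical helpers (not part of hive): simulate map -> sort -> reduce locally.

# Map step: emit one "word<TAB>1" pair per word found in the input lines.
map_words <- function(lines) {
  words <- unlist(strsplit(lines, "[[:space:]]+"))
  words <- words[nzchar(words)]
  if (length(words) == 0L) return(character(0))
  paste(words, 1L, sep = "\t")
}

# Reduce step: sum the values of all pairs that share a key.
reduce_counts <- function(pairs) {
  if (length(pairs) == 0L) return(character(0))
  fields <- strsplit(pairs, "\t", fixed = TRUE)
  keys   <- vapply(fields, `[`, character(1), 1L)
  values <- as.integer(vapply(fields, `[`, character(1), 2L))
  totals <- tapply(values, keys, sum)
  paste(names(totals), totals, sep = "\t")
}

input  <- c("to be or not", "to be")
pairs  <- sort(map_words(input))   # stands in for the sort between map and reduce
result <- reduce_counts(pairs)
print(result)
```

In a real streaming job the intermediate pairs travel between processes as tab-separated text lines, which is why both helpers work on character vectors of that shape.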

Details

The function hive_stream starts a MapReduce job on the given data located in the DFS.
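
As a rough sketch of how such a job might be launched, the example below defines a word count mapper and reducer that read from standard input and write tab-separated key/value lines to standard output, as is usual for Hadoop Streaming. The paths "/tmp/input" and "/tmp/output" are assumptions for illustration, and the job is only submitted if hive is installed and a running Hadoop cluster is registered.

```r
# Mapper: reads lines from stdin and emits "word<TAB>1" for every word.
# It accepts and ignores any arguments; the data arrives on stdin.
mapper <- function(...) {
  con <- file("stdin", open = "r")
  on.exit(close(con))
  while (length(line <- readLines(con, n = 1L, warn = FALSE)) > 0) {
    terms <- unlist(strsplit(line, "[[:space:]]+"))
    terms <- terms[nzchar(terms)]
    if (length(terms))
      cat(paste(terms, 1L, sep = "\t"), sep = "\n")
  }
}

# Reducer: reads "word<TAB>count" lines from stdin and sums counts per word.
reducer <- function(...) {
  counts <- new.env(hash = TRUE)
  con <- file("stdin", open = "r")
  on.exit(close(con))
  while (length(line <- readLines(con, n = 1L, warn = FALSE)) > 0) {
    kv <- unlist(strsplit(line, "\t", fixed = TRUE))
    if (length(kv) < 2L) next
    old <- if (exists(kv[1], envir = counts, inherits = FALSE))
      get(kv[1], envir = counts) else 0L
    assign(kv[1], old + as.integer(kv[2]), envir = counts)
  }
  for (term in ls(counts))
    cat(term, "\t", get(term, envir = counts), "\n", sep = "")
}

# Submit the job only when hive and a registered Hadoop cluster are available.
# "/tmp/input" and "/tmp/output" are assumed DFS paths, not package defaults.
hadoop_ready <- requireNamespace("hive", quietly = TRUE) &&
  isTRUE(tryCatch(hive::DFS_is_registered(), error = function(e) FALSE))

if (hadoop_ready) {
  hive::hive_stream(mapper = mapper, reducer = reducer,
                    input = "/tmp/input", output = "/tmp/output")
  print(hive::DFS_list("/tmp/output"))  # list the result files
}
```

Both functions are written to be self-contained, on the assumption that they run in a separate R process on each worker node and so cannot rely on objects from the calling session.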

References

Apache Hadoop core (http://hadoop.apache.org/core/).