hive: Hadoop Interactive Framework Control

Description

High-level functions to control the Hadoop framework.

Usage

hive( new )
.hinit( hadoop_home )
hive_start( henv = hive() )
hive_stop( henv = hive() )
hive_is_available( henv = hive() )

Arguments

hadoop_home

A character string pointing to the local Hadoop installation. If not given, .hinit() searches the default installation directory: either the directory named by the environment variable HADOOP_HOME or /etc/hadoop.

henv

An object containing the local Hadoop configuration.

new

An object specifying the Hadoop environment.

Value

hive() returns an object of class "hive" representing the currently used cluster configuration.

hive_is_available() returns TRUE if the given Hadoop framework is running.

Details

The function hive() is used to get/set the Hadoop cluster object. This object consists of an environment holding information about the Hadoop cluster.
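The get/set behaviour described above can be illustrated with a small sketch in base R. This is not the implementation used by the package; the names make_accessor, my_hive, and cfg are hypothetical and only show how a single function can both return and replace a stored environment.

```r
## Hypothetical sketch, NOT the hive package's implementation:
## a get/set accessor built on a closure.
make_accessor <- function() {
  current <- NULL                      # the stored cluster object
  function(new) {
    if (missing(new)) {
      current                          # get: return the stored object
    } else {
      current <<- new                  # set: replace the stored object
      invisible(current)
    }
  }
}

my_hive <- make_accessor()             # hypothetical stand-in for hive()

cfg <- new.env()                       # an environment holding cluster information
assign("hadoop_home", "/etc/hadoop", envir = cfg)

my_hive(cfg)                           # set the current object
same <- identical(my_hive(), cfg)      # get returns the very same environment
```

Calling the accessor without an argument reads the stored object, and calling it with one replaces it, which mirrors the hive() and hive( new ) forms listed under Usage.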

The function .hinit() is used to initialize a Hadoop cluster. It retrieves most configuration options by searching the directory given in the HADOOP_HOME environment variable or, alternatively, by searching the /etc/hadoop directory in case the Cloudera distribution (i.e., CDH3; https://www.cloudera.com) is used.
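As a rough illustration of the directory lookup described above, the following base R sketch resolves a Hadoop home directory. The helper find_hadoop_home is hypothetical and not part of the package, and the order of preference (explicit argument, then HADOOP_HOME, then /etc/hadoop) is an assumption made for this example.

```r
## Hypothetical helper, NOT part of the hive package.
## Assumed order of preference: explicit argument, HADOOP_HOME, /etc/hadoop.
find_hadoop_home <- function(hadoop_home = NULL) {
  candidates <- c(hadoop_home,
                  Sys.getenv("HADOOP_HOME", unset = NA),
                  "/etc/hadoop")
  ## drop unset or empty entries
  candidates <- candidates[!is.na(candidates) & nzchar(candidates)]
  ## keep only directories that actually exist
  existing <- candidates[dir.exists(candidates)]
  if (length(existing) == 0L) NA_character_ else existing[[1L]]
}

home <- find_hadoop_home()   # NA if no candidate directory exists
```

The result could then be inspected before being passed on, for example to .hinit(), so that a missing installation is noticed early.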

The functions hive_start() and hive_stop() are used to start/stop the Hadoop framework. The latter is not applicable for system-wide installations like CDH3.

The function hive_is_available() is used to check the status of a Hadoop cluster.

References

Apache Hadoop: https://hadoop.apache.org/.

Cloudera's distribution including Apache Hadoop (CDH): https://www.cloudera.com/downloads/cdh.html.

Examples

## Not run:
## read configuration and initialize a Hadoop cluster:
h <- .hinit( "/etc/hadoop" )
hive( h )
## start the Hadoop cluster:
hive_start()
## check the status of the Hadoop cluster:
hive_is_available()
## return cluster configuration 'h':
hive()
## stop the Hadoop cluster:
hive_stop()
## End(Not run)