h2o (version 2.8.4.4)

h2o.impute: Impute A Column of Data

Description

Impute a column of data using the mean, median, or mode. Optionally impute based on groupings of additional columns.

Usage

h2o.impute(data, column, method, groupBy)

Arguments

data
An H2OParsedData object
column
The column to impute. Must be a single column, given as an index, a column name, or a quoted column such as `.(Sepal.Length)`.
method
The imputation method: one of "mean", "median", or "mode". If the column is a factor, H2O forces "mode".
groupBy
If `groupBy` is not NULL, then the missing values are imputed using the mean/median/mode of `column` within the groups formed by the groupBy columns.
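
The effect of `groupBy` can be illustrated with a minimal base-R sketch. This is not the H2O implementation and does not call H2O at all; `impute_by_group` is a hypothetical helper introduced only for this illustration, and it assumes a numeric vector with a single grouping vector. Each missing value is replaced by the statistic computed from the non-missing values in the same group.

```r
# Base-R illustration of grouped imputation (an assumption-laden sketch,
# not how H2O does it). impute_by_group() is a hypothetical helper.
impute_by_group <- function(x, groups, fun = mean) {
  # ave() applies FUN to the values of x within each group and
  # returns a vector aligned with the original order of x.
  ave(x, groups, FUN = function(v) {
    # Replace NAs with the group statistic of the observed values.
    v[is.na(v)] <- fun(v, na.rm = TRUE)
    v
  })
}

x <- c(1, 2, NA, 9, 10, NA, 30)
g <- c("a", "a", "a", "a", "b", "b", "b")

imputed_mean   <- impute_by_group(x, g, mean)
imputed_median <- impute_by_group(x, g, median)

print(imputed_mean)
print(imputed_median)
```

In group "a" the mean and median of the observed values differ, so the two methods fill the gap with different values; in group "b" they coincide. Note that in this sketch a group with no observed values yields NaN (for the mean) rather than a usable fill value.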

Value

  • No return value, but the H2OParsedData object is imputed in place.

Examples

library(h2o)
localH2O = h2o.init()

# randomly replace 50 values in each column of the iris dataset with NA
ds <- iris
ds[sample(nrow(ds), 50),1] <- NA
ds[sample(nrow(ds), 50),2] <- NA
ds[sample(nrow(ds), 50),3] <- NA
ds[sample(nrow(ds), 50),4] <- NA
ds[sample(nrow(ds), 50),5] <- NA

# upload the NA'ed dataset to H2O
hex <- as.h2o(localH2O, ds)
head(hex)

# impute the numeric Sepal.Length column in place with the median
h2o.impute(hex, .(Sepal.Length), method = "median")

# impute column 2 (Sepal.Width) with the mean within groups formed by Sepal.Length, Petal.Width, and Species
h2o.impute(hex, 2, method = "mean", groupBy = .(Sepal.Length, Petal.Width, Species))

# impute the Species column with the mode within groups formed by columns 1 and 4
h2o.impute(hex, 5, method = "mode", groupBy = c(1,4))
