h2o.impute: Basic Imputation of H2O Vectors

Description

Basic Imputation of H2O Vectors

Usage

h2o.impute(data, column, method = c("mean", "median", "mode"),
  combine_method = c("interpolate", "average", "lo", "hi"), by = NULL,
  inplace = TRUE)

Arguments

data

The dataset containing the column to impute.

column

The column to impute.

method

"mean" replaces NAs with the column mean; "median" replaces NAs with the column median; "mode" replaces NAs with the most common factor level (factor columns only).

combine_method

If method is "median", this chooses how to combine the two middle quantiles when the sample size is even. This parameter is ignored in all other cases.

by

Group-by columns, passed by index or name.

inplace

Perform the imputation in place or make a copy. The default is to perform the imputation in place.
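The combine_method options only matter when the number of non-missing values is even, because the median then falls between two middle values. The sketch below uses base R only and does not call H2O. How each option name maps to a value is an assumption read off the names themselves, not a statement about H2O's internals.

```r
# Base R illustration only; this does not call H2O.
x <- c(7, 1, NA, 3, 5)            # four observed values, one NA

obs <- sort(x[!is.na(x)])         # 1 3 5 7
n   <- length(obs)                # 4, an even count
lo  <- obs[n / 2]                 # lower middle value
hi  <- obs[n / 2 + 1]             # upper middle value

# Assumed meaning of each option name
combined <- c(
  lo          = lo,
  hi          = hi,
  average     = (lo + hi) / 2,
  interpolate = unname(quantile(obs, 0.5, type = 7))  # linear interpolation
)

# Fill the NA with one of the candidates
imputed <- replace(x, is.na(x), combined[["average"]])
```

At the 0.5 quantile, linear interpolation and averaging give the same number, so in this sketch the choice only changes the result for "lo" and "hi".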

Value

An H2OFrame with imputed values.

Details

Perform simple imputation on a single vector by filling missing values with aggregates computed on the vector with its NAs removed. It is also possible to impute within groups defined by other columns of data; these columns can be passed by index or name to the by parameter. If a factor column is supplied, the method must be "mode"; any other method stops with an error.
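To make the difference between plain and grouped imputation concrete, here is a minimal base R sketch. It does not call H2O, and fill_mean is a hypothetical helper introduced only for this illustration; it shows the general idea of computing the aggregate with NAs removed, either over the whole column or within each group.

```r
# Base R illustration only; this does not call H2O.
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1, NA, 3, 10, 20, NA)
)

# Hypothetical helper: fill NAs with the mean of the non-missing entries
fill_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Ungrouped: one mean over the whole column, (1 + 3 + 10 + 20) / 4 = 8.5
df$overall <- fill_mean(df$value)

# Grouped: ave() applies the helper within each level of `group`,
# so group "a" is filled with 2 and group "b" with 15
df$by_group <- ave(df$value, df$group, FUN = fill_mean)
```

The grouped version keeps the filled values closer to their neighbours in the same group, which is the usual reason to supply by.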

The default method is selected based on the type of the column to impute. If the column is numeric, "mean" is selected; if it is categorical, "mode" is selected. Other column types (e.g. String, Time, UUID) are not supported.
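The type-based default described above can be sketched in base R as follows. Both default_method and factor_mode are hypothetical helpers written for this illustration; they are not part of the h2o package and make no claim about how H2O resolves ties.

```r
# Hypothetical helpers, not part of the h2o package.
default_method <- function(x) {
  if (is.numeric(x)) {
    "mean"
  } else if (is.factor(x)) {
    "mode"
  } else {
    stop("unsupported column type: ", class(x)[1])
  }
}

# Most common level of a factor, ignoring NAs.
# Ties go to the first level in level order (a side effect of which.max).
factor_mode <- function(f) {
  counts <- table(f)              # table() drops NAs by default
  names(counts)[which.max(counts)]
}

default_method(iris$Sepal.Length)   # "mean"
default_method(iris$Species)        # "mode"

sp <- iris$Species
sp[1:10] <- NA                      # setosa drops to 40 rows, the others stay at 50
factor_mode(sp)                     # "versicolor"
```

A character column passed to default_method raises an error, mirroring the statement that other column types are not supported.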

Examples

library(h2o)
h2o.init()
fr <- as.h2o(iris, destination_frame = "iris")
# randomly replace 40 values in column 5 (Species) with NA
fr[sample(nrow(fr), 40), 5] <- NA
# impute with a group by
h2o.impute(fr, "Species", "mode", by = c("Sepal.Length", "Sepal.Width"))
