H2oAutoencoder: H2oAutoencoder for anomaly detection and dimensionality reduction

Description

H2oAutoencoder for anomaly detection and/or dimensionality reduction

Usage

H2oAutoencoder(
  AnomalyDetection = TRUE,
  DimensionReduction = TRUE,
  data,
  ValidationData = NULL,
  Features = NULL,
  RemoveFeatures = FALSE,
  NThreads = max(1L, parallel::detectCores() - 2L),
  MaxMem = "28G",
  H2oShutdown = TRUE,
  ModelID = "TestModel",
  LayerStructure = NULL,
  ReturnLayer = 4L,
  per_feature = TRUE,
  Activation = "Tanh",
  Epochs = 5L,
  L2 = 0.1,
  ElasticAveraging = TRUE,
  ElasticAveragingMovingRate = 0.9,
  ElasticAveragingRegularization = 0.001
)

Arguments

AnomalyDetection

Set to TRUE to run anomaly detection

DimensionReduction

Set to TRUE to run dimension reduction

data

The data.table with the columns you wish to have analyzed

ValidationData

The data.table with the columns you wish to have scored

Features

Column numbers or column names of the features to use. Defaults to NULL

RemoveFeatures

Set to TRUE if you want the features you specify in the Features argument to be removed from the data returned
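
Features accepts either column numbers or column names. The sketch below shows how the two forms relate. It uses a base data.frame as a stand-in (names() behaves the same way on a data.table), and resolve_features is a hypothetical helper written for illustration, not a function from the package.

```r
# Hypothetical helper (not part of RemixAutoML): turn column numbers or
# column names into a character vector of column names.
resolve_features <- function(data, Features) {
  if (is.numeric(Features)) {
    return(names(data)[Features])
  }
  missing_cols <- setdiff(Features, names(data))
  if (length(missing_cols) > 0L) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  Features
}

# Toy data: an id column followed by two feature columns
toy <- data.frame(id = 1:3, a = runif(3), b = runif(3))

# Both calls select the same two columns
by_position <- resolve_features(toy, 2L:ncol(toy))
by_name <- resolve_features(toy, c("a", "b"))
identical(by_position, by_name)
```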

NThreads

Specify the number of threads to allocate to H2O. Defaults to max(1L, parallel::detectCores()-2L)
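
The default thread count leaves two cores free for the rest of the system. parallel::detectCores() is documented as possibly returning NA on some platforms, in which case max() would also return NA. A defensive sketch (the variable names are illustrative):

```r
# Detect the number of cores; fall back to 1 if detection fails
cores <- parallel::detectCores()
if (is.na(cores)) cores <- 1L

# Leave two cores free, but never go below one thread
n_threads <- max(1L, cores - 2L)
n_threads
```

The resulting value could then be passed as NThreads = n_threads.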

MaxMem

Specify the amount of memory to allocate to H2O. E.g. "28G"

H2oShutdown

Set to TRUE to shut down H2O once it is done being used internally

ModelID

Identifier for the model. Defaults to "TestModel"

LayerStructure

ReturnLayer

Which layer of the neural network to return. Choose from 1-7, with 4 being the layer with the fewest nodes
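
To picture why layer 4 is the narrowest, consider a symmetric seven-layer "hourglass" network. The node counts below are hypothetical values chosen for illustration; they are not the package default for LayerStructure.

```r
# Hypothetical hidden-layer sizes for 11 input features (illustrative only,
# not the package default)
layer_sizes <- c(11L, 8L, 5L, 2L, 5L, 8L, 11L)

# Index of the narrowest layer, i.e. the bottleneck of the autoencoder
bottleneck <- which.min(layer_sizes)
bottleneck

# Number of values per row if that layer's output is returned
layer_sizes[bottleneck]
```

In a layout like this, returning the bottleneck layer gives the most compact representation of each row.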

per_feature

Set to TRUE to have per-feature anomaly detection generated. Otherwise an overall value will be generated
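
The difference between per-feature and overall anomaly values can be sketched in base R. Here X_hat is simulated as the input plus small noise, standing in for a real autoencoder reconstruction. The exact column names and scaling that H2oAutoencoder returns are not documented here, so treat this only as an illustration of the idea.

```r
set.seed(42)

# Toy input: 5 rows, 4 numeric features
X <- matrix(rnorm(20), nrow = 5, ncol = 4,
            dimnames = list(NULL, paste0("f", 1:4)))

# Stand-in for an autoencoder's reconstruction: the input plus small noise
X_hat <- X + matrix(rnorm(20, sd = 0.1), nrow = 5, ncol = 4)

# Per-feature view: one squared reconstruction error per row and feature
per_feature_error <- (X - X_hat)^2

# Overall view: one mean squared reconstruction error per row
overall_error <- rowMeans(per_feature_error)

dim(per_feature_error)
length(overall_error)
```

The per-feature view shows which columns drive a row's anomaly score, while the overall view is easier to threshold.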

Activation

Choose from "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout"

Epochs

Number of passes over the training data

L2

L2 regularization strength

ElasticAveraging

Set to TRUE to use elastic averaging between compute nodes, which can improve convergence of distributed models

ElasticAveragingMovingRate

Elastic averaging moving rate (only used when ElasticAveraging is TRUE)

ElasticAveragingRegularization

Elastic averaging regularization strength (only used when ElasticAveraging is TRUE)

Value

A data.table

Examples


# NOT RUN {
# Create simulated data

# Define correlation strength of features to target
Correl <- 0.85

# Number of rows you want returned
N <- 10000

# Create data
data <- data.table::data.table(Adrian = runif(N))
data[, x1 := qnorm(Adrian)]
data[, x2 := runif(N)]
data[, Independent_Variable1 := log(pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable2 := (pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable3 := exp(pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable4 := exp(exp(pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2))))]
data[, Independent_Variable5 := sqrt(pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable6 := (pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))^0.10]
data[, Independent_Variable7 := (pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))^0.25]
data[, Independent_Variable8 := (pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))^0.75]
data[, Independent_Variable9 := (pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))^2]
data[, Independent_Variable10 := (pnorm(Correl * x1 + sqrt(1-Correl^2) * qnorm(x2)))^4]
data[, Independent_Variable11 := as.factor(
 data.table::fifelse(Independent_Variable2 < 0.15, "A",
        data.table::fifelse(Independent_Variable2 < 0.45, "B",
               data.table::fifelse(Independent_Variable2 < 0.65,  "C",
                      data.table::fifelse(Independent_Variable2 < 0.85,  "D", "E")))))]
data.table::set(data, j = c("x1", "x2"), value = NULL)

# Get number of columns for LayerStructure
N <- length(names(data)[2L:ncol(data)])

# Run algo
Output <- RemixAutoML::H2oAutoencoder(

   # Select the service
   AnomalyDetection = TRUE,
   DimensionReduction = TRUE,

   # Data related args
   data = data,
   ValidationData = NULL,
   Features = names(data)[2L:ncol(data)],
   RemoveFeatures = FALSE,

   # H2O args
   NThreads = max(1L, parallel::detectCores()-2L),
   MaxMem = "28G",
   H2oShutdown = TRUE,
   ModelID = "TestModel",
   LayerStructure = NULL,
   ReturnLayer = 4L,
   per_feature = TRUE,
   Activation = "Tanh",
   Epochs = 5L,
   L2 = 0.10,
   ElasticAveraging = TRUE,
   ElasticAveragingMovingRate = 0.90,
   ElasticAveragingRegularization = 0.001)

# Inspect output
Data <- Output$Data
Model <- Output$Model

# If ValidationData is not NULL
ValidationData <- Output$ValidationData
# }
