H2OAutoencoder: H2OAutoencoder

Description

H2OAutoencoder for anomaly detection and/or dimensionality reduction

Usage

H2OAutoencoder(
  AnomalyDetection = FALSE,
  DimensionReduction = TRUE,
  data,
  Features = NULL,
  RemoveFeatures = FALSE,
  NThreads = max(1L, parallel::detectCores() - 2L),
  MaxMem = "28G",
  H2OStart = TRUE,
  H2OShutdown = TRUE,
  ModelID = "TestModel",
  model_path = NULL,
  LayerStructure = NULL,
  NodeShrinkRate = (sqrt(5) - 1)/2,
  ReturnLayer = 4L,
  per_feature = TRUE,
  Activation = "Tanh",
  Epochs = 5L,
  L2 = 0.1,
  ElasticAveraging = TRUE,
  ElasticAveragingMovingRate = 0.9,
  ElasticAveragingRegularization = 0.001
)

Arguments

AnomalyDetection

Set to TRUE to run anomaly detection

DimensionReduction

Set to TRUE to run dimension reduction

data

The data.table with the columns you wish to have analyzed

Features

Column numbers or column names of the features to use. Default: NULL

RemoveFeatures

Set to TRUE if you want the features you specify in the Features argument to be removed from the data returned

NThreads

Number of threads to allocate to H2O. Default: max(1L, parallel::detectCores() - 2L)

MaxMem

Amount of memory to allocate to H2O. Default: "28G"

H2OStart

Set to TRUE to start H2O inside the function

H2OShutdown

Set to TRUE to shut down H2O once the function is done using it internally

ModelID

Name used to identify the model. Default: "TestModel"

model_path

If NULL, no model will be saved. If a valid path is supplied, the model will be saved there

LayerStructure

If NULL, seven layers are created for you, with sizes determined by NodeShrinkRate

NodeShrinkRate

Shrink rate used to size the layers when LayerStructure is NULL. Default: (sqrt(5) - 1) / 2

ReturnLayer

Which layer of the network to return. Choose from 1-7, with 4 being the layer with the fewest nodes

per_feature

Set to TRUE to generate per-feature anomaly detection values. Otherwise, a single overall value will be generated

Activation

Choose from "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout"

Epochs

Number of epochs (passes over the training data) used to train the autoencoder

L2

L2 regularization strength applied to the network weights

ElasticAveraging

Set to TRUE to enable elastic averaging between compute nodes

ElasticAveragingMovingRate

Elastic averaging moving rate. Only used when ElasticAveraging is TRUE

ElasticAveragingRegularization

Elastic averaging regularization strength. Only used when ElasticAveraging is TRUE
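
The LayerStructure and NodeShrinkRate arguments describe seven layers with the smallest at position 4. The exact sizing rule is not documented here, so the following is only a sketch under one plausible assumption: each encoder layer is NodeShrinkRate times the size of the previous one, and the decoder mirrors the encoder. The helper sketch_layer_sizes is hypothetical and not part of RemixAutoML.

```r
# Hypothetical helper (not part of RemixAutoML): derive seven symmetric
# layer sizes from the number of input features and a shrink rate.
# Assumption: geometric shrinkage toward the middle layer.
sketch_layer_sizes <- function(n_features, shrink_rate = (sqrt(5) - 1) / 2) {
  # Encoder half: each layer is shrink_rate times the previous one,
  # floored to an integer and kept at 1 node or more
  encoder <- pmax(1L, as.integer(floor(n_features * shrink_rate^(0:3))))
  # Decoder half mirrors the encoder, excluding the bottleneck itself
  c(encoder, rev(encoder[1:3]))
}

sizes <- sketch_layer_sizes(100L)
print(sizes)

# Position of the smallest (bottleneck) layer
bottleneck <- which.min(sizes)
print(bottleneck)
```

Under this assumption the bottleneck sits at position 4, which matches the default ReturnLayer = 4L.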

Value

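A list containing the processed data.table (Data) and the trained model (Model), as accessed in the example below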
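
To illustrate what the per_feature argument distinguishes, here is a minimal base R sketch. It assumes the anomaly values are reconstruction errors, as is usual for autoencoders. The "reconstruction" x_hat is simulated by adding noise to the inputs; it stands in for the output of a trained network and does not come from H2O.

```r
set.seed(42)

# Toy inputs and a stand-in reconstruction (inputs plus small noise).
# A real autoencoder would produce x_hat from the trained network.
x <- matrix(rnorm(20), nrow = 5, ncol = 4,
            dimnames = list(NULL, paste0("V", 1:4)))
x_hat <- x + matrix(rnorm(20, sd = 0.1), nrow = 5, ncol = 4)

# Per-feature view: one squared error per row and per column
per_feature_error <- (x - x_hat)^2

# Overall view: one mean squared error per row
overall_error <- rowMeans(per_feature_error)

dim(per_feature_error)
length(overall_error)
```

The per-feature matrix has one column per input feature, while the overall vector has a single value per row.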

Examples

# NOT RUN {
############################
# Training
############################

# Create simulated data
data <- RemixAutoML::FakeDataGenerator(
  Correlation = 0.70,
  N = 1000L,
  ID = 2L,
  FactorCount = 2L,
  AddDate = TRUE,
  AddComment = FALSE,
  ZIP = 2L,
  TimeSeries = FALSE,
  ChainLadderData = FALSE,
  Classification = FALSE,
  MultiClass = FALSE)

# Run algo
Output <- RemixAutoML::H2OAutoencoder(

  # Select the service
  AnomalyDetection = TRUE,
  DimensionReduction = TRUE,

  # Data related args
  data = data,
  Features = names(data)[2L:(ncol(data)-1L)],
  per_feature = FALSE,
  RemoveFeatures = FALSE,
  ModelID = "TestModel",
  model_path = getwd(),

  # H2O Environment
  NThreads = max(1L, parallel::detectCores()-2L),
  MaxMem = "28G",
  H2OStart = TRUE,
  H2OShutdown = TRUE,

  # H2O ML Args
  LayerStructure = NULL,
  NodeShrinkRate = (sqrt(5) - 1) / 2,
  ReturnLayer = 4L,
  Activation = "Tanh",
  Epochs = 5L,
  L2 = 0.10,
  ElasticAveraging = TRUE,
  ElasticAveragingMovingRate = 0.90,
  ElasticAveragingRegularization = 0.001)

# Inspect output
data <- Output$Data
Model <- Output$Model

# If ValidationData is not null
ValidationData <- Output$ValidationData

############################
# Scoring
############################

# Create simulated data
data <- RemixAutoML::FakeDataGenerator(
  Correlation = 0.70,
  N = 1000L,
  ID = 2L,
  FactorCount = 2L,
  AddDate = TRUE,
  AddComment = FALSE,
  ZIP = 2L,
  TimeSeries = FALSE,
  ChainLadderData = FALSE,
  Classification = FALSE,
  MultiClass = FALSE)

# Run algo
data <- RemixAutoML::H2OAutoencoderScoring(

  # Select the service
  AnomalyDetection = TRUE,
  DimensionReduction = TRUE,

  # Data related args
  data = data,
  Features = names(data)[2L:(ncol(data)-1L)],
  RemoveFeatures = TRUE,
  per_feature = FALSE,
  ModelObject = NULL,
  ModelID = "TestModel",
  model_path = getwd(),

  # H2O args
  NThreads = max(1L, parallel::detectCores()-2L),
  MaxMem = "28G",
  H2OStart = TRUE,
  H2OShutdown = TRUE,
  ReturnLayer = 4L)
# }
