predict.solitude: Predict method for solitude class

Description

Predict the anomaly score or the corrected depth of each observation in the data using an isolation forest model.

Usage

# S3 method for solitude
predict(object, data, type = "anomaly_score",
  aggregator = "median", ...)

Arguments

object

Isolation forest model of class 'solitude'

data

Data frame of observations to predict on

type

Type of prediction. One of "anomaly_score" (the default) or "depth_corrected"

aggregator

Name of the function, as a string (default "median"), used to aggregate the corrected depths of each observation across all trees. Used only when type is "anomaly_score"

...

Further arguments passed to future.apply::future_lapply when a future backend is set up
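The aggregator argument takes a function name rather than a function. A minimal base-R sketch of what aggregating by name could look like, assuming a corrected-depth matrix laid out as described under Value (one row per observation, one column per tree). The matrix depths and the helper aggregate_depths are hypothetical illustrations, not part of solitude.

```r
# Hypothetical corrected depths: 3 observations (rows) x 4 trees (columns)
depths <- matrix(c(2.0, 2.5, 9.0,  2.2,
                   7.5, 8.0, 7.0,  8.5,
                   6.0, 6.5, 7.0, 20.0),
                 nrow = 3, byrow = TRUE)

# Hypothetical helper: resolve the aggregator by name, then apply it per row
aggregate_depths <- function(depths, aggregator = "median") {
  agg_fun <- match.fun(aggregator)   # e.g. "median" -> stats::median
  apply(depths, 1, agg_fun)          # one aggregated depth per observation
}

aggregate_depths(depths, "median")   # returns c(2.35, 7.75, 6.75)
aggregate_depths(depths, "mean")     # returns c(3.925, 7.75, 9.875)
```

In the third row a single tree reports an unusually deep path (20); the median (6.75) is barely affected, while the mean (9.875) is pulled upward.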

Value

The output depends on the type argument:

anomaly_score: A numeric vector of scores with one element per observation in data. See Details for a rule of thumb on interpreting them.
depth_corrected: A matrix with one row per observation in data and one column per tree in the model. Entry (i, j) is the corrected depth of observation i in tree j.

Details

The following types of prediction are supported:

anomaly_score: As a rule of thumb, observations whose scores are close to 1 are likely outliers. If the scores of all observations hover around 0.5, there may be no outliers at all.
depth_corrected: The depth of the leaf an observation reaches in a tree, plus a correction term equal to the average path length of an unsuccessful search in a binary search tree. The correction accounts for the observations that remain unseparated in that leaf.
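For reference, the quantities behind both prediction types, as defined in the paper doi:10.1145/2133360.2133363, are below. Here e(x) is the number of edges from the root to the leaf that x reaches, m is the number of training observations in that leaf, ψ is the number of observations used to grow each tree, and E[h(x)] is the mean of h(x) over all trees; c(n) is defined as 1 for n = 2 and 0 for n ≤ 1. The paper averages with the mean; judging from the aggregator argument, solitude substitutes the chosen summary (the median by default).

```latex
\begin{aligned}
h(x) &= e(x) + c(m) \\
c(n) &= 2H(n-1) - \frac{2(n-1)}{n}, \qquad H(i) \approx \ln(i) + 0.5772156649 \\
s(x, \psi) &= 2^{-\mathrm{E}[h(x)] / c(\psi)}
\end{aligned}
```

When E[h(x)] = c(ψ) the score is exactly 0.5, and it approaches 1 as E[h(x)] approaches 0, which is where the rule of thumb for anomaly_score comes from.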

See <doi:10.1145/2133360.2133363> for more details.
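A minimal base-R sketch of the score normalisation from the paper cited above, assuming the aggregated corrected depth of each observation is already available (for example, a row summary of the depth_corrected matrix). The helpers avg_path_length and anomaly_score_from_depth are hypothetical illustrations, not solitude functions, and they follow the paper's approximation H(i) ≈ ln(i) + 0.5772156649 for the harmonic number. The value 256 for ψ is only an illustrative choice.

```r
# Hypothetical helper: c(n), the average path length of an unsuccessful
# search in a binary search tree with n nodes, as defined in the paper.
avg_path_length <- function(n) {
  euler_gamma <- 0.5772156649
  out <- numeric(length(n))        # c(n) = 0 for n <= 1
  out[n == 2] <- 1                 # c(2) = 1
  big <- n > 2
  # H(n - 1) is approximated by log(n - 1) + euler_gamma
  out[big] <- 2 * (log(n[big] - 1) + euler_gamma) - 2 * (n[big] - 1) / n[big]
  out
}

# Hypothetical helper: anomaly score 2^(-depth / c(psi)), where depth is the
# aggregated corrected depth of an observation and psi is the number of
# observations used to grow each tree.
anomaly_score_from_depth <- function(depth, psi) {
  2^(-depth / avg_path_length(psi))
}

c_psi <- avg_path_length(256)                # about 10.24
anomaly_score_from_depth(c_psi, 256)         # 0.5: depth equals c(psi)
anomaly_score_from_depth(c(2, 30), 256)      # about 0.87 and 0.13
```

Shallow depths (observations isolated after few splits) push the score toward 1, while depths near c(ψ) give scores near 0.5, matching the rule of thumb in Details.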

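The predict method supports parallel computation through the future framework: when a future backend is set up, the work is distributed with future.apply::future_lapply, and any arguments supplied in ... are passed on to it.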

Examples

# NOT RUN {
set.seed(100)
index      <- sample.int(150, 100)    # 100 rows for training
iris_train <- iris[index, ]
iris_test  <- iris[-index, ]          # remaining 50 rows for testing
# fit the isolation forest on the four numeric columns only
mo         <- isolation_forest(iris_train[, 1:4], seed = 101)
scores     <- predict(mo, iris_test)  # type = "anomaly_score" by default
summary(scores)
# plot the test set, enlarging points whose score exceeds 0.58
with(iris_test
     , plot(Sepal.Length
            , Sepal.Width
            , col = Species
            , cex = ifelse(scores > 0.58, 2, 1)
            , pch = 20
            )
     )
# }
# NOT RUN {
# score the training data itself, enlarging points whose score exceeds 0.6
with(iris_train
     , plot(Sepal.Length
            , Sepal.Width
            , col = Species
            , cex = ifelse(predict(mo, iris_train) > 0.6, 2, 1)
            , pch = 20
            )
     )
# }
