
HRTnomaly (version 25.11.22)

dif: Deep Isolation Forest

Description

The function builds a deep isolation forest that uses fuzzy logic to determine whether a record is anomalous. It takes a wide-format data.frame object as input and returns it with two appended vectors. The first vector contains the anomaly scores as numbers between zero and one, and the second contains logical values indicating whether each record is an outlier (TRUE) or not (FALSE).

Usage

dif(dta, nt = 100L, nss = NULL, threshold = 0.95)

Value

The wide-format data.frame provided as input, with two extra columns appended:

scores

A numeric vector of anomaly scores ranging from 0 to 1, where values closer to 1 indicate higher anomaly.

flags

A logical vector indicating whether each record is flagged as an outlier (TRUE) or not (FALSE) based on the specified threshold.
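As a hedged illustration of the return value, the sketch below calls dif() on the `iris` dataset (the same call used in the Examples section) and picks out whichever columns were appended, without relying on their exact names. The guard with requireNamespace() and the helper name `new_cols` are additions for this sketch; the block does nothing when the package is not installed.

```r
# Run only when HRTnomaly is available, so the sketch is safe to source anywhere.
if (requireNamespace("HRTnomaly", quietly = TRUE)) {
  library(HRTnomaly)
  set.seed(2025L)

  # Same call as in the Examples section
  res <- dif(iris)

  # Columns present in the result but not in the input
  # (`new_cols` is a hypothetical helper name used only here)
  new_cols <- setdiff(names(res), names(iris))
  print(new_cols)

  # First few rows of the appended columns
  print(head(res[, new_cols, drop = FALSE]))
}
```

Comparing column names before and after the call avoids hard-coding the names of the appended columns.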

Arguments

dta

A wide-format data.frame object with one record per row.

nt

Number of deep isolation trees to build to form the forest. By default, it is set to 100.

nss

Number of records subsampled to build a single deep isolation tree. If NULL (the default), the program randomly selects 25% of the records supplied to the dta argument.

threshold

A number between zero and one used as a threshold when identifying outliers from the anomaly scores. By default, this argument is set to 0.95, so that 5% of the records are classified as anomalous.
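To make the role of threshold concrete, here is a minimal sketch in base R. It does not call dif(): the scores are simulated from a uniform distribution, and the flagging rule (score strictly greater than threshold) is an assumption made for illustration, not the package's documented internals.

```r
# Simulated stand-in for anomaly scores in [0, 1]; NOT produced by dif()
set.seed(1)
scores <- runif(1000)

threshold <- 0.95

# Assumed rule for this sketch: flag a record when its score exceeds the threshold
flags <- scores > threshold

# Share of flagged records; for uniform scores this is close to 1 - threshold
mean(flags)
```

Under these assumptions, raising threshold flags fewer records and lowering it flags more.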

Author

Luca Sartore <drwolf85@gmail.com>

Details

The argument dta must be an object of class data.frame in wide format, that is, with one row per record and one column per variable. The R packages dplyr, purrr, and tidyr are highly recommended to simplify the conversion of datasets between long and wide formats.
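The long-to-wide conversion mentioned above can be sketched as follows. To keep the example free of extra dependencies it uses base R's reshape(); tidyr::pivot_wider() is the counterpart from the recommended packages. The toy data and the column names `id`, `variable`, and `value` are hypothetical.

```r
# Toy long-format data: one row per (record, variable) pair
long <- data.frame(
  id       = rep(1:3, each = 2),
  variable = rep(c("height", "weight"), times = 3),
  value    = c(170, 65, 180, 80, 165, 55)
)

# Spread `variable` into columns, giving one row per record
wide <- reshape(long, idvar = "id", timevar = "variable", direction = "wide")

# reshape() names the new columns value.height and value.weight; drop the prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL

wide
```

The resulting `wide` object has one row per `id` and one column per variable, which is the layout the dta argument expects.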

Examples

if (FALSE) {
# Load the package
library(HRTnomaly)
set.seed(2025L)
# Detect outliers in the `iris` dataset
res <- dif(iris)
}
