The function builds a generalized isolation forest that uses fuzzy logic to determine whether a record is anomalous or not.
The function takes a wide-format data.frame object as input and returns it with two appended vectors.
The first vector contains the anomaly scores as numbers between zero and one, and the second vector provides
a set of logical values indicating whether the records are outliers (TRUE) or not (FALSE).
Usage
gif(dta, nt = 100L, nss = NULL, threshold = 0.95)
Value
The wide-format data.frame provided as input, with two extra columns appended: one for the anomaly scores and one for the outlier flags.
Arguments
dta
A wide-format data.frame object with records (stored by row).
nt
Number of generalized isolation trees to build to form the forest. By default, it is set to 100.
nss
Number of records subsampled to build a single generalized isolation tree.
If NULL (the default), the program randomly selects 25% of the records provided to the dta argument.
threshold
A number between zero and one used as a threshold when identifying outliers from the anomaly scores.
By default, this argument is set to 0.95, so that 5% of the records are classified as anomalous.
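The defaults for nss and threshold can be illustrated with a minimal base R sketch. It does not call gif and is not the function's internal code: the rounding of the 25% subsample (ceiling), the stand-in scores drawn with runif, and the use of an empirical quantile as the cut-off are all assumptions made for illustration only.

```r
set.seed(42)
dta <- iris[, 1:4]  # example wide-format data.frame, one record per row

# nss = NULL: assume the default subsample is 25% of the rows,
# rounded up (the rounding rule is an assumption, not documented)
nss <- NULL
if (is.null(nss)) nss <- ceiling(0.25 * nrow(dta))
sub <- dta[sample.int(nrow(dta), size = nss), , drop = FALSE]

# threshold = 0.95: assume records whose score exceeds the 95% empirical
# quantile are flagged; the scores here are random stand-ins in [0, 1]
scores <- runif(nrow(dta))
threshold <- 0.95
cutoff <- quantile(scores, probs = threshold, names = FALSE)
flag <- scores > cutoff

nrow(sub)   # size of the subsample used for one tree
mean(flag)  # share of flagged records, roughly 1 - threshold
```

Under these assumptions, lowering threshold flags a larger share of the records, and a larger nss gives each tree more records to partition.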
The argument dta must be provided as an object of class data.frame.
This object is treated as a wide-format data.frame, with one record per row.
The R packages dplyr, purrr, and tidyr are highly recommended for converting datasets between long and wide formats.
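As an illustration of the long-to-wide conversion mentioned above, the following sketch uses tidyr::pivot_wider on a small made-up dataset. The column names id, variable, and value, and the data themselves, are hypothetical; only the reshaping step is the point.

```r
library(tidyr)

# Hypothetical long-format data: one row per (record, variable) pair
long <- data.frame(
  id       = rep(1:3, each = 2),
  variable = rep(c("height", "weight"), times = 3),
  value    = c(170, 65, 182, 80, 158, 52)
)

# Reshape to wide format: one row per record, one column per variable
wide <- pivot_wider(long, names_from = variable, values_from = value)

# pivot_wider returns a tibble; convert to a plain data.frame
wide <- as.data.frame(wide)
wide
```

The resulting wide object has one row per record and could then be passed as the dta argument.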