isotree (version 0.1.28)

export.isotree.model: Export Isolation Forest model

Description

Saves an Isolation Forest model to a serialized file, along with its metadata, so that it can be used in the Python or C++ versions of this package.

This function is not meant for passing models to and from R - for that, use `saveRDS` and `readRDS` instead.

Note that, if the model was fitted to a `data.frame`, the column names must be exportable as JSON and must be usable as column names in Python's Pandas (e.g. strings/character).

It is recommended to visually inspect the produced `.metadata` file in any case.

Usage

export.isotree.model(model, file, ...)

Arguments

model

An Isolation Forest model as returned by function isolation.forest.

file

File path where the model will be saved. File connections are not accepted, only file paths.

...

Additional arguments to pass to writeBin - you might want to pass extra parameters if passing files between different CPU architectures or similar.

Value

No return value.

Details

This function will create 2 files: the serialized model, in binary format, with the name passed in `file`; and a metadata file in JSON format with the same name but ending in `.metadata`. The second file should NOT be edited manually, except for the field `nthreads` if desired.

If the model was built with `build_imputer=TRUE`, there will also be a third binary file ending in `.imputer`.
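As a rough illustration of the naming scheme described above, the following Python sketch lists the files one would expect next to an exported model. The helper name `expected_export_files` is hypothetical, and the sketch assumes that the `.metadata` and `.imputer` suffixes are appended to the full path passed as `file`.

```python
from pathlib import Path


def expected_export_files(model_path, has_imputer=False):
    """List the files an export is expected to produce (hypothetical helper).

    Assumption: the suffixes are appended to the full path given as `file`,
    e.g. "model.bin" -> "model.bin.metadata".
    """
    base = str(model_path)
    files = [Path(base), Path(base + ".metadata")]
    if has_imputer:
        # Only present if the model was built with build_imputer=TRUE.
        files.append(Path(base + ".imputer"))
    return files


paths = expected_export_files("model.bin", has_imputer=True)
missing = [p for p in paths if not p.exists()]  # files not found on disk
```

Checking `missing` before transferring a model to another machine can catch a forgotten `.metadata` or `.imputer` file.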

The metadata will contain, among other things, the encoding that was used for categorical columns - this is under `data_info.cat_levels`, as an array of arrays by column, with the first entry for each column corresponding to category 0, second to category 1, and so on (the C++ version takes them as integers). This metadata is written to a JSON file using the `jsonlite` package, which must be installed in order for this to work.
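To make the layout of `data_info.cat_levels` concrete, here is a minimal Python sketch that turns it into one label-to-integer lookup per categorical column. The sample JSON is hypothetical and heavily trimmed: a real `.metadata` file contains more fields, and only `data_info.cat_levels` is taken from the description above.

```python
import json

# Hypothetical, trimmed-down metadata with two categorical columns.
# A real file would be read with json.load(open(path_to_metadata)).
sample_text = """
{"data_info": {"cat_levels": [["red", "green", "blue"], ["no", "yes"]]}}
"""
metadata = json.loads(sample_text)


def category_codes(metadata):
    """Return one {label: integer code} dict per categorical column.

    The first label of each column maps to 0, the second to 1, and so on.
    """
    return [
        {label: code for code, label in enumerate(levels)}
        for levels in metadata["data_info"]["cat_levels"]
    ]


codes = category_codes(metadata)
```

Here `codes[0]["green"]` is 1 and `codes[1]["yes"]` is 1, since each label's code is its zero-based position within its column.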

The serialized file can be used in the C++ version by reading it as a binary raw file and de-serializing its contents with the `cereal` library, or by using the provided C++ functions for de-serialization. If the model used `ndim=1`, it will be an object of class `IsoForest`; if it used `ndim>1`, it will be an object of class `ExtIsoForest`. The imputer file, if produced, will be an object of class `Imputer`.

The metadata is not used in the C++ version, but is necessary for the Python version.

Note that the model treats boolean/logical variables as categorical. Thus, if the model was fitted to a `data.frame` with boolean columns, those columns must be encoded in the same order when the model is imported into C++. For example, the model might encode `TRUE` as zero and `FALSE` as one - check the metadata to find out which order was used.
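The sketch below shows, in Python, how values of one column could be converted to integer codes once the level order has been read from the metadata. The helper `encode_column` is hypothetical, and representing the logical levels as the strings `"TRUE"` and `"FALSE"` is an assumption: inspect the produced `.metadata` file to see how they are actually written.

```python
def encode_column(values, levels):
    """Map each value to its zero-based position in `levels` (hypothetical helper)."""
    codes = {level: code for code, level in enumerate(levels)}
    encoded = []
    for value in values:
        if value not in codes:
            # Fail loudly rather than guess a code for an unseen level.
            raise ValueError(f"{value!r} is not among the exported levels {levels!r}")
        encoded.append(codes[value])
    return encoded


# Assumed level order for one logical column, as it might appear in the metadata.
levels = ["TRUE", "FALSE"]
encoded = encode_column(["FALSE", "TRUE", "TRUE"], levels)  # [1, 0, 0]
```

With this level order, `TRUE` becomes 0 and `FALSE` becomes 1, which is the opposite of the usual boolean-to-integer conversion. This is why the order should be taken from the metadata rather than assumed.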

References

https://uscilab.github.io/cereal/

See Also

load.isotree.model, writeBin, unpack.isolation.forest