predict.outliertree: Predict method for Outlier Tree

Description

Predict method for Outlier Tree. Looks for outliers in new data according to a fitted model.

Usage

# S3 method for outliertree
predict(
  object,
  newdata,
  outliers_print = 15L,
  min_decimals = 2L,
  return_outliers = TRUE,
  nthreads = object$nthreads,
  ...
)

Value

If `return_outliers = TRUE`, returns a list of lists with the outliers and their information. Each row of the input is one entry in the outer list, named after the corresponding row of the input data frame. The result can be printed in a human-readable format afterwards through `print` or `summary` (they do the same thing). Otherwise, returns nothing, but prints the outliers if any are detected. Note that, although printing the returned object in the console shows a short summary of only some observations, the object contains information for all rows and can be subsetted to obtain the information for a specific row.
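The structure described above can be inspected directly. The following is a minimal sketch, assuming the `outliertree` package is installed; the data frames `train` and `new_rows`, their columns, and the row names `obs1`, `obs2`, ... are made up for illustration and are not part of the package.

```r
library(outliertree)

# Hypothetical training data with one extreme value in the last row
set.seed(1)
train <- data.frame(
    x = c(rnorm(99), 1e6),
    y = rgamma(100, 1)
)

# Hypothetical new data with explicit row names
new_rows <- data.frame(
    x = rnorm(20),
    y = c(-1e6, rgamma(19, 1))
)
rownames(new_rows) <- paste0("obs", seq_len(nrow(new_rows)))

model <- outlier.tree(train, outliers_print = 0, nthreads = 1)
res <- predict(model, new_rows, outliers_print = 0,
               return_outliers = TRUE, nthreads = 1)

length(res)         # one entry per row of new_rows
head(names(res))    # entry names follow the row names of new_rows
str(res[["obs1"]])  # information stored for a single row
summary(res)        # human-readable report, same as print(res)
```

Subsetting by row name with `[[` is used here because the entries are named after the input rows; which fields each entry holds is best checked with `str` rather than assumed.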

Arguments

object: An Outlier Tree object as returned by `outlier.tree`.
newdata: A data frame in which to look for outliers according to the fitted model.
outliers_print: How many outliers to print. Pass zero or `NULL` to avoid printing them. At least one of `outliers_print` and `return_outliers` must be enabled, since otherwise the call would neither print nor return anything.
min_decimals: Minimum number of decimals to use when printing numeric values for the flagged outliers. The number of decimals will be dynamically increased according to the relative magnitudes of the values being reported. Ignored when passing `outliers_print=0` or `outliers_print=FALSE`.
return_outliers: Whether to return the outliers in an R object (otherwise will just print them).
nthreads: Number of parallel threads to use. Parallelization is done by rows.
...: Not used.
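As an illustration of how `outliers_print` and `return_outliers` combine, the sketch below calls `predict` once in print-only mode and once in return-only mode. It assumes the `outliertree` package is installed; the data and the variable names `train`, `new_rows` and `kept` are hypothetical.

```r
library(outliertree)

# Hypothetical data: one extreme value in training, one in the new data
set.seed(1)
train    <- data.frame(x = c(rnorm(99), 1e6), y = rgamma(100, 1))
new_rows <- data.frame(x = rnorm(20), y = c(-1e6, rgamma(19, 1)))

model <- outlier.tree(train, outliers_print = 0, nthreads = 1)

# Print-only: report up to 5 outliers, with at least 3 decimals, keep nothing
predict(model, new_rows, outliers_print = 5, min_decimals = 3,
        return_outliers = FALSE, nthreads = 1)

# Return-only: print nothing, keep the per-row information for later use
kept <- predict(model, new_rows, outliers_print = 0,
                return_outliers = TRUE, nthreads = 1)
```

The return-only form suits scripts and pipelines, where the report can still be produced later with `print(kept)`.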

Details

Note that, after a serialized object from `outlier.tree` is loaded through `readRDS` or `load`, the underlying C++ object is de-serialized only upon running `predict` or `print`. The first run will therefore be slower, while subsequent runs will be faster because the C++ object will already be in memory.
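A save and reload round trip might look like the sketch below. It assumes the `outliertree` package is installed; the data, the temporary file path and the variable names are hypothetical. On a data set this small the timing difference between the two calls may be lost in noise, so the timings are shown only to indicate where the de-serialization cost would appear.

```r
library(outliertree)

# Hypothetical training data with one extreme value
set.seed(1)
train <- data.frame(x = c(rnorm(99), 1e6), y = rgamma(100, 1))
model <- outlier.tree(train, outliers_print = 0, nthreads = 1)

# Serialize the fitted model to disk and load it back
path <- tempfile(fileext = ".rds")
saveRDS(model, path)
restored <- readRDS(path)

# First call after loading: includes de-serializing the C++ object
t_first <- system.time(
    r1 <- predict(restored, train, outliers_print = 0,
                  return_outliers = TRUE, nthreads = 1)
)

# Second call: the C++ object should already be in memory
t_second <- system.time(
    r2 <- predict(restored, train, outliers_print = 0,
                  return_outliers = TRUE, nthreads = 1)
)

t_first["elapsed"]
t_second["elapsed"]
```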

Examples

library(outliertree)
### random data frame with an obvious outlier
nrows = 100
set.seed(1)
df = data.frame(
    numeric_col1 = c(rnorm(nrows - 1), 1e6),
    numeric_col2 = rgamma(nrows, 1),
    categ_col    = sample(c('categA', 'categB', 'categC'),
        size = nrows, replace = TRUE)
)

### test data frame with another obvious outlier
nrows_test = 50
df_test = data.frame(
    numeric_col1 = rnorm(nrows_test),
    numeric_col2 = c(-1e6, rgamma(nrows_test - 1, 1)),
    categ_col    = sample(c('categA', 'categB', 'categC'),
        size = nrows_test, replace = TRUE)
)
    
### fit model on training data
outliers_model = outlier.tree(df, outliers_print=FALSE, nthreads=1)

### find the test outlier
test_outliers = predict(outliers_model, df_test,
    outliers_print = 1, return_outliers = TRUE,
    nthreads = 1)

### retrieve the outlier info (for row 1) as an R list
test_outliers[[1]]

### to turn it into a 6-column table:
# dt = t(data.table::as.data.table(test_outliers))