weird (version 1.0.2)

lookout: Lookout probabilities

Description

Compute leave-one-out log score probabilities using a Generalized Pareto distribution. These give the probability of each observation coming from the same distribution as the majority of observations, so a low probability indicates a likely anomaly.

Usage

lookout(
  object = NULL,
  density_scores = NULL,
  loo_scores = density_scores,
  threshold_probability = 0.95
)

Value

A numerical vector containing the lookout probabilities

Arguments

object

A model object or a numerical data set.

density_scores

Numerical vector of log scores

loo_scores

Optional numerical vector of leave-one-out log scores

threshold_probability

Probability threshold when computing the POT model for the log scores.

Author

Rob J Hyndman

Details

This function can work with several object types. If object is not NULL, it is passed to the density_scores() function to compute density scores (and possibly leave-one-out (LOO) density scores). Otherwise, the density scores are taken from the density_scores argument, and the LOO density scores are taken from the loo_scores argument. A Generalized Pareto distribution is then fitted to the density scores that exceed the quantile given by threshold_probability (a peaks-over-threshold, or POT, model), and the fitted distribution is used to obtain the lookout probability of each observation.
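The peaks-over-threshold step described above can be sketched in base R. This is a rough illustration only, not the package's implementation: the helper name gpd_tail_prob is hypothetical, the fit uses a plain optim() maximum-likelihood search, and the scores are assumed to be oriented so that larger values are more surprising (for example, negative log densities).

```r
# Hypothetical helper for illustration; it is not part of the weird package.
# Assumes larger scores are more surprising (e.g. negative log densities).
gpd_tail_prob <- function(scores, loo_scores = scores,
                          threshold_probability = 0.95) {
  # Threshold: the chosen quantile of the scores
  u <- quantile(scores, threshold_probability, names = FALSE)
  excess <- scores[scores > u] - u

  # Negative log-likelihood of a Generalized Pareto distribution
  # with scale sigma = exp(par[1]) and shape xi = par[2]
  nll <- function(par) {
    sigma <- exp(par[1])
    xi <- par[2]
    z <- 1 + xi * excess / sigma
    if (any(z <= 0)) return(1e10)  # outside the support
    if (abs(xi) < 1e-8) {
      return(length(excess) * log(sigma) + sum(excess) / sigma)
    }
    length(excess) * log(sigma) + (1 / xi + 1) * sum(log(z))
  }
  fit <- optim(c(log(mean(excess)), 0.1), nll)
  sigma <- exp(fit$par[1])
  xi <- fit$par[2]

  # GPD survival function at the LOO scores; scores at or below
  # the threshold get probability 1
  y <- pmax(loo_scores - u, 0)
  if (abs(xi) < 1e-8) {
    exp(-y / sigma)
  } else {
    pmax(1 + xi * y / sigma, 0)^(-1 / xi)
  }
}

# Simulated data with one planted anomaly in position 1
set.seed(1)
scores <- -dnorm(c(5, rnorm(199)), log = TRUE)
p <- gpd_tail_prob(scores)
which.min(p)  # index of the observation with the smallest probability
```

In this sketch the same scores are used for fitting and for evaluation. Supplying separate leave-one-out scores as loo_scores mirrors the argument structure of lookout(), where the distribution is fitted to one set of scores and the probabilities are evaluated at the other.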

References

Sevvandi Kandanaarachchi & Rob J Hyndman (2022) "Leave-one-out kernel density estimates for outlier detection", Journal of Computational and Graphical Statistics, 31(2), 586-599. https://robjhyndman.com/publications/lookout/

Examples

# Packages used in the examples (dplyr re-exports tibble())
library(weird)
library(dplyr)

# Univariate data: the first value (5) is a planted anomaly
tibble(
  y = c(5, rnorm(49)),
  lookout = lookout(y)
)
# Bivariate data with score calculation done outside the function
tibble(
  x = rnorm(50),
  y = c(5, rnorm(49)),
  fscores = density_scores(y),
  loo_fscores = density_scores(y, loo = TRUE),
  lookout = lookout(density_scores = fscores, loo_scores = loo_fscores)
)
# Using a regression model
of <- oldfaithful |> filter(duration < 7200, waiting < 7200)
fit_of <- lm(waiting ~ duration, data = of)
# Sort so that the smallest lookout probabilities come first
of |>
  mutate(lookout_prob = lookout(fit_of)) |>
  arrange(lookout_prob)