MOE: Detecting cellwise outliers using Shapley values based on local outlyingness.

Description

The MOE function flags outlying cells in a numeric data vector x with \(p\) entries, or in a numeric data matrix with \(n \times p\) entries, for a given center mu and covariance matrix Sigma, using Shapley values. It is a more sophisticated alternative to the SCD algorithm: it uses the information in the regular cells to derive an alternative reference point (Mayrhofer2022ShapleyOutlier).
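As a minimal sketch of the scores behind phi, assume the game in which cells outside a coalition are set to their value in mu. The Shapley value of cell \(j\) for the squared Mahalanobis distance then has the closed form \(\phi_j = (x - \mu)_j \, (\Sigma^{-1}(x - \mu))_j\), and the \(\phi_j\) sum to the squared distance. The base R code below computes these values for the data of the first example. The helper shapley_md is hypothetical and is not the package's shapley function.

```r
# Hypothetical helper: Shapley decomposition of the squared Mahalanobis
# distance, phi_j = (x - mu)_j * (Sigma_inv %*% (x - mu))_j
shapley_md <- function(x, mu, Sigma_inv) {
  d <- x - mu
  as.vector(d * (Sigma_inv %*% d))
}

p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0, 1, 2, 2.3, 2.5)

phi <- shapley_md(x, mu, Sigma_inv)
print(round(phi, 3))
# The cellwise contributions add up to the squared Mahalanobis distance
all.equal(sum(phi), mahalanobis(x, mu, Sigma))
```

A cell equal to its center value (here the first one) receives a Shapley value of exactly zero, while large positive values point to the cells that drive the outlyingness.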

Usage

MOE(
  x,
  mu,
  Sigma,
  Sigma_inv = NULL,
  step_size = 0.1,
  min_deviation = 0,
  max_step = NULL,
  local = TRUE,
  max_iter = 1000,
  q = 0.99,
  check_outlyingness = FALSE,
  check = TRUE,
  cells = NULL,
  method = "cellMCD"
)

Value

A list of class shapley_algorithm (see new_shapley_algorithm) containing the following elements:

x: A \(p\)-dimensional vector (or an \(n \times p\) matrix) containing the imputed data.
phi: A \(p\)-dimensional vector (or an \(n \times p\) matrix) containing the Shapley values (outlyingness scores) of x; see shapley.
mu_tilde: A \(p\)-dimensional vector (or an \(n \times p\) matrix) containing the alternative reference points based on the regular cells of the original observations.
x_original: A \(p\)-dimensional vector (or an \(n \times p\) matrix) containing the original data.
x_original: The non-centrality parameters of the chi-squared distribution.
x_history: A list with \(n\) elements, each containing the path of how the original data vector was modified.
phi_history: A list with \(n\) elements, each containing the Shapley values corresponding to x_history.
mu_tilde_history: A list with \(n\) elements, each containing the alternative reference points corresponding to x_history.
S_history: A list with \(n\) elements, each containing the indices of the outlying cells in each iteration.
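To make the history elements concrete, here is a deliberately simplified, hypothetical greedy loop in base R. It is not the MOE algorithm: it ignores mu_tilde, the local cutoff, max_step and min_deviation. It simply moves the cell with the largest Shapley value a fraction step_size towards mu until the squared Mahalanobis distance falls below the central chi-squared quantile, recording an x_history and an S_history along the way. The function names shapley_md and greedy_impute_sketch are illustrative only.

```r
# Hypothetical helper: Shapley values of the squared Mahalanobis distance
shapley_md <- function(x, mu, Sigma_inv) {
  d <- x - mu
  as.vector(d * (Sigma_inv %*% d))
}

# Simplified illustration only, NOT the MOE algorithm
greedy_impute_sketch <- function(x, mu, Sigma_inv, step_size = 0.1,
                                 q = 0.99, max_iter = 1000) {
  cutoff <- qchisq(q, df = length(x))  # central chi-squared cutoff
  x_history <- list(x)                 # path of the modified vector
  S_history <- list()                  # index of the cell moved in each step
  for (iter in seq_len(max_iter)) {
    phi <- shapley_md(x, mu, Sigma_inv)
    if (sum(phi) <= cutoff) break      # no longer outlying: stop
    j <- which.max(phi)                # most outlying cell
    # Move that cell a fraction step_size of the way towards mu
    x[j] <- x[j] + step_size * (mu[j] - x[j])
    x_history[[length(x_history) + 1]] <- x
    S_history[[length(S_history) + 1]] <- j
  }
  list(x = x, x_history = x_history, S_history = S_history)
}

p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
res <- greedy_impute_sketch(c(0, 0, 0, 0, 5), mu, solve(Sigma))
unlist(res$S_history)  # cells moved in each step
res$x                  # imputed vector
```

With a single contaminated cell, only that cell has a nonzero Shapley value, so every step modifies it and the remaining cells are left untouched.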

Arguments

x: Data vector with \(p\) entries or data matrix with \(n \times p\) entries containing only numeric entries.
mu: Either NULL or the mean vector of x. If NULL, it is estimated with the method specified in method.
Sigma: Either NULL or the \(p \times p\) covariance matrix of x. If NULL, it is estimated with the method specified in method.
Sigma_inv: Either NULL (default) or Sigma's inverse \(p \times p\) matrix. If NULL, the inverse of Sigma is computed using solve(Sigma).
step_size: Numeric. Step size for the imputation of outlying cells, with step_size \(\in [0,1]\). Defaults to \(0.1\).
min_deviation: Numeric. Detection threshold, with min_deviation \(\in [0,1]\). Defaults to \(0\).
max_step: Either NULL (default) or an integer. The maximum number of steps in each iteration. If NULL, max_step \(= p\).
local: Logical. If TRUE (default), the non-central chi-squared distribution is used to determine the cutoff value based on mu_tilde.
max_iter: Integer. The maximum number of iterations. Defaults to \(1000\).
q: Numeric. The quantile of the chi-squared distribution used for detection and imputation of outliers. Defaults to \(0.99\).
check_outlyingness: Logical. If TRUE, the outlyingness is rechecked after applying min_deviation. Defaults to FALSE.
check: Logical. If TRUE (default), inputs are checked before running the function and an error message is returned if one of the inputs is not as expected.
cells: Either NULL (default) or a vector/matrix of the same dimension as x, indicating the outlying cells. The matrix must contain only zeros and ones, or TRUE/FALSE.
method: Either "cellMCD" (default) or "MCD". Specifies the method used for parameter estimation if mu and/or Sigma are not provided.
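The role of q and local can be illustrated with base R's qchisq. This is a minimal sketch, assuming the usual rule that an observation is flagged when its squared Mahalanobis distance exceeds the cutoff. The central cutoff uses \(p\) degrees of freedom, while a non-central cutoff is larger for any positive non-centrality parameter. The value ncp_example is an arbitrary placeholder; how MOE derives the non-centrality from mu_tilde is not reproduced here.

```r
p <- 5
q <- 0.99
# Central chi-squared cutoff with p degrees of freedom
cutoff_central <- qchisq(q, df = p)
# Non-central cutoff; ncp_example is an arbitrary placeholder, not MOE's value
ncp_example <- 2
cutoff_local <- qchisq(q, df = p, ncp = ncp_example)

# Squared Mahalanobis distance of the observation from the first example
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
md2 <- mahalanobis(c(0, 1, 2, 2.3, 2.5), rep(0, p), Sigma)

c(central = cutoff_central, local = cutoff_local, md2 = md2)
md2 > cutoff_central  # is the observation flagged as outlying?
```

Raising q, or using a larger non-centrality, makes the cutoff more lenient, so fewer cells end up flagged and imputed.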

References

Examples


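library(ShapleyOutlier)

# Single observation with p = 5 highly correlated variables
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0, 1, 2, 2.3, 2.5)
MOE_x <- MOE(x = x, mu = mu, Sigma = Sigma, Sigma_inv = Sigma_inv)
plot(MOE_x)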

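# n = 100 observations with p = 10 highly correlated variables
library(MASS)
set.seed(1)
n <- 100; p <- 10
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
# Replace 100 randomly chosen cells with -5 or 5
X[sample(1:(n * p), 100, FALSE)] <- rep(c(-5, 5), 50)
MOE_X <- MOE(X, mu, Sigma)
plot(MOE_X, subset = 20)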
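When the contaminated cells are known, as in a simulation, they can be recorded in a logical matrix of the same dimension as the data, which is the format the cells argument accepts. Below is a minimal base R sketch of the contamination scheme used in the second example. For simplicity it uses independent standard normal data instead of mvrnorm, and the name cells_true is illustrative. Such a matrix can serve as a reference when checking which cells were flagged.

```r
set.seed(1)
n <- 100; p <- 10
# Independent standard normal data (a simplification of the example above)
X <- matrix(rnorm(n * p), n, p)
# Contaminate 100 randomly chosen cells with -5 or 5
idx <- sample(1:(n * p), 100, FALSE)
X[idx] <- rep(c(-5, 5), 50)
# Logical n x p indicator of the contaminated cells
cells_true <- matrix(FALSE, n, p)
cells_true[idx] <- TRUE
colSums(cells_true)  # contaminated cells per variable
```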
