The MOE function indicates outlying cells for a data vector with \(p\) entries or data matrix with \(n \times p\) entries containing only numeric entries x for a given center mu and covariance matrix Sigma using the Shapley value.
It is a more sophisticated alternative to the SCD algorithm: MOE uses the information of the regular cells to derive an alternative reference point (Mayrhofer2022ShapleyOutlier).
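To make the role of the Shapley value concrete, the following base R sketch splits the squared Mahalanobis distance of one observation into per-cell contributions. It assumes the decomposition \(\phi_j = (x_j - \mu_j) \sum_k \Sigma^{-1}_{jk} (x_k - \mu_k)\) described in the cited reference. It does not call the package and should not be read as the package's implementation.

```r
# Minimal base R sketch (assumption-based illustration, not the package code).
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0, 1, 2, 2.3, 2.5)

# Per-cell contributions: elementwise product of (x - mu) and Sigma_inv %*% (x - mu).
phi <- (x - mu) * as.vector(Sigma_inv %*% (x - mu))

# The contributions sum to the squared Mahalanobis distance of x.
md2 <- mahalanobis(x, center = mu, cov = Sigma)
print(round(phi, 3))
print(all.equal(sum(phi), md2))
```

Cells with a large contribution are the natural candidates for being flagged as outlying.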
MOE(
x,
mu,
Sigma,
Sigma_inv = NULL,
step_size = 0.1,
min_deviation = 0,
max_step = NULL,
local = TRUE,
max_iter = 1000,
q = 0.99,
check_outlyingness = FALSE,
check = TRUE,
cells = NULL,
method = "cellMCD"
)
A list of class shapley_algorithm (new_shapley_algorithm) containing the following:
x
A \(p\)-dimensional vector (or a \(n \times p\) matrix) containing the imputed data.
phi
A \(p\)-dimensional vector (or a \(n \times p\) matrix) containing the Shapley values (outlyingness scores) of x; see shapley.
mu_tilde
A \(p\)-dimensional vector (or a \(n \times p\) matrix) containing the alternative reference points based on the regular cells of the original observations.
x_original
A \(p\)-dimensional vector (or a \(n \times p\) matrix) containing the original data.
non_centrality
The non-centrality parameters for the Chi-Squared distribution.
x_history
A list with \(n\) elements, each containing the path of how the original data vector was modified.
phi_history
A list with \(n\) elements, each containing the Shapley values corresponding to x_history.
mu_tilde_history
A list with \(n\) elements, each containing the alternative reference points corresponding to x_history.
S_history
A list with \(n\) elements, each containing the indices of the outlying cells in each iteration.
x
Data vector with \(p\) entries or data matrix with \(n \times p\) entries containing only numeric entries.
mu
Either NULL (default) or mean vector of x. If NULL, method is used for parameter estimation.
Sigma
Either NULL (default) or covariance matrix \(p \times p\) of x. If NULL, method is used for parameter estimation.
Sigma_inv
Either NULL (default) or Sigma's inverse \(p \times p\) matrix. If NULL, the inverse of Sigma is computed using solve(Sigma).
step_size
Numeric. Step size for the imputation of outlying cells, with step_size \(\in [0,1]\). Defaults to \(0.1\).
min_deviation
Numeric. Detection threshold, with min_deviation \(\in [0,1]\). Defaults to \(0\).
max_step
Either NULL (default) or an integer. The maximum number of steps in each iteration. If NULL, max_step \(= p\).
local
Logical. If TRUE (default), the non-central Chi-Squared distribution is used to determine the cutoff value based on mu_tilde.
max_iter
Integer. The maximum number of iterations. Defaults to \(1000\).
q
Numeric. The quantile of the Chi-squared distribution for detection and imputation of outliers. Defaults to \(0.99\).
check_outlyingness
Logical. If TRUE, the outlyingness is rechecked after applying min_deviation. Defaults to FALSE.
check
Logical. If TRUE (default), inputs are checked before running the function, and an error is raised if one of the inputs is not as expected.
cells
Either NULL (default) or a vector/matrix of the same dimension as x, indicating the outlying cells. The matrix must contain only zeros and ones, or TRUE/FALSE.
method
Either "cellMCD" (default) or "MCD". Specifies the method used for parameter estimation if mu and/or Sigma are not provided.
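The arguments q and local both affect the detection cutoff, and the difference between a central and a non-central Chi-squared quantile can be seen in a few lines of base R. The sketch below is only an illustration. It assumes \(p\) degrees of freedom, uses a made-up reference point mu_tilde, and takes the non-centrality parameter to be the squared Mahalanobis distance between mu_tilde and mu. None of these choices is confirmed by this page as what MOE does internally.

```r
# Hypothetical illustration of the cutoff controlled by q and local.
p <- 5
q <- 0.99
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1

# Central Chi-squared cutoff (assumed: p degrees of freedom).
cutoff_central <- qchisq(q, df = p)

# ASSUMPTION: mu_tilde is a made-up reference point, and the non-centrality
# parameter is taken as its squared Mahalanobis distance from mu.
mu_tilde <- c(0, 0, 0.5, 0.5, 0.5)
ncp <- mahalanobis(mu_tilde, center = mu, cov = Sigma)
cutoff_local <- qchisq(q, df = p, ncp = ncp)

print(c(central = cutoff_central, local = cutoff_local))
```

For a positive non-centrality parameter the non-central quantile is larger than the central one at the same q, so under these assumptions the cutoff becomes less strict as the reference point moves away from mu.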
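library(ShapleyOutlier)
library(MASS)  # provides mvrnorm()

# Example 1: a single observation with p = 5 entries
p <- 5
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
Sigma_inv <- solve(Sigma)
x <- c(0, 1, 2, 2.3, 2.5)
MOE_x <- MOE(x = x, mu = mu, Sigma = Sigma, Sigma_inv = Sigma_inv)
plot(MOE_x)

# Example 2: a 100 x 10 data matrix with 100 contaminated cells
set.seed(1)
n <- 100; p <- 10
mu <- rep(0, p)
Sigma <- matrix(0.9, p, p); diag(Sigma) <- 1
X <- mvrnorm(n, mu, Sigma)
# Overwrite 100 randomly chosen cells with alternating values -5 and 5
X[sample(1:(n*p), 100, FALSE)] <- rep(c(-5, 5), 50)
MOE_X <- MOE(X, mu, Sigma)
plot(MOE_X, subset = 20)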