
MVN (version 6.1)

mv_outlier: Identify Multivariate Outliers via Robust Mahalanobis Distances

Description

Computes robust Mahalanobis distances for multivariate data using the Minimum Covariance Determinant (MCD) estimator and flags outliers using either a chi-square quantile cutoff or an adjusted cutoff from the Atkinson–Riani–Welsh (ARW) method. Optionally generates a Mahalanobis Q–Q plot.
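The computation described here can be sketched in a few lines of base R plus the recommended MASS package. This is an illustration of the general technique, not MVN's internal code: the simulated data, the use of MASS::cov.mcd as the MCD estimator, and the 0.975 chi-square quantile as the cutoff are all assumptions made for the example.

```r
library(MASS)  # provides cov.mcd(); an assumed stand-in for the MCD routine

set.seed(1)

# Hypothetical data: 200 clean trivariate normal points plus
# 5 planted outliers shifted by 10 in every coordinate
clean   <- matrix(rnorm(200 * 3), ncol = 3)
planted <- matrix(rnorm(5 * 3, mean = 10), ncol = 3)
x       <- rbind(clean, planted)

# Robust location and scatter from the MCD estimator
mcd <- cov.mcd(x)

# Squared robust Mahalanobis distance of every row
d2 <- mahalanobis(x, center = mcd$center, cov = mcd$cov)

# Quantile-based cutoff: chi-square quantile with p degrees of freedom
# (the 0.975 level is an assumption for this sketch)
cutoff <- qchisq(0.975, df = ncol(x))

is_outlier <- d2 > cutoff
which(is_outlier)
```

Because the center and covariance come from the MCD fit rather than from the full-sample mean and covariance, the planted points have little influence on the estimates and end up with large distances.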

Usage

mv_outlier(
  data,
  outlier = TRUE,
  qqplot = TRUE,
  alpha = 0.05,
  method = c("quan", "adj"),
  label = TRUE,
  title = "Chi-Square Q-Q Plot"
)

Value

A list with the following components:

outlier

A data frame of Mahalanobis distances with observation IDs and outlier flags (returned if outlier = TRUE).

qq_outlier_plot

A ggplot object of the Mahalanobis Q–Q plot (returned if qqplot = TRUE).

newData

A data frame of the non-outlier observations.

Arguments

data

A numeric matrix or data frame with observations in rows and at least two numeric columns.

outlier

Logical; if TRUE, includes the Mahalanobis distance values and outlier classification in the output. If FALSE, suppresses this component. Default is TRUE.

qqplot

Logical; if TRUE, a Chi-Square Q–Q plot is generated to visualize outlier detection. Default is TRUE.

alpha

Numeric; significance level used for the adjusted cutoff method (only applies if method = "adj"). Default is 0.05.

method

Character string specifying the outlier detection method. Must be either "quan" (quantile-based cutoff) or "adj" (adjusted cutoff via ARW). Default is "quan".

label

Logical; if TRUE and qqplot = TRUE, labels the detected outliers in the plot. Default is TRUE.

title

Optional character string specifying the title for the Q–Q plot. Default is "Chi-Square Q-Q Plot".
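As a rough sketch of what a chi-square Q–Q plot of this kind contains, the ordered squared distances can be paired with chi-square quantiles and drawn with base graphics. The simulated data, MASS::cov.mcd as the MCD estimator, the ppoints() plotting positions, and the choice of axes are assumptions for illustration; the plot returned by mv_outlier is a ggplot object and may be laid out differently.

```r
library(MASS)  # cov.mcd() as an assumed MCD implementation

set.seed(42)
x <- matrix(rnorm(100 * 2), ncol = 2)  # hypothetical bivariate sample
n <- nrow(x)
p <- ncol(x)

# Squared robust Mahalanobis distances
mcd <- cov.mcd(x)
d2  <- mahalanobis(x, center = mcd$center, cov = mcd$cov)

# Pair the ordered distances with chi-square quantiles (p degrees of freedom)
qq <- data.frame(
  observed    = sort(d2),
  theoretical = qchisq(ppoints(n), df = p)
)

plot(qq$theoretical, qq$observed,
     xlab = "Chi-square quantile",
     ylab = "Robust squared Mahalanobis distance",
     main = "Chi-Square Q-Q Plot")
abline(0, 1, lty = 2)  # reference line for agreement with the chi-square law
```

Points that rise well above the reference line in the upper tail are the candidates an outlier rule would flag.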

Examples

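if (FALSE) {
library(MVN)

# Use the four numeric measurements from iris
data <- iris[, 1:4]

# Flag outliers with the adjusted (ARW) cutoff
res <- mv_outlier(data, method = "adj", alpha = 0.025)

res$outlier          # distances, observation IDs, and outlier flags
res$qq_outlier_plot  # ggplot object of the Q-Q plot
head(res$newData)    # observations not flagged as outliers
}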
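A possible variation compares the two cutoff methods on the same data by counting how many rows each one flags. It relies only on the documented arguments and the documented newData component; the guard on requireNamespace() and the variable names are choices made for this sketch, and no particular counts are implied.

```r
# Runs only when the MVN package is installed
if (requireNamespace("MVN", quietly = TRUE)) {
  data <- iris[, 1:4]

  # Same data, two cutoff rules; plots are switched off for the comparison
  res_quan <- MVN::mv_outlier(data, method = "quan", qqplot = FALSE)
  res_adj  <- MVN::mv_outlier(data, method = "adj", alpha = 0.025,
                              qqplot = FALSE)

  # newData holds the rows not flagged as outliers, so the difference
  # from nrow(data) is the number of flagged observations
  n_flagged <- c(
    quan = nrow(data) - nrow(res_quan$newData),
    adj  = nrow(data) - nrow(res_adj$newData)
  )
  print(n_flagged)
}
```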
