AnomalyDetection (version 0.1.2)

mahalanobis_distance: Mahalanobis Distance

Description

mahalanobis_distance calculates the Mahalanobis distance between each row of data and the mean vector of the data, for use in outlier detection. Because the distance accounts for the covariance between variables, the values do not depend on the scale of the individual variables.
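The scale-independence claim can be checked with a minimal base R sketch. It uses stats::mahalanobis(), which returns squared distances, rather than this package's function; whether mahalanobis_distance returns squared or unsquared values is not stated on this page. The helper md2 and the scaling factors are illustrative assumptions.

```r
# Sketch only: base R, not the package's implementation.
set.seed(1)
x <- matrix(rnorm(300), ncol = 3)

# stats::mahalanobis() returns the SQUARED distance of each row from `center`.
# md2 is a hypothetical helper name used only in this sketch.
md2 <- function(m) mahalanobis(m, center = colMeans(m), cov = cov(m))

# Same quantity by hand: (x - mu)' S^{-1} (x - mu) for each row.
centered <- sweep(x, 2, colMeans(x))
by_hand  <- rowSums((centered %*% solve(cov(x))) * centered)

# Rescale each column by a different factor (for example, a change of units).
x_scaled <- sweep(x, 2, c(1, 10, 0.1), `*`)

same_as_by_hand <- isTRUE(all.equal(md2(x), by_hand))
scale_invariant <- isTRUE(all.equal(md2(x), md2(x_scaled)))
```

Both flags should come out TRUE: rescaling a column changes the covariance matrix in a way that exactly cancels the change in the centered values.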

Usage

mahalanobis_distance(data, output = "md", normalize = FALSE)

Arguments

data

numeric data, for example a matrix or data frame with one observation per row (the examples pass a data frame)

output

character string specifying the results to return. Can be "md" to return the Mahalanobis distances (default), "bd" to return the absolute breakdown distances (used to see which columns drive the Mahalanobis distance), or "both" to return both md and bd values.

normalize

logical value, either TRUE or FALSE (default). If TRUE, the breakdown distances are normalized within each variable so that breakdown distances can be compared across variables.

Value

Depending on the output argument, the function returns one of:

  1. md: vector of Mahalanobis distances, one for each matrix row

  2. bd: matrix of the absolute values of the breakdown distances; used to see which columns drive the Mahalanobis distance

  3. both: matrix containing both Mahalanobis and breakdown distances
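As a rough illustration of how a vector of distances like the md output might be used for outlier detection, the sketch below flags rows whose squared distance exceeds a chi-square quantile. It uses stats::mahalanobis() as a stand-in so that it runs without the package. The planted outlier, the 97.5% cutoff, and the chi-square approximation (which presumes roughly multivariate normal data) are assumptions of this sketch, not behavior documented for mahalanobis_distance.

```r
# Sketch only: base R stand-in for the "md" output.
set.seed(42)
x <- data.frame(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))
x[1, ] <- c(8, -8, 8)  # plant one obvious outlier (illustrative)

# Squared Mahalanobis distance of each row from the column means.
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Assumed cutoff: 97.5% chi-square quantile, df = number of variables.
cutoff  <- qchisq(0.975, df = ncol(x))
flagged <- which(d2 > cutoff)  # row indices treated as potential outliers
```

If the package returns unsquared distances, the comparison would need the square root of the cutoff instead.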

References

W. Wang and R. Battiti, "Identifying Intrusions in Computer Networks with Principal Component Analysis," in First International Conference on Availability, Reliability and Security, 2006.

Examples

library(AnomalyDetection)
library(dplyr)  # provides %>% and mutate()

x <- data.frame(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))

# add Mahalanobis distance results to data frame
x %>%
  dplyr::mutate(MD = mahalanobis_distance(x))

# add Mahalanobis distance and breakdown distance results to data frame
x %>%
  cbind(mahalanobis_distance(x, "both"))

# add Mahalanobis distance and normalized breakdown distance results to data frame
x %>%
  cbind(mahalanobis_distance(x, "both", normalize = TRUE))