AnomalyDetection (version 0.2.5)

mahalanobis_distance: Mahalanobis Distance

Description

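Calculates the distance between each observation (row) in a data set and the mean vector of the data, for use in outlier detection. Because the distance is measured relative to the covariance of the variables, the values do not depend on the scale of the individual variables.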
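For reference, the squared Mahalanobis distance of an observation x from the mean vector μ, given the covariance matrix Σ, is D²(x) = (x - μ)ᵀ Σ⁻¹ (x - μ). The sketch below computes it by hand and with base R's `stats::mahalanobis()` (which returns squared distances), then checks that multiplying one column by 1000 leaves the distances unchanged. It is an illustration only: it does not call the package, and this page does not state whether `mahalanobis_distance()` reports squared or unsquared distances.

```r
# Sketch: squared Mahalanobis distance computed two ways (illustration only;
# this does not call AnomalyDetection::mahalanobis_distance()).
set.seed(1)
X <- cbind(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))

mu    <- colMeans(X)   # mean vector of the data
Sigma <- cov(X)        # sample covariance matrix

# By hand: row-wise (x - mu)' Sigma^{-1} (x - mu)
centered   <- sweep(X, 2, mu)                              # subtract column means
md2_manual <- rowSums((centered %*% solve(Sigma)) * centered)

# Base R helper; returns squared distances
md2_base <- stats::mahalanobis(X, center = mu, cov = Sigma)

# Scale invariance: multiply one column by 1000 and recompute
X_scaled <- X
X_scaled[, "C2"] <- X_scaled[, "C2"] * 1000
md2_scaled <- stats::mahalanobis(X_scaled, colMeans(X_scaled), cov(X_scaled))

all.equal(md2_manual, md2_base)   # TRUE
all.equal(md2_base, md2_scaled)   # TRUE
```

The covariance term is what removes the units: rescaling a column rescales its entries in Σ by the same factor, and Σ⁻¹ undoes it.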

Usage

mahalanobis_distance(data, output = c("md", "bd", "both"),
  normalize = FALSE)

# S3 method for matrix
mahalanobis_distance(data, output = c("md", "bd", "both"),
  normalize = FALSE)

# S3 method for data.frame
mahalanobis_distance(data, output = c("md", "bd", "both"),
  normalize = FALSE)

Arguments

data

A matrix or data frame. Data frames will be converted to matrices via data.matrix.

output

Character string specifying which distance metric(s) to compute. Current options include: "md" for Mahalanobis distance (default); "bd" for absolute breakdown distance (used to see which columns drive the Mahalanobis distance); and "both" to return both distance metrics.

normalize

Logical indicating whether or not to normalize the breakdown distances within each column (so that breakdown distances across columns can be compared).
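This page does not give formulas for the breakdown distance or its normalization. One way to see which columns drive the distance, sketched below purely as an assumption, is to split the squared distance into per-column terms (x - μ)_j [Σ⁻¹(x - μ)]_j, which sum to D²(x), and take their absolute values; the `normalize` step is imitated by min-max rescaling each column to [0, 1]. Both the decomposition and the min-max choice are illustrative and may differ from what `mahalanobis_distance()` actually computes, and `breakdown_sketch()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper, NOT the package's definition of breakdown distance:
# splits each row's squared Mahalanobis distance into per-column terms.
breakdown_sketch <- function(X, normalize = FALSE) {
  X <- as.matrix(X)
  centered <- sweep(X, 2, colMeans(X))     # x - mu for every row
  weighted <- centered %*% solve(cov(X))   # Sigma^{-1} (x - mu), row-wise
  contrib  <- centered * weighted          # signed per-column terms
  bd <- abs(contrib)                       # absolute per-column terms
  if (normalize) {
    # Assumed normalization: min-max rescale each column to [0, 1]
    rng <- apply(bd, 2, range)             # 2 x p matrix: row 1 = min, row 2 = max
    bd  <- sweep(sweep(bd, 2, rng[1, ]), 2, rng[2, ] - rng[1, ], "/")
  }
  list(signed = contrib, bd = bd)
}

set.seed(1)
X <- cbind(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))
res <- breakdown_sketch(X, normalize = TRUE)

# The signed terms add back up to the squared Mahalanobis distance
all.equal(rowSums(res$signed),
          stats::mahalanobis(X, colMeans(X), cov(X)))   # TRUE
apply(res$bd, 2, range)   # each column now spans [0, 1]
```

Rescaling within each column puts every column on the same [0, 1] range, which is the kind of cross-column comparison the `normalize` argument is described as enabling.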

Value

If output = "md", a vector of Mahalanobis distances is returned; otherwise, a matrix is returned.
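This page does not say how to turn the returned distances into outlier flags. A common approach, shown here as an assumption, uses the fact that for roughly multivariate normal data the squared Mahalanobis distance approximately follows a chi-squared distribution with p degrees of freedom (p = number of columns). The sketch uses base R's `stats::mahalanobis()`, which returns squared distances; if `mahalanobis_distance()` returns unsquared distances, compare against `sqrt(cutoff)` instead. The 0.975 quantile is an arbitrary choice.

```r
# Sketch: flag rows whose squared Mahalanobis distance exceeds a chi-squared
# quantile (assumes roughly multivariate normal data).
set.seed(1)
X <- cbind(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))
X[1, ] <- c(6, -6, 6)                          # plant one obvious outlier

md2      <- stats::mahalanobis(X, colMeans(X), cov(X))
cutoff   <- stats::qchisq(0.975, df = ncol(X)) # about 9.35 for 3 columns
outliers <- which(md2 > cutoff)                # row indices above the cutoff
outliers
```

Row 1 is flagged; with a 0.975 cutoff, a few ordinary rows can be expected to exceed it by chance as well.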

References

W. Wang and R. Battiti, "Identifying Intrusions in Computer Networks with Principal Component Analysis," in First International Conference on Availability, Reliability and Security, 2006.

Examples

library(AnomalyDetection)
library(dplyr)   # provides %>% and mutate()

# Simulate some data
set.seed(123)
x <- data.frame(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))

# Add Mahalanobis distances
x %>% mutate(MD = mahalanobis_distance(x))

# Add Mahalanobis and breakdown distances
x %>% cbind(mahalanobis_distance(x, output = "both"))

# Add Mahalanobis and normalized breakdown distances
x %>% cbind(mahalanobis_distance(x, output = "both", normalize = TRUE))
