assertr (version 2.7)

maha_dist: Computes Mahalanobis distance for each row of a data frame

Description

This function returns a vector with the same length as the number of rows in the provided data frame, giving the Mahalanobis distance of each row from the whole data set (that is, from the data set's center, relative to its covariance).
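For reference, the classical Mahalanobis distance of a row x from a data set with center mu and covariance S is sqrt(t(x - mu) %*% solve(S) %*% (x - mu)). The sketch below computes it for mtcars with base R's stats::mahalanobis(), which returns the squared distance, so the square root is taken explicitly. It assumes the center is the vector of column means and the covariance is the ordinary sample covariance. It is not assertr's implementation, and this page does not state whether maha_dist() reports squared or unsquared distances.

```r
# Classical Mahalanobis distance of each row from the data set's center.
# Assumption: center = column means, covariance = sample covariance.
x <- as.matrix(mtcars)

center <- colMeans(x)
covar  <- cov(x)

# stats::mahalanobis() returns squared distances, one per row
d2 <- mahalanobis(x, center = center, cov = covar)
d  <- sqrt(d2)

# the same quantity for the first row, written out explicitly
diff1 <- x[1, ] - center
d1 <- sqrt(drop(t(diff1) %*% solve(covar) %*% diff1))

head(round(d, 3))
```

A useful sanity check: with the sample covariance, the squared distances always sum to (n - 1) * p, where n is the number of rows and p the number of columns.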

Usage

maha_dist(data, keep.NA = TRUE, robust = FALSE, stringsAsFactors = FALSE)

Arguments

data

A data frame

keep.NA

Ensure that every row with missing data remains NA in the output? TRUE by default.

robust

Attempt to compute the Mahalanobis distance based on a robust covariance matrix? FALSE by default.

stringsAsFactors

Convert non-factor string columns into factors? FALSE by default.
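The robust option matters because the classical mean and covariance are themselves pulled toward the outliers one is trying to detect. The sketch below contrasts classical and robust estimates using the minimum covariance determinant (MCD) estimator from the MASS package (MASS::cov.rob() with method = "mcd"). Which robust estimator maha_dist() actually uses is not stated on this page; MCD is an assumption chosen for illustration.

```r
# Classical vs. robust Mahalanobis distances on a few mtcars columns.
# Assumption: MCD stands in for whatever robust estimator assertr uses.
x <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

# classical estimates: every row, including outliers, influences them
d_classic <- sqrt(mahalanobis(x, colMeans(x), cov(x)))

# robust estimates; MCD relies on random subsampling, so fix the seed
set.seed(1)
rob <- MASS::cov.rob(x, method = "mcd")
d_robust <- sqrt(mahalanobis(x, rob$center, rob$cov))

# rows ranked as most unusual under each approach
head(sort(d_classic, decreasing = TRUE), 3)
head(sort(d_robust,  decreasing = TRUE), 3)
```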

Value

A vector of observation-wise Mahalanobis distances.
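The output has one value per input row, in the original row order. In base R, a row containing NA simply produces NA, which matches what keep.NA = TRUE is documented to guarantee. The sketch below assumes the center and covariance are estimated from the available values (na.rm = TRUE and pairwise-complete covariance); how maha_dist() handles missing values internally, and what keep.NA = FALSE returns for such rows, is not specified here.

```r
# Missing values in a row propagate to NA in that row's distance.
x <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
x[c(3, 10), "hp"] <- NA          # introduce missing values in two rows

# estimate center and covariance from the available data
center <- colMeans(x, na.rm = TRUE)
covar  <- cov(x, use = "pairwise.complete.obs")

d2 <- mahalanobis(x, center, covar)

length(d2) == nrow(x)            # one value per input row
which(is.na(d2))                 # rows 3 and 10 stay NA
```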

Details

This is useful for finding anomalous observations, row-wise.
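One simple way to turn distances into an anomaly flag, and the idea behind the within_n_mads(10) example further down, is to mark rows whose distance lies more than n median absolute deviations (MADs) from the median distance. The base-R sketch below uses stats::mad() with its default scaling constant; flag_by_mads() is a hypothetical helper, and assertr's own predicate may differ in detail.

```r
# Hypothetical helper (not part of assertr): TRUE for rows whose
# distance is more than n MADs away from the median distance.
flag_by_mads <- function(d, n = 10) {
  med    <- median(d, na.rm = TRUE)
  spread <- mad(d, na.rm = TRUE)   # stats::mad, default constant 1.4826
  abs(d - med) > n * spread
}

x <- as.matrix(mtcars)
d <- sqrt(mahalanobis(x, colMeans(x), cov(x)))

flags <- flag_by_mads(d, n = 3)
names(d)[which(flags)]             # rows flagged at a stricter threshold

flag_by_mads(c(1, 2, 3, 4, 100), n = 3)
```

For the small vector in the last line, the median is 3 and the MAD is 1.4826, so only the value 100 exceeds the 3-MAD band.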

It converts categorical variables in the data frame into numerics, as long as they are factors. For example, a character column is used as a component of the distance calculation only if it is already a factor or is converted to one with the stringsAsFactors parameter.
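The column-handling rules above can be sketched in base R as follows. to_numeric_matrix() is a hypothetical helper written for illustration, not part of assertr: factors become their integer level codes, and character columns are dropped unless strings_as_factors = TRUE turns them into factors first. Whether maha_dist() drops, warns about, or otherwise treats unconverted character columns is an assumption here.

```r
# Hypothetical helper (not part of assertr) mirroring the rules above.
to_numeric_matrix <- function(df, strings_as_factors = FALSE) {
  if (strings_as_factors) {
    is_chr <- vapply(df, is.character, logical(1))
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  # keep numeric and factor columns; anything else (e.g. character) is dropped
  keep <- vapply(df, function(col) is.numeric(col) || is.factor(col), logical(1))
  df <- df[keep]
  df[] <- lapply(df, as.numeric)   # a factor becomes its integer level codes
  as.matrix(df)
}

df <- data.frame(
  x = c(1.5, 2.0, 3.2),
  g = factor(c("a", "b", "a")),
  s = c("u", "v", "w"),
  stringsAsFactors = FALSE
)

colnames(to_numeric_matrix(df))                             # "x" "g"
colnames(to_numeric_matrix(df, strings_as_factors = TRUE))  # "x" "g" "s"

# iris: four numeric columns plus the Species factor
m <- to_numeric_matrix(iris)
d <- sqrt(mahalanobis(m, colMeans(m), cov(m)))
```

Note that encoding a factor as integer codes imposes an order and equal spacing on its levels, so the contribution of such a column depends on how the levels happen to be ordered.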

See Also

insist_rows

Examples

library(assertr)             # for maha_dist(), insist_rows(), within_n_mads()

maha_dist(mtcars)

maha_dist(iris, robust=TRUE)

library(magrittr)            # for piping operator
library(dplyr)               # for "everything()" function

# using every column from mtcars, compute Mahalanobis distance
# for each observation, and ensure that each distance is within 10
# median absolute deviations from the median
mtcars %>%
  insist_rows(maha_dist, within_n_mads(10), everything())
  ## if the assertion passes, anything piped after it will run
