dbscan (version 1.1-1)

lof: Local Outlier Factor Score

Description

Calculate the Local Outlier Factor (LOF) score for each data point using a kd-tree to speed up kNN search.

Usage

lof(x, k = 4, ...)

Arguments

x

a data matrix or a dist object.

k

size of the neighborhood (the number of nearest neighbors considered).

...

further arguments are passed on to kNN.

Value

A numeric vector containing one LOF value per data point (length nrow(x) when x is a data matrix).

Details

LOF compares the local density of a point to the local densities of its neighbors. Points that have a substantially lower density than their neighbors are considered outliers. A LOF score of approximately 1 indicates that the density around the point is comparable to that of its neighbors. Scores significantly larger than 1 indicate outliers.
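The density comparison described above can be illustrated with a brute-force sketch in base R. This is not the package's implementation: the function name lof_sketch is hypothetical, it uses a full distance matrix instead of a kd-tree, and it assumes each point has exactly k neighbors (ties at the k-th distance are broken arbitrarily), so its values may differ slightly from those returned by lof.

```r
# Hypothetical brute-force sketch of the LOF definition (Breunig et al., 2000).
# Assumes k < nrow(x) and ignores ties at the k-th nearest-neighbor distance.
lof_sketch <- function(x, k = 4) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  diag(d) <- Inf  # a point is not its own neighbor

  # n x k matrix: row p holds the indices of the k nearest neighbors of p
  nn <- matrix(
    unlist(lapply(seq_len(n), function(p) order(d[p, ])[seq_len(k)])),
    nrow = n, byrow = TRUE
  )

  # k-distance: distance from each point to its k-th nearest neighbor
  kdist <- d[cbind(seq_len(n), nn[, k])]

  # local reachability density: inverse of the mean reachability distance,
  # where reach-dist(p, o) = max(k-distance(o), d(p, o))
  lrd <- vapply(seq_len(n), function(p) {
    o <- nn[p, ]
    1 / mean(pmax(kdist[o], d[p, o]))
  }, numeric(1))

  # LOF: mean density of the neighbors relative to the point's own density
  vapply(seq_len(n), function(p) mean(lrd[nn[p, ]]) / lrd[p], numeric(1))
}

# Usage: a tight cluster near the origin plus one distant point
set.seed(1)
x_demo <- rbind(matrix(rnorm(40, sd = 0.1), ncol = 2), c(5, 5))
scores_demo <- lof_sketch(x_demo, k = 3)
which.max(scores_demo)  # index of the point with the largest LOF
```

Points inside the cluster should score close to 1, while the distant point sits in a much sparser neighborhood than its neighbors and receives a large score.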

Note: If the data contain more than k duplicate points, the LOF can become NaN. All reachability distances among the duplicates are zero, so their local density is infinite and the ratio of densities is undefined. In this case lof sets the score to 1. Breunig et al. (2000) suggest a different approach: remove all duplicate points first.
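The alternative of removing duplicates first can be sketched with base R's duplicated and unique, which operate on the rows of a matrix. The data and the variable names (x_dup, x_unique) are made up for illustration, and the call to lof is left as a comment because it needs the dbscan package.

```r
# Illustrative data: 20 random points plus 5 copies of the same point
set.seed(42)
x_dup <- rbind(
  matrix(runif(40), ncol = 2),
  matrix(0.5, nrow = 5, ncol = 2)
)

# Count rows that repeat an earlier row
n_duplicates <- sum(duplicated(x_dup))

# Keep only the first occurrence of each distinct row
x_unique <- unique(x_dup)

# Scores would then refer to the rows of x_unique, not of x_dup:
# scores <- lof(x_unique, k = 3)
```

Because rows are dropped, the resulting scores no longer line up with the original row indices, so keep the result of duplicated(x_dup) if the scores need to be mapped back.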

References

Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). LOF: identifying density-based local outliers. In ACM Int. Conf. on Management of Data, pages 93-104.

See Also

kNN, pointdensity, glosh.

Examples

# NOT RUN {
library(dbscan)

set.seed(665544)
n <- 100
### 10 random cluster centers, recycled across n points, plus Gaussian noise
x <- cbind(
  x=runif(10, 0, 5) + rnorm(n, sd=0.4),
  y=runif(10, 0, 5) + rnorm(n, sd=0.4)
  )

### calculate LOF score
lof <- lof(x, k=3)

### distribution of outlier factors
summary(lof)
hist(lof, breaks=10)

### point size is proportional to LOF
plot(x, pch = ".", main = "LOF (k=3)")
points(x, cex = (lof-1)*3, pch = 1, col="red")
text(x[lof>2,], labels = round(lof, 1)[lof>2], pos = 3)
# }
