DDoutlier (version 0.1.0)

NAN: Natural Neighbor (NAN) algorithm to return the self-adaptive neighborhood

Description

Identifies natural neighbors and a suitable k-parameter for kNN graphs, as suggested by Zhu, Q., Feng, Ji. & Huang, J. (2016).

Usage

NAN(dataset, NaN_Edges = FALSE)

Arguments

dataset

The dataset for which natural neighbors are identified along with a k-parameter

NaN_Edges

Whether to compute the matrix of natural neighbor edges. Computationally heavy. Defaults to FALSE

Value

NaN_Num

The in-degree of each observation given r, i.e. the number of observations that count it among their r nearest neighbors

r

Natural neighbor eigenvalue. Useful as k-parameter

NaN_Edges

Matrix of edges for natural neighbors

n_NaN

The number of natural neighbors

Details

NAN computes the natural neighbor eigenvalue and identifies natural neighbors in a dataset. The natural neighbor eigenvalue is a useful choice of k-parameter when computing a k-nearest neighborhood, which makes it suitable for outlier detection, clustering or predictive modelling. Two observations are natural neighbors when they are mutual k-nearest neighbors, i.e. each is among the other's k nearest neighbors. kNN computation uses a kd-tree, via the kNN() function from the 'dbscan' package.
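
The search described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: it uses a brute-force distance matrix instead of a kd-tree, and the function name natural_neighbor_sketch and the stopping rule (stop once every observation has a reverse neighbor, or once the count of observations without one no longer changes) are assumptions made for illustration, based on the general idea in Zhu et al. (2016).

```r
# Hypothetical helper, not part of DDoutlier: brute-force natural neighbor search.
natural_neighbor_sketch <- function(X) {
  D <- as.matrix(dist(X))             # pairwise Euclidean distances
  n <- nrow(D)
  diag(D) <- Inf                      # an observation is not its own neighbor
  nn_order <- t(apply(D, 1, order))   # row i: indices sorted by distance from i

  in_degree <- integer(n)             # reverse-neighbor counts
  prev_orphans <- n
  r <- 0
  repeat {
    r <- r + 1
    # Each observation "votes" for its r-th nearest neighbor
    in_degree <- in_degree + tabulate(nn_order[, r], nbins = n)
    orphans <- sum(in_degree == 0)    # observations nobody has voted for yet
    # Assumed stopping rule: no orphans left, or orphan count is stable
    if (orphans == 0 || orphans == prev_orphans || r == n - 1) break
    prev_orphans <- orphans
  }

  # Natural neighbors: pairs that are in each other's r nearest neighbors
  knn <- nn_order[, seq_len(r), drop = FALSE]
  A <- matrix(FALSE, n, n)
  A[cbind(rep(seq_len(n), r), as.vector(knn))] <- TRUE
  mutual <- A & t(A)
  edges <- which(mutual & upper.tri(mutual), arr.ind = TRUE)

  list(r = r, in_degree = in_degree, edges = unname(edges))
}

# Toy data: three nearby points and one isolated point
toy <- matrix(c(0, 1, 2.5, 10), ncol = 1)
res <- natural_neighbor_sketch(toy)
res$r          # candidate k-parameter
res$in_degree  # the isolated point keeps an in-degree of zero
res$edges      # one row per mutual-neighbor pair
```

In this sketch the isolated point never receives a vote, so the loop stops as soon as the orphan count stabilises rather than growing r until the outlier is absorbed.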

References

Zhu, Q., Feng, Ji. & Huang, J. (2016). Natural neighbor: A self-adaptive neighborhood method without parameter K. Pattern Recognition Letters. pp. 30-36. DOI: 10.1016/j.patrec.2016.05.007

Examples

# NOT RUN {
# Load the package that provides NAN() and LOF()
library(DDoutlier)

# Select dataset
X <- iris[,1:4]

# Identify the right k-parameter
K <- NAN(X, NaN_Edges=FALSE)$r

# Use the k-setting in an arbitrary outlier detection algorithm
outlier_score <- LOF(dataset=X, k=K)

# Sort and find index for most outlying observations
names(outlier_score) <- 1:nrow(X)
sort(outlier_score, decreasing = TRUE)

# Inspect the distribution of outlier scores
hist(outlier_score)
# }