NNclean: Nearest neighbor based clutter/noise detection

Description

Detects if data points are noise or part of a cluster, based on a Poisson process model.

Usage

NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
convergence = 0.001, plot=FALSE, quiet=TRUE)
# S3 method for nnclean
print(x, ...)

Value

NNclean returns a list of class nnclean with components

z: 0-1 vector with one entry per data point. 1 means cluster, 0 means noise.
probs: vector of estimated a priori probabilities for each point to belong to the cluster component.
k: number of nearest neighbors used (the k argument).
lambda1: intensity parameter of cluster component.
lambda2: intensity parameter of noise component.
p: estimated probability of cluster component.
kthNND: distance from each point to its kth nearest neighbor.

Arguments

data: numerical matrix or data frame.
k: integer. Number of considered nearest neighbors per point.
distances: distance matrix object of class dist. If specified, it is used instead of computing distances from the data.
edge.correct: logical. If TRUE and the data is two-dimensional, neighbors for points at the edges of the parent region of the noise Poisson process are determined after wrapping the region onto a toroid.
wrap: numerical. If edge.correct=TRUE, points in a strip of size wrap*range along the edge for each variable are candidates for being neighbors of points from the opposite edge.
convergence: numerical. Convergence criterion for EM-algorithm.
plot: logical. If TRUE, a histogram of the distances to the kth nearest neighbor is plotted together with the fit.
quiet: logical. If FALSE, the likelihood is printed during the iterations.
x: object of class nnclean.
...: necessary for print methods.
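
The strip described for wrap can be illustrated with a small sketch. This is not the code used inside NNclean; the helper name wrap_strip and the handling of corners (ignored here) are assumptions made purely for illustration. For each variable, points within wrap*range of one edge are copied and shifted by the range so that they sit next to the opposite edge.

```r
# Hypothetical helper (not part of any package): append shifted copies of the
# points that lie in a strip of width wrap * range along each edge.
wrap_strip <- function(x, wrap = 0.1) {
  x <- as.matrix(x)
  out <- x
  for (j in seq_len(ncol(x))) {
    r <- range(x[, j])
    width <- diff(r)
    strip <- wrap * width
    low  <- x[x[, j] <= r[1] + strip, , drop = FALSE]  # near the lower edge
    high <- x[x[, j] >= r[2] - strip, , drop = FALSE]  # near the upper edge
    low[, j]  <- low[, j] + width    # move copies beyond the upper edge
    high[, j] <- high[, j] - width   # move copies beyond the lower edge
    out <- rbind(out, low, high)
  }
  out
}

x <- cbind(c(0, 0.5, 1), c(0, 0.5, 1))
wrapped <- wrap_strip(x, wrap = 0.1)
```

Nearest neighbors would then be searched among the rows of wrapped, so that a point at one edge can have neighbors that originally lay at the opposite edge.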

Author

R port by Christian Hennig <christian.hennig@unibo.it>, https://www.unibo.it/sitoweb/christian.hennig/en;
original S-PLUS package by S. Byers and A. E. Raftery.

Details

The noise is assumed to follow a homogeneous Poisson process on a certain region, and the clusters a homogeneous Poisson process with larger intensity on a subregion (which is disconnected if there is more than one cluster). The distances to the kth nearest neighbor are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated via the EM algorithm. Points are assigned to the noise or cluster component using the estimated a posteriori probabilities.
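
As a rough illustration of the mixture fit, the following base R sketch runs an EM algorithm on simulated kth nearest neighbor distances. It relies on the fact that, for a homogeneous Poisson process with intensity lambda in d dimensions, the dth power of the distance to the kth nearest neighbor is Gamma distributed with shape k and rate lambda times the volume of the unit ball. The function names dknn and em_knn_mixture, the starting values, and the stopping rule are assumptions of this sketch and need not match what NNclean does internally.

```r
# Density of the kth nearest neighbor distance for intensity lambda in d dims.
dknn <- function(x, lambda, k, d) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the unit ball
  d * x^(d - 1) * dgamma(x^d, shape = k, rate = lambda * alpha_d)
}

# Hypothetical EM fit of a two-component mixture (cluster vs. noise).
em_knn_mixture <- function(dist_k, k, d, tol = 1e-3, max_iter = 500) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)
  y <- dist_k^d
  # Assumed start: points with small kth-NN distance count as cluster.
  delta <- as.numeric(dist_k < median(dist_k))
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # M-step: weighted maximum likelihood for p, lambda1, lambda2.
    p <- mean(delta)
    lambda1 <- k * sum(delta) / (alpha_d * sum(delta * y))
    lambda2 <- k * sum(1 - delta) / (alpha_d * sum((1 - delta) * y))
    # E-step: posterior probability of the cluster component.
    f1 <- p * dknn(dist_k, lambda1, k, d)
    f2 <- (1 - p) * dknn(dist_k, lambda2, k, d)
    delta <- f1 / (f1 + f2)
    loglik <- sum(log(f1 + f2))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(z = as.integer(delta >= 0.5), probs = delta,
       p = p, lambda1 = lambda1, lambda2 = lambda2)
}

# Simulated distances: 300 cluster points (intensity 2000), 200 noise (100).
set.seed(1)
k <- 10
d <- 2
sim <- c(rgamma(300, shape = k, rate = 2000 * pi),
         rgamma(200, shape = k, rate = 100 * pi))^(1 / d)
fit <- em_knn_mixture(sim, k = k, d = d)
```

The sketch has no safeguards against numerical underflow or degenerate components, so it is meant only to show the structure of the E and M steps.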

References

Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.

Examples

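# NNclean is in the prabclus package; the chevron data come from mclust.
library(prabclus)
library(mclust)
data(chevron)
# Classify points as cluster (1) or noise (0) using the 15th nearest neighbor.
nnc <- NNclean(chevron[,2:3], 15, plot=TRUE)
# Plot the points: noise in color 1, cluster points in color 2.
plot(chevron[,2:3], col=1+nnc$z)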
