NNclean: Nearest neighbor based clutter/noise detection

Description

Detects if data points are noise or part of a cluster, based on a Poisson process model.

Usage

NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
        convergence = 0.001, plot = FALSE, quiet = TRUE)
## S3 method for class 'nnclean':
print(x, ...)

Arguments

data

numerical matrix or data frame.

k

integer. Number of nearest neighbors considered per point.

distances

distance matrix object of class dist. If specified, it is used instead of computing distances from the data.

edge.correct

logical. If TRUE and the data is two-dimensional, neighbors for points at the edges of the parent region of the noise Poisson process are determined after wrapping the region onto a toroid.

wrap

numerical. If edge.correct=TRUE, points in a strip of size wrap*range along the edge for each variable are candidates for being neighbors of points at the opposite edge.

convergence

numerical. Convergence criterion for EM-algorithm.

plot

logical. If TRUE, a histogram of the distances to the kth nearest neighbor is plotted together with the fitted mixture.

quiet

logical. If FALSE, the likelihood is printed during the iterations.

x

object of class nnclean.

...

necessary for print methods.

Value

NNclean returns a list of class nnclean with components
z

0-1-vector whose length is the number of data points. 1 means cluster, 0 means noise.

probs

vector of estimated a priori probabilities for each point to belong to the cluster component.

k

see above.

lambda1

intensity parameter of cluster component.

lambda2

intensity parameter of noise component.

p

estimated probability of cluster component.

kthNND

distance to kth nearest neighbor.
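
To make the kth nearest neighbor distance concrete, the following base R sketch computes it from a Euclidean distance matrix. The helper name kth_nn_dist is hypothetical and this is not necessarily how NNclean computes its kthNND component; in particular, no edge correction is applied.

```r
# Hypothetical helper, for illustration only (not part of prabclus).
# Returns, for every row of x, the distance to its kth nearest neighbor.
kth_nn_dist <- function(x, k) {
  dm <- as.matrix(dist(x))  # full symmetric matrix of Euclidean distances
  # After sorting a row, position 1 is the point itself (distance 0),
  # so the kth nearest neighbor sits at position k + 1.
  apply(dm, 1, function(row) sort(row)[k + 1])
}

# Four points on a line at positions 0, 1, 3 and 7.
pts <- matrix(c(0, 1, 3, 7), ncol = 1)
d1 <- kth_nn_dist(pts, 1)  # distance to the nearest neighbor
d2 <- kth_nn_dist(pts, 2)  # distance to the second nearest neighbor
```

Building the full distance matrix needs memory quadratic in the number of points, which is acceptable for a sketch but may be too costly for large data sets.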

Details

The assumption is that the noise is distributed as a homogeneous Poisson process on a certain region and the clusters are distributed as a homogeneous Poisson process with larger intensity on a subregion (disconnected in case of more than one cluster). The distances are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated via the EM-algorithm. The points are assigned to noise or cluster component by use of the estimated a posteriori probabilities.
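
The mixture fit described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the prabclus source: the function names dknn and em_knn, the median-based starting values, and the log-likelihood stopping rule are all hypothetical choices. The sketch relies on the fact that, for a homogeneous Poisson process with intensity lambda in d dimensions, lambda * alpha_d * D^d follows a Gamma(k, 1) distribution, where D is the distance to the kth nearest neighbor and alpha_d is the volume of the unit ball.

```r
# Hypothetical names throughout; a sketch, not the NNclean implementation.

# Density of the kth nearest neighbor distance x under a homogeneous
# Poisson process with intensity lambda in d dimensions.
dknn <- function(x, lambda, k, d) {
  alpha <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the unit ball
  # Change of variables from y = lambda * alpha * x^d ~ Gamma(k, 1)
  dgamma(lambda * alpha * x^d, shape = k) * lambda * alpha * d * x^(d - 1)
}

# EM for the two-component mixture (cluster = high intensity lambda1,
# noise = low intensity lambda2).
em_knn <- function(dist_k, k, d, tol = 1e-3, max_iter = 500) {
  alpha <- pi^(d / 2) / gamma(d / 2 + 1)
  # Assumed start: points with a small kth-NN distance count as cluster.
  delta <- as.numeric(dist_k < median(dist_k))
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # M-step: weighted maximum likelihood estimates
    p <- mean(delta)
    lambda1 <- k * sum(delta) / (alpha * sum(delta * dist_k^d))
    lambda2 <- k * sum(1 - delta) / (alpha * sum((1 - delta) * dist_k^d))
    # E-step: a posteriori probability of the cluster component
    f1 <- p * dknn(dist_k, lambda1, k, d)
    f2 <- (1 - p) * dknn(dist_k, lambda2, k, d)
    delta <- f1 / (f1 + f2)
    loglik <- sum(log(f1 + f2))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(z = as.integer(delta >= 0.5), probs = delta,
       p = p, lambda1 = lambda1, lambda2 = lambda2)
}

# Simulated kth-NN distances: 300 cluster points (intensity 200) and
# 100 noise points (intensity 10), with k = 5 in two dimensions.
set.seed(1)
k <- 5
d <- 2
alpha <- pi^(d / 2) / gamma(d / 2 + 1)
sim <- function(n, lambda) (rgamma(n, shape = k) / (lambda * alpha))^(1 / d)
dist_k <- c(sim(300, 200), sim(100, 10))
fit <- em_knn(dist_k, k, d)
```

Here the distances are drawn directly from the model rather than computed from a point pattern, so the example only demonstrates the estimation step. Points are assigned to the cluster component when their a posteriori probability is at least 0.5, which is an assumed threshold.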

References

Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.

Examples

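library(prabclus)  # provides NNclean
library(mclust)    # loaded here for the chevron data set
data(chevron)
# Pass columns 2 and 3 as the data and use k = 15 nearest neighbors.
nnc <- NNclean(chevron[, 2:3], 15, plot = TRUE)
# Noise (z = 0) is drawn in color 1, cluster points (z = 1) in color 2.
plot(chevron[, 2:3], col = 1 + nnc$z)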
