Detects if data points are noise or part of a cluster, based on a Poisson process model.
NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
        convergence = 0.001, plot = FALSE, quiet = TRUE)

# S3 method for nnclean
print(x, ...)
NNclean returns a list of class nnclean with components:
z: 0-1 vector with one entry per data point. 1 means cluster, 0 means noise.
probs: vector of estimated a posteriori probabilities for each point to belong to the cluster component.
k: number of nearest neighbors used (the k argument).
lambda1: intensity parameter of cluster component.
lambda2: intensity parameter of noise component.
p: estimated probability of cluster component.
kthNND: distance to kth nearest neighbor.
data: numerical matrix or data frame.
k: integer. Number of nearest neighbors considered per point.
distances: distance matrix object of class dist. If specified, it is used instead of computing distances from the data.
edge.correct: logical. If TRUE and the data is two-dimensional, neighbors for points at the edges of the parent region of the noise Poisson process are determined after wrapping the region onto a toroid.
wrap: numerical. If edge.correct=TRUE, points in a strip of size wrap*range along the edge for each variable are candidates for being neighbors of points from the opposite edge.
convergence: numerical. Convergence criterion for the EM algorithm.
plot: logical. If TRUE, a histogram of the distances to the kth nearest neighbor is plotted together with the fit.
quiet: logical. If FALSE, the likelihood is printed during the iterations.
x: object of class nnclean.
...: necessary for print methods.
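The wrapping controlled by edge.correct and wrap can be pictured with a small base-R sketch. This is an illustration under stated assumptions, not the code NNclean runs internally: the helper names wrap_augment and kth_nn_wrapped are hypothetical, the region is taken to be the bounding box of the data, and copies of points are shifted by one full range so that they land in a strip of width wrap*range just outside the opposite edge.

```r
# Hypothetical helpers illustrating toroidal wrapping for 2-d data.
# They are NOT part of prabclus; names and details are assumptions.
wrap_augment <- function(data, wrap = 0.1) {
  data <- as.matrix(data)
  stopifnot(ncol(data) == 2)
  lo  <- apply(data, 2, min)
  hi  <- apply(data, 2, max)
  rng <- hi - lo
  out <- data
  # Visit the 8 neighboring copies of the region (edges and corners).
  for (sx in -1:1) for (sy in -1:1) {
    if (sx == 0 && sy == 0) next
    shifted <- sweep(data, 2, c(sx, sy) * rng, "+")
    # Keep only copies that fall into the strip around the original region.
    keep <- shifted[, 1] >= lo[1] - wrap * rng[1] &
            shifted[, 1] <= hi[1] + wrap * rng[1] &
            shifted[, 2] >= lo[2] - wrap * rng[2] &
            shifted[, 2] <= hi[2] + wrap * rng[2]
    out <- rbind(out, shifted[keep, , drop = FALSE])
  }
  out
}

# kth nearest neighbor distance of the original points, where the wrapped
# copies are allowed to act as neighbors.
kth_nn_wrapped <- function(data, k, wrap = 0.1) {
  data <- as.matrix(data)
  n <- nrow(data)
  aug <- wrap_augment(data, wrap)
  dm <- as.matrix(dist(aug))[seq_len(n), , drop = FALSE]
  # Entry 1 of each sorted row is the point itself (distance 0).
  apply(dm, 1, function(row) sort(row)[k + 1])
}

set.seed(1)
x <- cbind(runif(100), runif(100))
plain   <- apply(as.matrix(dist(x)), 1, function(row) sort(row)[6])
wrapped <- kth_nn_wrapped(x, k = 5)
summary(plain - wrapped)  # differences are non-zero only near the edges
```

Because wrapping only adds candidate neighbors, the wrapped distances can never exceed the unwrapped ones. Points near the border are the ones that change, which is where an uncorrected fit would tend to mistake sparse-looking border points for noise.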
R-port by Christian Hennig
  christian.hennig@unibo.it
  https://www.unibo.it/sitoweb/christian.hennig/en,
  original Splus package by S. Byers and A. E. Raftery.
The model assumes that the noise follows a homogeneous Poisson process on a certain region and that the clusters follow a homogeneous Poisson process with larger intensity on a subregion (disconnected if there is more than one cluster). The distances to the kth nearest neighbor are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated with the EM algorithm. Points are assigned to the noise or cluster component using the estimated a posteriori probabilities.
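The estimation step described above can be sketched in base R. This is a minimal illustration, not the implementation inside NNclean: the function name nn_em_sketch, the max_iter argument, and the starting values are assumptions made for this example. It relies on the fact that, for a homogeneous Poisson process with intensity lambda, the volume of the ball reaching the kth nearest neighbor follows a Gamma distribution with shape k and rate lambda, so the EM algorithm can be run on those volumes with dgamma.

```r
# Sketch only: EM for a two-component mixture fitted to kth nearest neighbor
# distances. nn_em_sketch and its arguments are hypothetical names.
nn_em_sketch <- function(data, k, convergence = 0.001, max_iter = 500) {
  data <- as.matrix(data)
  d <- ncol(data)

  # Distance from each point to its kth nearest neighbor. After sorting a row
  # of the distance matrix, entry 1 is the point itself (distance 0).
  dm <- as.matrix(dist(data))
  kth <- apply(dm, 1, function(row) sort(row)[k + 1])

  # Volume of the d-dimensional ball with radius kth; under a homogeneous
  # Poisson process with intensity lambda it is Gamma(shape = k, rate = lambda).
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)
  vol <- alpha_d * kth^d

  # Crude starting values (an assumption of this sketch): the half of the
  # points with the smallest volumes starts out as "cluster".
  w <- as.numeric(vol < median(vol))
  p <- mean(w)
  lambda1 <- k * sum(w) / sum(w * vol)
  lambda2 <- k * sum(1 - w) / sum((1 - w) * vol)

  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: a posteriori probability of the cluster component.
    f1 <- p * dgamma(vol, shape = k, rate = lambda1)
    f2 <- (1 - p) * dgamma(vol, shape = k, rate = lambda2)
    w <- f1 / (f1 + f2)

    # M-step: weighted maximum likelihood for a Gamma with known shape k.
    p <- mean(w)
    lambda1 <- k * sum(w) / sum(w * vol)
    lambda2 <- k * sum(1 - w) / sum((1 - w) * vol)

    # Stop when the relative change in log-likelihood is small.
    loglik <- sum(log(f1 + f2))
    if (abs(loglik - loglik_old) / (1 + abs(loglik)) < convergence) break
    loglik_old <- loglik
  }

  list(z = as.integer(w > 0.5), probs = w,
       lambda1 = lambda1, lambda2 = lambda2, p = p, kthNND = kth)
}

# Simulated example: uniform noise on the unit square plus one dense patch.
set.seed(1)
noise   <- cbind(runif(200), runif(200))
cluster <- cbind(runif(100, 0.4, 0.5), runif(100, 0.4, 0.5))
fit <- nn_em_sketch(rbind(noise, cluster), k = 10)
table(truth = rep(c("noise", "cluster"), c(200, 100)), z = fit$z)
```

Working on ball volumes rather than raw distances keeps the sketch short, since both mixture components are then ordinary Gamma densities with a shared, known shape. The a posteriori probabilities are unaffected by this change of variable.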
Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.
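library(prabclus)  # provides NNclean
library(mclust)    # provides the chevron data set
data(chevron)
# Classify each point as cluster (1) or noise (0) using the 15th nearest neighbor
nnc <- NNclean(chevron[, 2:3], 15, plot = TRUE)
# Noise is drawn in color 1, cluster points in color 2
plot(chevron[, 2:3], col = 1 + nnc$z)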