prabclus (version 2.3-3)

NNclean: Nearest neighbor based clutter/noise detection

Description

Detects whether data points are noise or part of a cluster, based on a Poisson process model.

Usage

NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
convergence = 0.001, plot=FALSE, quiet=TRUE)

# S3 method for nnclean
print(x, ...)

Value

NNclean returns a list of class nnclean with components

z

0-1 vector with one entry per data point. 1 means cluster, 0 means noise.

probs

vector of estimated a priori probabilities for each point to belong to the cluster component.

k

number of nearest neighbors used; see the k entry under Arguments.

lambda1

intensity parameter of cluster component.

lambda2

intensity parameter of noise component.

p

estimated probability of cluster component.

kthNND

distance to kth nearest neighbor.

Arguments

data

numerical matrix or data frame.

k

integer. Number of considered nearest neighbors per point.

distances

distance matrix object of class dist. If specified, it is used instead of computing distances from the data.

edge.correct

logical. If TRUE and the data is two-dimensional, neighbors for points at the edges of the parent region of the noise Poisson process are determined after wrapping the region onto a toroid.

wrap

numerical. If edge.correct=TRUE, points in a strip of width wrap*range along each edge, for each variable, are candidates for being neighbors of points at the opposite edge.

convergence

numerical. Convergence criterion for EM-algorithm.

plot

logical. If TRUE, a histogram of the distances to the kth nearest neighbor is plotted together with the fit.

quiet

logical. If FALSE, the likelihood is printed during the iterations.

x

object of class nnclean.

...

necessary for print methods.
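
The roles of k, edge.correct and wrap can be illustrated with a small base R sketch. It is not the code used inside prabclus: the helper names wrap_points and kth_nn_dist are hypothetical, and the observed range of each variable is assumed to stand in for the parent region. Points lying in a strip of width wrap*range along one edge are copied to the far side of the opposite edge, so that points near an edge can find neighbors across the boundary.

```r
# Hypothetical helper: append shifted copies of the points that lie in a
# strip of width wrap * range along each edge (toroidal wrapping).
wrap_points <- function(xy, wrap = 0.1) {
  xy <- as.matrix(xy)
  lo <- apply(xy, 2, min)
  rng <- apply(xy, 2, max) - lo
  out <- xy
  for (j in seq_len(ncol(xy))) {
    near_lo <- out[out[, j] <= lo[j] + wrap * rng[j], , drop = FALSE]
    near_hi <- out[out[, j] >= lo[j] + (1 - wrap) * rng[j], , drop = FALSE]
    near_lo[, j] <- near_lo[, j] + rng[j]  # copy low strip beyond the high edge
    near_hi[, j] <- near_hi[, j] - rng[j]  # copy high strip beyond the low edge
    out <- rbind(out, near_lo, near_hi)
  }
  out
}

# Hypothetical helper: distance from each row of xy to its kth nearest
# neighbor among the candidates. The first nrow(xy) rows of candidates
# must be xy itself, so each point's zero self-distance sorts first.
kth_nn_dist <- function(xy, k, candidates = xy) {
  xy <- as.matrix(xy)
  candidates <- as.matrix(candidates)
  d <- as.matrix(dist(candidates))[seq_len(nrow(xy)), , drop = FALSE]
  unname(apply(d, 1, function(row) sort(row)[k + 1]))
}

# Two points close to opposite edges; the corner points fix the range.
xy <- rbind(c(0, 0), c(1, 1), c(0.02, 0.5), c(0.98, 0.5))
plain <- kth_nn_dist(xy, k = 1)
wrapped <- kth_nn_dist(xy, k = 1, candidates = wrap_points(xy, wrap = 0.1))
print(rbind(plain, wrapped))
```

In this toy data set the two points near the left and right edges are far apart in the plane but become close neighbors once the region is wrapped, which is the effect edge.correct=TRUE is meant to achieve for the noise process.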

Author

R port by Christian Hennig <christian.hennig@unibo.it>, https://www.unibo.it/sitoweb/christian.hennig/en;
original S-PLUS package by S. Byers and A. E. Raftery.

Details

The assumption is that the noise is distributed as a homogeneous Poisson process on a certain region and the clusters are distributed as a homogeneous Poisson process with larger intensity on a subregion (disconnected in case of more than one cluster). The distances are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated via the EM-algorithm. The points are assigned to noise or cluster component by use of the estimated a posteriori probabilities.
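
The model described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation in prabclus: the function names knn_dist_density and em_knn_mixture, the median-based starting values and the stopping rule are all hypothetical choices. The sketch uses the fact that, for a homogeneous Poisson process with intensity lambda in d dimensions, alpha_d * D^d follows a Gamma distribution with shape k and rate lambda, where D is the distance to the kth nearest neighbor and alpha_d is the volume of the unit ball.

```r
# Density of the kth nearest neighbor distance under a homogeneous Poisson
# process with intensity lambda in d dimensions (computed on the log scale).
knn_dist_density <- function(x, lambda, k, d = 2) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the unit ball
  exp(log(d) + k * log(lambda * alpha_d) + (d * k - 1) * log(x) -
        lambda * alpha_d * x^d - lgamma(k))
}

# EM for a two-component mixture of such densities.
# Component 1 = cluster (high intensity), component 2 = noise.
em_knn_mixture <- function(dist_k, k, d = 2, tol = 1e-3, max_iter = 500) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)
  y <- alpha_d * dist_k^d
  # Assumed starting values: short distances start as "cluster".
  delta <- as.numeric(dist_k <= median(dist_k))
  for (iter in seq_len(max_iter)) {
    # M-step: mixing proportion and Gamma rate estimates (shape k is known).
    p <- mean(delta)
    lambda1 <- k * sum(delta) / sum(y * delta)
    lambda2 <- k * sum(1 - delta) / sum(y * (1 - delta))
    # E-step: a posteriori probability of the cluster component.
    f1 <- p * knn_dist_density(dist_k, lambda1, k, d)
    f2 <- (1 - p) * knn_dist_density(dist_k, lambda2, k, d)
    new_delta <- f1 / (f1 + f2)
    converged <- max(abs(new_delta - delta)) < tol
    delta <- new_delta
    if (converged) break
  }
  list(z = as.integer(delta > 0.5), posterior = delta,
       lambda1 = lambda1, lambda2 = lambda2, p = p)
}

# Simulated kth nearest neighbor distances in two dimensions (alpha_d = pi):
# 300 from a dense process (intensity 200) and 300 from a sparse one (10).
set.seed(1)
k <- 5
sim_dist <- function(n, lambda) sqrt(rgamma(n, shape = k, rate = lambda) / pi)
dist_k <- c(sim_dist(300, 200), sim_dist(300, 10))
fit <- em_knn_mixture(dist_k, k = k)
print(unlist(fit[c("lambda1", "lambda2", "p")]))
print(table(fit$z))
```

The simulated distances here are drawn independently from the two component distributions, which real kth nearest neighbor distances are not, so the example only shows the mechanics of the mixture fit and the 0-1 assignment.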

References

Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.

Examples

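library(prabclus)
library(mclust)  # provides the chevron data set
data(chevron)
# Classify points using the distance to the 15th nearest neighbor
nnc <- NNclean(chevron[, 2:3], 15, plot = TRUE)
# Noise points (z = 0) in black, cluster points (z = 1) in red
plot(chevron[, 2:3], col = 1 + nnc$z)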
