pointdensity

0th

Percentile

Calculate Local Density at Each Data Point

Calculate the local density at each data point as either the number of points in the eps-neighborhood (as used in DBSCAN) or the kernel density estimate (kde) of a uniform kernel. The function uses a kd-tree for fast fixed-radius nearest neighbor search.

Keywords
model
Usage
pointdensity(x, eps, type = "frequency",
  search = "kdtree", bucketSize = 10,
  splitRule = "suggest", approx = 0)
Arguments
x
a data matrix.
eps
radius of the eps-neighborhood, i.e., bandwidth of the uniform kernel).
type
"frequency" or "density". should the raw count of points inside the eps-neighborhood or the kde be returned.
search, bucketSize, splitRule, approx
algorithmic parameters for frNN.
Details

DBSCAN estimates the density around a point as the number of points in the eps-neighborhood of the point. The kde using a uniform kernel is just this count divided by \(2 eps n\), where \(n\) is the number of points in x. Points with low local density often indicate noise (see e.g., Wishart (1969) and Hartigan (1975)).

Value

A vector of the same length as data points (rows) in x with the count or density values for each data point.

References

WISHART, D. (1969), Mode Analysis: A Generalization of Nearest Neighbor which Reduces Chaining Effects, in Numerical Taxonomy, Ed., A.J. Cole, Academic Press, 282-311. John A. Hartigan (1975), Clustering Algorithms, John Wiley \& Sons, Inc., New York, NY, USA.

See Also

frNN.

Aliases
  • pointdensity
  • density
Examples
set.seed(665544)
n <- 100
x <- cbind(
  x=runif(10, 0, 5) + rnorm(n, sd=0.4),
  y=runif(10, 0, 5) + rnorm(n, sd=0.4)
  )
plot(x)

### calculate density
d <- pointdensity(x, eps = .5, type = "density")

### density distribution
summary(d)
hist(d, breaks = 10)

### point size is proportional to Density
plot(x, pch = 19, main = "Density (eps = .5)", cex = d*5)

### Wishart (1969) single link clustering method
# 1. remove noise with low density
f <- pointdensity(x, eps = .5, type = "frequency")
x_nonoise <- x[f >= 5,]

# 2. use single-linkage on the non-noise points
hc <- hclust(dist(x_nonoise), method = "single")
plot(x, pch = 19, cex = .5)
points(x_nonoise, pch = 19, col= cutree(hc, k = 4)+1L)
Documentation reproduced from package dbscan, version 1.1-1, License:

Community examples

Looks like there are no examples yet.