TDA (version 1.9)

dtm: Distance to Measure Function

Description

The function dtm computes the "distance to measure" (DTM) function on a set of points Grid, using the uniform empirical measure on a set of points X. Given a probability measure $$P$$, the distance to measure function is defined, for each $$y \in \mathbb{R}^d$$, by $$d_{m0}(y) = \left(\frac{1}{m0}\int_0^{m0} ( G_y^{-1}(u))^{r} du\right)^{1/r},$$ where $$G_y(t) = P( \Vert X-y \Vert \le t)$$, and $$m0 \in (0,1)$$ and $$r \in [1,\infty)$$ are tuning parameters. As m0 increases, the DTM function becomes smoother, so m0 can be understood as a smoothing parameter. The parameter r has a weaker effect, but it also changes the DTM function. The DTM can be seen as a smoothed version of the distance function. See Details and References.

Given $$X=\{x_1, \dots, x_n\}$$, the empirical version of the distance to measure is $$\hat d_{m0}(y) = \left(\frac{1}{k} \sum_{x_i \in N_k(y)} \Vert x_i-y \Vert^{r}\right)^{1/r},$$ where $$k= \lceil m0 \cdot n \rceil$$ and $$N_k(y)$$ is the set containing the $$k$$ nearest neighbors of $$y$$ among $$x_1, \ldots, x_n$$.
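
The empirical formula above can be sketched directly in base R. This is a brute-force illustration only, not the implementation used by the TDA package: the function name dtm_sketch is hypothetical, the weight argument is ignored, and the approach is practical only for small inputs.

```r
## Brute-force empirical DTM (illustrative sketch, not the TDA implementation).
## X: n-by-d matrix of sample points; Grid: m-by-d matrix of evaluation points.
dtm_sketch <- function(X, Grid, m0, r = 2) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  n <- nrow(X)
  k <- ceiling(m0 * n)  # number of nearest neighbors used
  apply(Grid, 1, function(y) {
    # Euclidean distances from y to every row of X
    dists <- sqrt(rowSums(sweep(X, 2, y)^2))
    # average the r-th powers of the k smallest distances, then take the r-th root
    mean(sort(dists)[seq_len(k)]^r)^(1 / r)
  })
}

## Tiny check on the real line: X = {0, 1, 2, 3}, y = 0, m0 = 0.5 gives k = 2,
## so the two nearest points are 0 and 1.
X <- matrix(c(0, 1, 2, 3), ncol = 1)
Grid <- matrix(0, ncol = 1)
dtm_sketch(X, Grid, m0 = 0.5)          # sqrt((0^2 + 1^2) / 2) = sqrt(0.5)
dtm_sketch(X, Grid, m0 = 0.5, r = 1)   # (0 + 1) / 2 = 0.5
```

The sketch returns one value per row of Grid, matching the shape described in the Value section.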

Usage

dtm(X, Grid, m0, r = 2, weight = 1)

Value

The function dtm returns a vector of length $$m$$ (the number of points stored in Grid) containing the value of the distance to measure function evaluated at each point of Grid.

Arguments

X

an $$n$$ by $$d$$ matrix of coordinates of points used to construct the uniform empirical measure for the distance to measure, where $$n$$ is the number of points and $$d$$ is the dimension.

Grid

an $$m$$ by $$d$$ matrix of coordinates of points where the distance to measure is computed, where $$m$$ is the number of points in Grid and $$d$$ is the dimension.

m0

a numeric variable for the smoothing parameter of the distance to measure. Roughly, m0 is the fraction of points of X that are considered when the distance to measure is computed for each point of Grid. For example, with $$n = 300$$ and m0 = 0.1, the 30 nearest points of X are used. The value of m0 should be in $$(0,1)$$.

r

a numeric variable for the tuning parameter of the distance to measure. The value of r should be in $$[1,\infty)$$, and the default value is 2.

weight

either a number or a vector of length $$n$$. If it is a number, the same weight is applied to each point of X. If it is a vector, weight gives the weight of each point of X. The default value is 1.

Author

Jisu Kim and Fabrizio Lecci

Details

See (Chazal, Cohen-Steiner, and Merigot, 2011, Definition 3.2) and (Chazal, Massart, and Michel, 2015, Equation (2)) for a formal definition of the "distance to measure" function.
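
The link between the two formulas in the Description can be made explicit. The following short derivation is a sketch under two assumptions that are not spelled out above: $$G_y^{-1}$$ denotes the generalized inverse (quantile function) $$G_y^{-1}(u) = \inf\{t \ge 0 : G_y(t) \ge u\}$$, and $$m0 = k/n$$ for an integer $$k$$. Let $$P$$ be the uniform empirical measure on $$x_1, \dots, x_n$$, and let $$x_{(1)}, \dots, x_{(n)}$$ denote the sample points ordered by increasing distance to $$y$$. Then $$G_y(t) = \frac{1}{n} \#\{i : \Vert x_i - y \Vert \le t\}$$ is a step function, so $$G_y^{-1}(u) = \Vert x_{(i)} - y \Vert \quad \text{for } u \in \left(\tfrac{i-1}{n}, \tfrac{i}{n}\right],$$ and therefore $$\frac{1}{m0}\int_0^{m0} ( G_y^{-1}(u))^{r} du = \frac{n}{k} \sum_{i=1}^{k} \frac{1}{n} \Vert x_{(i)} - y \Vert^{r} = \frac{1}{k} \sum_{x_i \in N_k(y)} \Vert x_i - y \Vert^{r}.$$ Taking the $$1/r$$ power gives the empirical expression $$\hat d_{m0}(y)$$.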

References

Chazal F, Cohen-Steiner D, Merigot Q (2011). "Geometric inference for probability measures." Foundations of Computational Mathematics 11.6, 733-751.

Chazal F, Massart P, Michel B (2015). "Rates of convergence for robust geometric inference."

Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.

See Also

kde, kernelDist, distFct

Examples

## Load the package that provides circleUnif and dtm
library(TDA)

## Generate data from the unit circle
n <- 300
X <- circleUnif(n)

## Construct a grid of points over which we evaluate the function
by <- 0.065
Xseq <- seq(-1.6, 1.6, by = by)
Yseq <- seq(-1.7, 1.7, by = by)
Grid <- expand.grid(Xseq, Yseq)

## distance to measure
m0 <- 0.1
DTM <- dtm(X, Grid, m0)
