TDA (version 1.9)

dtm: Distance to Measure Function

Description

The function dtm computes the "distance to measure" (DTM) function on a set of points Grid, using the uniform empirical measure on a set of points X. Given a probability measure \(P\), the distance to measure function is defined, for each \(y \in \mathbb{R}^d\), by $$ d_{m0}(y) = \left(\frac{1}{m0}\int_0^{m0} ( G_y^{-1}(u))^{r} du\right)^{1/r}, $$ where \(G_y(t) = P( \Vert X-y \Vert \le t)\), with \(X\) here denoting a random point distributed according to \(P\), \(G_y^{-1}\) is the corresponding quantile function, and \(m0 \in (0,1)\) and \(r \in [1,\infty)\) are tuning parameters. As m0 increases, the DTM function becomes smoother, so m0 can be understood as a smoothing parameter. The parameter r has a weaker effect, but it also changes the DTM function. The DTM can be seen as a smoothed version of the distance function. See Details and References.

Given \(X=\{x_1, \dots, x_n\}\), the empirical version of the distance to measure is $$ \hat d_{m0}(y) = \left(\frac{1}{k} \sum_{x_i \in N_k(y)} \Vert x_i-y \Vert^{r}\right)^{1/r}, $$ where \(k= \lceil m0 \cdot n \rceil\) and \(N_k(y)\) is the set containing the \(k\) nearest neighbors of \(y\) among \(x_1, \dots, x_n\).
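The empirical formula above can be written out directly in base R. The following is a minimal brute-force sketch, not the implementation used by TDA: the name dtm_sketch is hypothetical, it ignores the weight argument, and it computes all pairwise distances, which is only practical for small inputs.

```r
# Brute-force empirical DTM in base R (illustrative; not the TDA implementation).
# X: n x d matrix, Grid: m x d matrix, m0 in (0, 1), r >= 1.
dtm_sketch <- function(X, Grid, m0, r = 2) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  n <- nrow(X)
  k <- ceiling(m0 * n)  # number of nearest neighbors used at each y
  apply(Grid, 1, function(y) {
    # Euclidean distances from y to every point of X
    dists <- sqrt(rowSums(sweep(X, 2, y)^2))
    # keep the k smallest distances and take their power mean of order r
    nearest <- sort(dists)[seq_len(k)]
    mean(nearest^r)^(1 / r)
  })
}

# Four corners of the unit square, evaluated at the center and at one corner.
# With m0 = 0.5 and n = 4, k = 2 neighbors are used at each evaluation point.
X_toy <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
Grid_toy <- rbind(c(0.5, 0.5), c(0, 0))
dtm_sketch(X_toy, Grid_toy, m0 = 0.5)
```

In this toy case both evaluation points give \(\sqrt{0.5}\): at the center the two nearest corners are each at distance \(\sqrt{0.5}\), and at the corner the two nearest points are at distances 0 and 1, whose mean square is 0.5.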

Usage

dtm(X, Grid, m0, r = 2, weight = 1)

Value

The function dtm returns a vector of length \(m\) (the number of points stored in Grid) containing the value of the distance to measure function evaluated at each point of Grid.
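Because the result is a plain vector, it usually has to be reshaped before it can be drawn as a surface over a rectangular grid. The sketch below assumes Grid was built with expand.grid, whose first variable varies fastest, and uses a stand-in vector in place of an actual dtm result so that it runs without the package.

```r
Xseq <- seq(-1, 1, by = 0.5)      # 5 values
Yseq <- seq(0, 1, by = 0.5)       # 3 values
Grid <- expand.grid(Xseq, Yseq)   # 15 rows; the first column (x) varies fastest

# Stand-in for the vector returned by dtm(): here simply x + 10 * y,
# so each entry can be traced back to its grid point.
values <- Grid[, 1] + 10 * Grid[, 2]

# matrix() fills column by column, so rows index Xseq and columns index Yseq
Z <- matrix(values, nrow = length(Xseq), ncol = length(Yseq))
Z[2, 3]  # value at x = Xseq[2] = -0.5, y = Yseq[3] = 1, i.e. 9.5
```

A matrix laid out this way can be passed, together with Xseq and Yseq, to base graphics functions such as persp or image.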

Arguments

X

an \(n\) by \(d\) matrix of coordinates of points used to construct the uniform empirical measure for the distance to measure, where \(n\) is the number of points and \(d\) is the dimension.

Grid

an \(m\) by \(d\) matrix of coordinates of points where the distance to measure is computed, where \(m\) is the number of points in Grid and \(d\) is the dimension.

m0

a numeric variable for the smoothing parameter of the distance to measure. Roughly, m0 is the fraction of points of X that are considered when the distance to measure is computed for each point of Grid. The value of m0 should be in \((0,1)\).
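To make the role of m0 concrete, the number of neighbors used at each grid point is \(k = \lceil m0 \cdot n \rceil\). The short sketch below tabulates k for a few illustrative values of m0 with n = 300; the variable names are chosen here for illustration only.

```r
n <- 300
m0_values <- c(0.001, 0.01, 0.1, 0.5)

# Number of nearest neighbors of each grid point that enter the average
k <- ceiling(m0_values * n)
data.frame(m0 = m0_values, k = k)
```

For the smallest value, k is 1, so the empirical distance to measure reduces to the distance to the nearest point of X; for m0 = 0.5 half of the sample (150 points) is averaged.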

r

a numeric variable for the tuning parameter of the distance to measure. The value of r should be in \([1,\infty)\), and the default value is 2.

weight

either a number, or a vector of length \(n\). If it is a number, the same weight is applied to every point of X. If it is a vector, it gives the weight of each point of X. The default value is 1.

Author

Jisu Kim and Fabrizio Lecci

Details

See (Chazal, Cohen-Steiner, and Merigot, 2011, Definition 3.2) and (Chazal, Massart, and Michel, 2015, Equation (2)) for a formal definition of the "distance to measure" function.
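As a brief sketch of how the empirical formula in the Description follows from the integral definition, assume that \(P\) is the uniform empirical measure on \(x_1, \dots, x_n\) and that \(m0 = k/n\) for an integer \(k\). The ordered-distance notation \(x_{(j)}\), the \(j\)-th nearest point to \(y\), is introduced here and is not part of the package documentation.

```latex
% Assumes m0 = k/n with k an integer, and ordered distances
% \|x_{(1)} - y\| \le \dots \le \|x_{(n)} - y\|.
G_y^{-1}(u) = \Vert x_{(j)} - y \Vert
  \quad \text{for } u \in \left(\tfrac{j-1}{n}, \tfrac{j}{n}\right],
\qquad
\frac{1}{m0}\int_0^{m0} \left(G_y^{-1}(u)\right)^{r} du
  = \frac{n}{k} \sum_{j=1}^{k} \frac{1}{n} \Vert x_{(j)} - y \Vert^{r}
  = \frac{1}{k} \sum_{x_i \in N_k(y)} \Vert x_i - y \Vert^{r}.
```

Taking the power \(1/r\) of both sides gives \(\hat d_{m0}(y)\).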

References

Chazal F, Cohen-Steiner D, Merigot Q (2011). "Geometric inference for probability measures." Foundations of Computational Mathematics 11.6, 733-751.

Chazal F, Massart P, Michel B (2015). "Rates of convergence for robust geometric inference."

Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.

See Also

kde, kernelDist, distFct

Examples

library(TDA)

## Generate data from the unit circle
n <- 300
X <- circleUnif(n)

## Construct a grid of points over which we evaluate the function
by <- 0.065
Xseq <- seq(-1.6, 1.6, by = by)
Yseq <- seq(-1.7, 1.7, by = by)
Grid <- expand.grid(Xseq, Yseq)

## distance to measure
m0 <- 0.1
DTM <- dtm(X, Grid, m0)
