dtm: Distance to Measure Function

Description

The function dtm computes the "distance to measure" (DTM) function on a set of points Grid, using the uniform empirical measure on a set of points X. Given a probability measure $P$, the distance to measure function is defined, for each $y \in \mathbb{R}^d$, by $$ d_{m0}(y) = \left(\frac{1}{m0}\int_0^{m0} ( G_y^{-1}(u))^{r} du\right)^{1/r}, $$ where $G_y(t) = P( \Vert X-y \Vert \le t)$, and $m0 \in (0,1)$ and $r \in [1,\infty)$ are tuning parameters. As m0 increases, the DTM function becomes smoother, so m0 can be understood as a smoothing parameter. The parameter r has a smaller effect, but it also changes the DTM function. The DTM can be seen as a smoothed version of the distance function. See Details and References.
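
As a worked step that is not spelled out in the definition, take $P$ to be the uniform empirical measure on $x_1, \dots, x_n$ and read $G_y^{-1}$ as the generalized inverse (quantile function) of $G_y$; this reading is an assumption about the intended convention. Writing $\Vert x_{(1)}-y \Vert \le \dots \le \Vert x_{(n)}-y \Vert$ for the sorted distances, $G_y^{-1}(u) = \Vert x_{(i)}-y \Vert$ for $u \in ((i-1)/n, i/n]$, so when $m0 = k/n$ for an integer $k$, $$ \frac{1}{m0}\int_0^{m0} ( G_y^{-1}(u))^{r} du = \frac{n}{k}\sum_{i=1}^{k} \frac{1}{n} \Vert x_{(i)}-y \Vert^{r} = \frac{1}{k}\sum_{i=1}^{k} \Vert x_{(i)}-y \Vert^{r}. $$ Taking the $1/r$ power gives the $r$-th power mean of the $k$ smallest distances from $y$ to the sample.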

Given $X=\{x_1, \dots, x_n\}$, the empirical version of the distance to measure is $$ \hat d_{m0}(y) = \left(\frac{1}{k} \sum_{x_i \in N_k(y)} \Vert x_i-y \Vert^{r}\right)^{1/r}, $$ where $k= \lceil m0 \cdot n \rceil$ and $N_k(y)$ is the set containing the $k$ nearest neighbors of $y$ among $x_1, \ldots, x_n$.
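
The empirical formula can be sketched in a few lines of base R. This is a brute-force illustration under stated assumptions, not the package's implementation: the helper name dtm_sketch is hypothetical, it handles only the uniform-weight case, and it computes all pairwise distances rather than using a nearest-neighbor search.

```r
# Hypothetical helper (not part of TDA): brute-force empirical DTM
# with uniform weights, following the formula for \hat d_{m0}(y).
dtm_sketch <- function(X, Grid, m0, r = 2) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  n <- nrow(X)
  k <- ceiling(m0 * n)  # number of nearest neighbors, k = ceil(m0 * n)
  apply(Grid, 1, function(y) {
    # Euclidean distances from y to every row of X
    d <- sqrt(rowSums(sweep(X, 2, y)^2))
    # r-th power mean of the k smallest distances
    mean(sort(d)[seq_len(k)]^r)^(1 / r)
  })
}

# Toy data: four points on the real line, two evaluation points
X <- matrix(c(0, 1, 2, 3), ncol = 1)
Grid <- matrix(c(0, 1.5), ncol = 1)
res <- dtm_sketch(X, Grid, m0 = 0.5, r = 2)
# Here k = 2, so res is c(sqrt(0.5), 0.5)
```

With m0 = 0.5 and n = 4, each value averages the squared distances to the two nearest sample points before taking the square root.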

Usage

dtm(X, Grid, m0, r = 2, weight = 1)

Value

The function dtm returns a vector of length $m$ (the number of points stored in Grid) containing the value of the distance to measure function evaluated at each point of Grid.

Arguments

X: an $n$ by $d$ matrix of coordinates of points used to construct the uniform empirical measure for the distance to measure, where $n$ is the number of points and $d$ is the dimension.
Grid: an $m$ by $d$ matrix of coordinates of points where the distance to measure is computed, where $m$ is the number of points in Grid and $d$ is the dimension.
m0: a numeric variable for the smoothing parameter of the distance to measure. Roughly, m0 is the fraction of points of X that are considered when the distance to measure is computed for each point of Grid. The value of m0 should be in $(0,1)$.
r: a numeric variable for the tuning parameter of the distance to measure. The value of r should be in $[1,\infty)$, and the default value is 2.
weight: either a number or a vector of length $n$. If it is a number, the same weight is applied to each point of X. If it is a vector, its entries are the weights of the individual points of X. The default value is 1.

Author

Jisu Kim and Fabrizio Lecci

Details

See (Chazal, Cohen-Steiner, and Merigot, 2011, Definition 3.2) and (Chazal, Massart, and Michel, 2015, Equation (2)) for a formal definition of the "distance to measure" function.

References

Chazal F, Cohen-Steiner D, Merigot Q (2011). "Geometric inference for probability measures." Foundations of Computational Mathematics 11.6, 733-751.

Chazal F, Massart P, Michel B (2015). "Rates of convergence for robust geometric inference."

Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.

Examples

## Load the TDA package, which provides dtm() and circleUnif()
library(TDA)

## Generate Data from the unit circle
n <- 300
X <- circleUnif(n)

## Construct a grid of points over which we evaluate the function
by <- 0.065
Xseq <- seq(-1.6, 1.6, by = by)
Yseq <- seq(-1.7, 1.7, by = by)
Grid <- expand.grid(Xseq, Yseq)

## distance to measure
m0 <- 0.1
DTM <- dtm(X, Grid, m0)
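
Because dtm returns a plain vector with one value per row of Grid, it may help to reshape that vector into a matrix before plotting. The sketch below rebuilds the same grid as the example above and uses a stand-in vector vals (the distance from each grid point to the unit circle, an assumption made so the block runs without TDA) in place of the dtm output; the name Z is also illustrative.

```r
# Same grid construction as in the example above
by <- 0.065
Xseq <- seq(-1.6, 1.6, by = by)
Yseq <- seq(-1.7, 1.7, by = by)
Grid <- expand.grid(Xseq, Yseq)

# Stand-in for the dtm() output (assumption): distance from each
# grid point to the unit circle, one value per row of Grid
vals <- abs(sqrt(Grid[, 1]^2 + Grid[, 2]^2) - 1)

# expand.grid varies its first argument fastest, so column-major
# filling puts Xseq along the rows and Yseq along the columns
Z <- matrix(vals, nrow = length(Xseq), ncol = length(Yseq))
```

A matrix laid out this way can be passed to base graphics functions such as persp(Xseq, Yseq, Z) or contour(Xseq, Yseq, Z); replacing vals with the DTM vector from the example should give the corresponding surface.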
