The function `dtm` computes the "distance to measure" (DTM) function on a set of points `Grid`, using the uniform empirical measure on a set of points `X`. Given a probability measure \(P\), the distance to measure function is defined, for each \(y \in \mathbb{R}^d\), by
$$
d_{m0}(y) = \left(\frac{1}{m0}\int_0^{m0} ( G_y^{-1}(u))^{r} du\right)^{1/r},
$$
where \(X\) is a random variable with distribution \(P\), \(G_y(t) = P( \Vert X-y \Vert \le t)\) is the distribution function of \(\Vert X-y \Vert\), \(G_y^{-1}\) is its (generalized) inverse, and \(m0 \in (0,1)\) and \(r \in [1,\infty)\) are tuning parameters. As `m0` increases, the DTM function becomes smoother, so `m0` can be understood as a smoothing parameter. The parameter `r` has a weaker effect, but it also changes the DTM function. The DTM can be seen as a smoothed version of the distance function. See Details and References.
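To see how the integral definition relates to a finite sample, one can plug in the uniform empirical measure \(P_n = \frac{1}{n}\sum_{i=1}^n \delta_{x_i}\) on points \(x_1, \dots, x_n\). This worked derivation is added for illustration; the notation \(d_{(1)} \le \dots \le d_{(n)}\) for the sorted distances \(\Vert x_i - y \Vert\) is introduced here. Taking \(G_y^{-1}(u) = \inf\{t : G_y(t) \ge u\}\), the inverse is a step function:
$$
G_y^{-1}(u) = d_{(i)} \quad \text{for } u \in \left(\tfrac{i-1}{n}, \tfrac{i}{n}\right], \quad i = 1, \dots, n.
$$
With \(k = \lceil m0 \cdot n \rceil\), so that \((k-1)/n < m0 \le k/n\), the integral splits into \(k-1\) full steps and one partial step:
$$
d_{m0}(y)^r = \frac{1}{m0}\int_0^{m0} ( G_y^{-1}(u))^{r} du = \frac{1}{m0 \cdot n}\left(\sum_{i=1}^{k-1} d_{(i)}^r + (m0 \cdot n - k + 1)\, d_{(k)}^r\right).
$$
When \(m0 \cdot n\) is an integer, \(k = m0 \cdot n\) and this reduces to \(\frac{1}{k}\sum_{i=1}^{k} d_{(i)}^r\), the average of the \(r\)-th powers of the \(k\) nearest-neighbour distances. Otherwise, the exact integral and the \(k\)-nearest-neighbour average differ slightly in how the \(k\)-th neighbour is weighted and in the normalization.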

Given \(X=\{x_1, \dots, x_n\}\), the empirical version of the distance to measure is $$ \hat d_{m0}(y) = \left(\frac{1}{k} \sum_{x_i \in N_k(y)} \Vert x_i-y \Vert^{r}\right)^{1/r}, $$ where \(k= \lceil m0 * n \rceil\) and \(N_k(y)\) is the set containing the \(k\) nearest neighbors of \(y\) among \(x_1, \ldots, x_n\).
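The empirical formula can be sketched directly in base R by brute force. This is a minimal illustration, not the TDA package's implementation: the helper name `dtm_naive` is hypothetical, it ignores weights, and it computes all pairwise distances, so it is only suitable for small inputs. Like `dtm`, it returns one value per row of `Grid`.

```r
# Hypothetical helper, not part of the TDA package: brute-force evaluation of
# the empirical distance to measure with k = ceiling(m0 * n) neighbours.
dtm_naive <- function(X, Grid, m0, r = 2) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  n <- nrow(X)
  k <- ceiling(m0 * n)  # number of nearest neighbours of each grid point
  apply(Grid, 1, function(y) {
    # Euclidean distances from the grid point y to every row of X
    d <- sqrt(rowSums(sweep(X, 2, y)^2))
    # r-th root of the mean r-th power of the k smallest distances
    mean(sort(d)[seq_len(k)]^r)^(1 / r)
  })
}

# Small example: four points in the plane, evaluated at (0, 0) and (2, 0)
X <- matrix(c(0, 0,  1, 0,  0, 1,  3, 0), ncol = 2, byrow = TRUE)
Grid <- matrix(c(0, 0,  2, 0), ncol = 2, byrow = TRUE)
dtm_naive(X, Grid, m0 = 0.5)  # k = 2; approximately 0.7071 and 1
```

With \(k = 1\) (for example `m0 = 0.25` here) the value reduces to the distance from each grid point to its nearest point of `X`, which is one way to read the statement that the DTM smooths the distance function.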

`dtm(X, Grid, m0, r = 2, weight = 1)`

The function `dtm` returns a vector of length \(m\) (the number of points stored in `Grid`) containing the value of the distance to measure function evaluated at each point of `Grid`.

- `X`: an \(n\) by \(d\) matrix of coordinates of points used to construct the uniform empirical measure for the distance to measure, where \(n\) is the number of points and \(d\) is the dimension.
- `Grid`: an \(m\) by \(d\) matrix of coordinates of points where the distance to measure is computed, where \(m\) is the number of points in `Grid` and \(d\) is the dimension.
- `m0`: a numeric variable for the smoothing parameter of the distance to measure. Roughly, `m0` is the proportion of points of `X` that are considered when the distance to measure is computed for each point of `Grid`. The value of `m0` should be in \((0,1)\).
- `r`: a numeric variable for the tuning parameter of the distance to measure. The value of `r` should be in \([1,\infty)\), and the default value is `2`.
- `weight`: either a number or a vector of length \(n\). If it is a number, the same weight is applied to each point of `X`. If it is a vector, `weight` gives the weight of each point of `X`. The default value is `1`.
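The role of `weight` can be illustrated by applying the integral definition to the weighted empirical measure \(P = \sum_i w_i \delta_{x_i} / \sum_i w_i\). This is an assumption made for the sketch, not a description of how `dtm` handles weights internally, and the helper name `dtm_weighted_sketch` is hypothetical. Taking the points in order of increasing distance from \(y\), each one contributes the part of its normalized weight that falls within the first `m0` of probability mass.

```r
# Hypothetical helper, not the TDA implementation: exact DTM of the weighted
# empirical measure P = sum_i w_i * delta_{x_i} / sum_i w_i (an assumption).
dtm_weighted_sketch <- function(X, Grid, m0, r = 2, weight = 1) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  n <- nrow(X)
  stopifnot(length(weight) %in% c(1, n))
  w <- rep_len(weight, n)  # a single number is recycled to every point
  w <- w / sum(w)          # normalize to a probability measure
  apply(Grid, 1, function(y) {
    d <- sqrt(rowSums(sweep(X, 2, y)^2))  # distances from y to each point
    o <- order(d)
    d <- d[o]
    ws <- w[o]
    before <- cumsum(ws) - ws                   # mass of strictly closer points
    used <- pmax(0, pmin(ws, m0 - before))      # mass of each point inside [0, m0]
    (sum(used * d^r) / m0)^(1 / r)              # (1/m0) * integral, then r-th root
  })
}

X <- matrix(c(0, 0,  1, 0,  0, 1,  3, 0), ncol = 2, byrow = TRUE)
Grid <- matrix(c(0, 0), ncol = 2)
dtm_weighted_sketch(X, Grid, m0 = 0.5)                           # equal weights
dtm_weighted_sketch(X, Grid, m0 = 0.5, weight = c(1, 1, 1, 5))   # heavy far point
```

With equal weights and \(m0 \cdot n\) an integer, this matches the \(k\)-nearest-neighbour formula \(\hat d_{m0}\); when \(m0 \cdot n\) is not an integer it returns the exact integral, which can differ slightly from \(\hat d_{m0}\). Increasing the weight of the distant point pulls it into the first `m0` of mass and raises the value.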

Authors: Jisu Kim and Fabrizio Lecci.

See (Chazal, Cohen-Steiner, and Merigot, 2011, Definition 3.2) and (Chazal, Massart, and Michel, 2015, Equation (2)) for a formal definition of the "distance to measure" function.

Chazal F, Cohen-Steiner D, Merigot Q (2011). "Geometric inference for probability measures." Foundations of Computational Mathematics 11.6, 733-751.

Chazal F, Massart P, Michel B (2015). "Rates of convergence for robust geometric inference."

Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.

See also: `kde`, `kernelDist`, `distFct`.

```
library(TDA)

## Generate data uniformly from the unit circle
n <- 300
X <- circleUnif(n)
## Construct a grid of points over which we evaluate the function
by <- 0.065
Xseq <- seq(-1.6, 1.6, by = by)
Yseq <- seq(-1.7, 1.7, by = by)
Grid <- expand.grid(Xseq, Yseq)
## distance to measure
m0 <- 0.1
DTM <- dtm(X, Grid, m0)
```
