
TDA (version 1.3)

maxPersistence: Maximal Persistence Method

Description

Given a point cloud and a function built on top of the data, we are interested in studying the evolution of the sublevel sets (or superlevel sets) of the function, using persistent homology. The Maximal Persistence Method selects the optimal smoothing parameter of the function, by maximizing the number of significant topological features, or by maximizing the total significant persistence of the features. For each value of the smoothing parameter, this function computes a persistence diagram using gridDiag and returns the values of the two criteria, the dimension of detected features, their persistence, and a bootstrapped confidence band. The features that fall outside of the band are statistically significant. See References.

Usage

maxPersistence(FUN, parameters, X, lim, by, maxdimension=length(lim)/2-1, 
             sublevel = TRUE, library="Dionysus", B=30, alpha=0.05, 
             bandFUN="bootstrapBand", distance="bottleneck", dimension=1, p=1,
             parallel=FALSE, printProgress=FALSE)

Arguments

FUN
the name of a function whose inputs are: 1) X, an $n$ by $d$ matrix of coordinates of the input point cloud, where $d$ is the dimension of the space; 2) a matrix of coordinates of points forming a grid at which the function can be evaluated (n
parameters
a numerical vector storing a sequence of values for the smoothing parameter of FUN, among which maxPersistence will select the optimal ones.
X
an $n$ by $d$ matrix of coordinates of the input point cloud, where $d$ is the dimension of the space.
lim
a $2$ by $d$ matrix, where each column specifies the range of one dimension of the grid over which the function FUN is evaluated.
by
either a number or a vector of length $d$ specifying the spacing between points of the grid in each dimension. If a single number is given, the same spacing is used in every dimension.
maxdimension
a number that indicates the maximum dimension up to which persistent homology is computed. Default is $d-1$, that is, the dimension of the embedding space minus 1.
sublevel
a logical variable indicating if the persistent homology should be computed for sublevel sets of FUN (TRUE) or superlevel sets (FALSE). Default is TRUE.
library
the library used to compute the persistence diagram: either 'Dionysus' or 'phat'. Default is 'Dionysus'.
bandFUN
the function to be used in the computation of the confidence band. Either 'bootstrapDiagram' or 'bootstrapBand'.
B
the number of bootstrap iterations.
alpha
for each value stored in parameters, maxPersistence computes a (1-alpha) confidence band.
distance
optional (if bandFUN=="bootstrapDiagram"): a string specifying the distance to be used for persistence diagrams: either 'bottleneck' or 'wasserstein'.
dimension
optional (if bandFUN=="bootstrapDiagram"): an integer specifying the dimension of the features used to compute the bottleneck distance. 0 for connected components, 1 for loops, 2 for voids. Default is 1.
p
optional (if bandFUN==bootstrapDiagram AND distance=='wasserstein'): integer specifying the power to be used in the computation of the Wasserstein distance. Default is 1.
parallel
logical: if TRUE the bootstrap iterations are parallelized, using the library parallel. (only if bandFUN=="bootstrapBand")
printProgress
if TRUE a progress bar is printed. Default is FALSE.
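
As a rough illustration of how lim and by describe the evaluation grid, the following base R sketch builds a two-dimensional grid with seq and expand.grid. It assumes a layout with one row per grid point and one column per dimension; this is not necessarily the exact construction used inside gridDiag, and the names axes and Grid are illustrative only.

```r
## Illustrative sketch in base R (not the package's internal code):
## build a 2-dimensional grid from 'lim' and 'by'.
Xlim <- c(-2, 2)
Ylim <- c(-2, 2)
lim <- cbind(Xlim, Ylim)        # 2 by d matrix: row 1 = lower limits, row 2 = upper limits
by <- 0.2

d <- ncol(lim)                  # dimension of the embedding space
by <- rep(by, length.out = d)   # a single number is reused in every dimension

## one sequence of grid coordinates per dimension (hypothetical helper name)
axes <- lapply(seq_len(d), function(j) seq(lim[1, j], lim[2, j], by = by[j]))

## all combinations of coordinates: one row per grid point
Grid <- as.matrix(expand.grid(axes))
dim(Grid)

## the default in Usage, length(lim)/2 - 1, equals d - 1
maxdimension <- length(lim) / 2 - 1
```

A finer by gives a more accurate picture of FUN but increases the number of grid points, and hence the cost of every bootstrap iteration.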

Value

The function returns an object of the class "maxPersistence", a list with the following components:

  • parameters: the same vector parameters given in input.
  • sigNumber: a numeric vector storing the number of significant features in the persistence diagrams computed using each value in parameters.
  • sigPersistence: a numeric vector storing the sum of significant persistence of the features in the persistence diagrams, computed using each value in parameters.
  • bands: a numeric vector storing the bootstrap band's width, for each value in parameters.
  • Persistence: a list of the same length as parameters. Each element of the list is a $P_i$ by 2 matrix, where $P_i$ is the number of features found using the parameter $i$: the first column stores the dimension of each feature and the second column the persistence abs(death-birth).
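
The components above can be inspected directly to see which parameter each criterion favours. The sketch below uses a hand-made list, res, with made-up numbers standing in for real output of maxPersistence. The toy diagram diag1 and its column names are also assumptions chosen for illustration.

```r
## Hypothetical values standing in for the output of maxPersistence();
## the numbers are made up for illustration only.
res <- list(
  parameters     = c(0.1, 0.3, 0.5),
  sigNumber      = c(3, 1, 1),
  sigPersistence = c(0.42, 0.95, 0.60),
  bands          = c(0.20, 0.08, 0.05)
)

## parameter(s) maximizing the number of significant features
optNumber <- res$parameters[res$sigNumber == max(res$sigNumber)]

## parameter(s) maximizing the total significant persistence
optPers <- res$parameters[res$sigPersistence == max(res$sigPersistence)]

## A toy persistence diagram (assumed columns: dimension, Birth, Death)
## and the matching two-column matrix described for the Persistence component.
diag1 <- cbind(dimension = c(0, 0, 1),
               Birth     = c(0.0, 0.1, 0.3),
               Death     = c(1.0, 0.2, 0.9))
pers1 <- cbind(dimension   = diag1[, "dimension"],
               Persistence = abs(diag1[, "Death"] - diag1[, "Birth"]))
```

The two criteria need not agree, as in this toy example, so it can be worth looking at both sigNumber and sigPersistence before settling on a smoothing parameter.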

Details

maxPersistence calls the gridDiag function, which computes the persistence diagram of sublevel (or superlevel) sets of a function, evaluated over a grid of points.

References

Frederic Chazal, Jessi Cisewski, Brittany T. Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman (2014), "Robust Topological Inference: distance-to-a-measure and kernel distance".

Brittany T. Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, and Aarti Singh (2013), "Statistical Inference For Persistent Homology" (arXiv:1303.7117). To appear, Annals of Statistics.

See Also

gridDiag, kde, kernelDist, dtm, bootstrapBand

Examples

library(TDA)

## input data: circle with clutter noise
n=600
percNoise=0.1
XX1 = circleUnif(n)
noise=cbind(runif(percNoise*n, -2,2),runif(percNoise*n, -2,2))
X=rbind(XX1,noise)

## limits of the grid at which the density estimator is evaluated
Xlim=c(-2,2)
Ylim=c(-2,2)
lim=cbind(Xlim,Ylim)
by=0.2

B=80
alpha=0.05

## candidates
parametersKDE=seq(0.1,0.5, by=0.2)

maxKDE=maxPersistence(kde, parametersKDE, X, lim=lim, by=by, 
                  bandFUN="bootstrapBand",B=B, alpha=alpha,
                  parallel=FALSE, printProgress = TRUE)
print(summary(maxKDE))

par(mfrow=c(1,2))
plot(X, pch=16, cex=0.5, main="Circle")
plot(maxKDE)
