
TDAvec (version 0.1.41)

computeStats: Compute Descriptive Statistics for Births, Deaths, Midpoints, and Lifespans in a Persistence Diagram

Description

For a given persistence diagram \(D=\{(b_i,d_i)\}_{i=1}^N\) (corresponding to a specified homological dimension), computeStats() calculates descriptive statistics of the birth, death, midpoint (the average of birth and death), and lifespan (death minus birth) values. Additionally, it computes the total number of points and entropy of the lifespan values. Points in \(D\) with infinite death values are ignored.

Usage

computeStats(D, homDim)

Value

A (named) 38-dimensional numeric vector containing:

  • mean_births, stddev_births, median_births, iqr_births, range_births, p10_births, p25_births, p75_births, p90_births: Descriptive statistics for birth values.

  • mean_deaths, stddev_deaths, median_deaths, iqr_deaths, range_deaths, p10_deaths, p25_deaths, p75_deaths, p90_deaths: Descriptive statistics for death values.

  • mean_midpoints, stddev_midpoints, median_midpoints, iqr_midpoints, range_midpoints, p10_midpoints, p25_midpoints, p75_midpoints, p90_midpoints: Descriptive statistics for midpoint values (mean of birth and death values).

  • mean_lifespans, stddev_lifespans, median_lifespans, iqr_lifespans, range_lifespans, p10_lifespans, p25_lifespans, p75_lifespans, p90_lifespans: Descriptive statistics for lifespan (or persistence) values (difference between death and birth values).

  • total_bars: The total number of points in the specified homological dimension.

  • entropy: The entropy of the lifespan values.
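The naming scheme of the first 36 entries can be illustrated with a small base R sketch. This is not the package's implementation: the helper `describe()` is hypothetical, and it assumes the sample standard deviation (`sd()`), R's default quantile type, and "range" meaning maximum minus minimum. The values returned by computeStats() may therefore differ slightly.

```r
# Hypothetical helper: nine summary statistics for one numeric vector,
# named in the same pattern as the computeStats() output.
describe <- function(x, suffix) {
  q <- quantile(x, probs = c(0.10, 0.25, 0.75, 0.90), names = FALSE)
  out <- c(mean(x), sd(x), median(x), IQR(x), max(x) - min(x), q)
  names(out) <- paste0(
    c("mean", "stddev", "median", "iqr", "range", "p10", "p25", "p75", "p90"),
    "_", suffix
  )
  out
}

# Made-up birth and death values for three points
births <- c(0.8, 0.9, 1.0)
deaths <- c(1.5, 1.1, 1.4)

stats <- c(
  describe(births, "births"),
  describe(deaths, "deaths"),
  describe((births + deaths) / 2, "midpoints"),
  describe(deaths - births, "lifespans")
)

length(stats)  # 4 groups of 9 statistics
names(stats)
```

Adding total_bars and entropy to these 36 entries accounts for the 38 elements of the returned vector.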

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for \(H_0\), 1 for \(H_1\), etc.). Rows in D are filtered based on this value.
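To make the expected input concrete, the sketch below builds a small diagram by hand and applies the row filtering described for homDim, together with the removal of points with infinite death values. The matrix `D_toy`, its column names, and the helper `filter_diagram()` are made up for illustration and are not part of TDAvec.

```r
# Hypothetical toy diagram: columns are dimension, birth, death
D_toy <- matrix(
  c(0, 0.0, 0.5,
    0, 0.0, 1.2,
    0, 0.0, Inf,   # infinite death: ignored by the statistics
    1, 0.8, 1.5,
    1, 0.9, 1.1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("dimension", "birth", "death"))
)

# Keep the rows of the requested dimension that have finite death values
filter_diagram <- function(D, homDim) {
  keep <- D[, 1] == homDim & is.finite(D[, 3])
  D[keep, , drop = FALSE]
}

H0 <- filter_diagram(D_toy, homDim = 0)
H1 <- filter_diagram(D_toy, homDim = 1)

nrow(H0)  # finite H_0 points
nrow(H1)  # finite H_1 points
```

Using `drop = FALSE` keeps the result a matrix even when only one row, or none, matches.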

Author

Umar Islambekov

Details

The function extracts the rows of D whose first column equals homDim and computes, for the birth, death, midpoint, and lifespan (or persistence) values, the mean, standard deviation, median, interquartile range (IQR), range, and the 10th, 25th, 75th, and 90th percentiles. It also returns the total number of bars (points in the diagram) and the entropy of the lifespan values, \(-\sum_{i=1}^N\frac{l_i}{L}\log_2\left(\frac{l_i}{L}\right)\), where \(l_i=d_i-b_i\) is the lifespan of the \(i\)-th point and \(L=\sum_{i=1}^N l_i\). If D contains no points of dimension homDim, a vector of zeros is returned.
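The entropy formula can be checked by hand with a few lines of base R. The function `lifespan_entropy()` below is a hypothetical sketch, not the package's code. Dropping zero-length lifespans before taking the logarithm is an assumption made here to avoid `0 * log2(0)`; how computeStats() treats that case is not stated on this page.

```r
# Hypothetical helper: base-2 entropy of normalized lifespans
lifespan_entropy <- function(births, deaths) {
  l <- deaths - births   # lifespans l_i = d_i - b_i
  p <- l / sum(l)        # normalize by L = sum of lifespans
  p <- p[p > 0]          # assumption: skip zero lifespans
  -sum(p * log2(p))
}

# Four points with equal lifespans: each p_i = 1/4, so the entropy is 2 bits
e_equal <- lifespan_entropy(births = c(0, 0, 0, 0), deaths = c(1, 1, 1, 1))

# One dominant lifespan gives a lower entropy than the equal case
e_skewed <- lifespan_entropy(births = c(0, 0, 0, 0), deaths = c(10, 0.1, 0.1, 0.1))

e_equal
e_skewed
```

The entropy is largest when all lifespans are equal and decreases as a few long bars dominate the diagram.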

References

1. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M. (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Examples

library(TDAvec) # Provides computeStats(); the TDAstats package must also be installed

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Compute statistics for homological dimension H_0
computeStats(D, homDim = 0)

# Compute statistics for homological dimension H_1
computeStats(D, homDim = 1)
