Learn R Programming

Topological Data Analysis: Mapper Algorithm

Try this playground to get familier with the algorithm ! Document will keep update for better understanding the source code.

This package is based on the TDAmapper package by Paul Pearson. You can view the original package here. Since the original package hasn't been updated in over seven years, this version is focused on optimization. By incorporating vector computation into the Mapper algorithm, this package aims to significantly improve its performance.

Get started quickly

Step visualize from Skaf et al.

Mapper is basically a three-step process:

1. Cover: This step splits the data into overlapping intervals and creates a cover for the data.

2. Cluster: This step clusters the data points in each interval the cover creates.

3. Simplicial Complex: This step combines the two steps above, which connects the data points in the cover to create a simplicial complex.

you can know more about the basic here: Chazal, F., & Michel, B. (2021). An introduction to topological data analysis: fundamental and practical aspects for data scientists. Frontiers in artificial intelligence, 4, 667963.

Besides to the steps above, you can find the following code in the package:

  1. Mapper.R: Combining the three steps above
  2. ConvertLevelset.R: Converting a Flat Index to a Multi-index, or vice versa.
  3. EdgeVertices.R This is to find the nodes for plot, not for the Mapper algorithm.

Goals and Updates

Main Goals

  1. Computational Optimization: The current version speeds up computations by 100 times compare to the original code,

and could be faster by using num_cores.

  1. Expanded Clustering Methods: Clustering is a crucial component of the Mapper algorithm.

In addition to hierarchical clustering, Other methods (K-means, DBscan, PAM) were added to this project.

Version 1.0.1: 1. Update more readable structure 2. Added Kmeans, DBscan, and PAM clustering methods for user Version 1.0.2: 1. Parallel computing 2. Fix unused parameter (num_bins_when_clustering) 3. Frontend published using shiny Version 1.0.3: 1. Add plot function 2. Fix frontend code 3. Add document

Example

Mapper <- MapperAlgo(
  filter_values = circle_data[,2:3],
  intervals = 4,
  percent_overlap = 30, 
  methods = "dbscan",
  method_params = list(eps = 0.3, minPts = 5),
  cover_type = 'extension',
  num_cores = 12
  )
MapperPlotter(Mapper, circle_data$circle, circle_data, type = "forceNetwork")

Computation Performance

Figures 3 and 4 illustrate the impact of parallel computing introduced in Version 1.0.2 using the MNIST dataset. Figure 3 visualizes the time taken for different sample sizes when reducing the input to two dimensions using PCA, demonstrating how parallel computing accelerates computation. Figure 4 keeps the sample size fixed while incrementally increasing the number of dimensions in each iteration. It clearly shows that the number of features used in filter functions significantly affects computing time. You can find the code in Performance.R

Stay Updated

I've written some articles on Medium, which you can find here to get familiar with topological data analysis. I'll be continuously updating my work, and I welcome any feedback!

Build And Submit:

This is for the author to submit the package to CRAN.

# outo update RoxygenNote
devtools::document()
devtools::check()
# pack to .tar.gz
devtools::build()
devtools::submit_cran()

Copy Link

Version

Install

install.packages('MapperAlgo')

Monthly Downloads

261

Version

1.0.3

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

ChiChien Wang

Last Published

June 21st, 2025

Functions in MapperAlgo (1.0.3)

mapperEdges

Create Mapper Edges
MapperPlotter

Plot Mapper Result
cluster_cutoff_at_first_empty_bin

Cut the hierarchical clustering tree to define clusters
cover_points

Cover points based on intervals and overlap
MapperAlgo

Mapper Algorithm
to_lsfi

Convert level set multi-index (lsmi) to flat index (lsfi)
mapperVertices

Create Mapper Vertices
find_best_k_for_kmeans

Find the optimal number of clusters for k-means
perform_clustering

Perform clustering within a level set
simplcial_complex

Construct adjacency matrix of the simplicial complex
to_lsmi

Convert level set flat index (lsfi) to multi-index (lsmi)