Dimodal-package: Detection of Multi-Modal Data Using Spacing

Description

The Dimodal package uses the spacing of data, the differences between order statistics, to detect and locate modes and the transitions between them. Within a mode the spacing is consistent, with stable values; at the anti-modes it increases. The package contains parametric and non-parametric tests of features detected within the spacing, and can use any changepoint detectors installed on the system as a separate check.
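
As a minimal sketch of the idea, and not the package's own code, the spacing of a sample can be computed in base R from the sorted values. The synthetic bimodal sample is an assumption made for illustration, and only the simplest case, the difference between adjacent order statistics, is shown.

```r
# Sketch only: base R, not the Dimodal implementation.
# Hypothetical bimodal sample: two normal-shaped clusters centered at 0 and 8,
# built from quantiles so that the example is deterministic.
x <- c(qnorm(ppoints(200)), 8 + qnorm(ppoints(200)))

xs      <- sort(x)      # order statistics
spacing <- diff(xs)     # difference between adjacent order statistics

# The spacing is non-negative and sums to the range of the data.
stopifnot(all(spacing >= 0))
stopifnot(isTRUE(all.equal(sum(spacing), diff(range(x)))))

# Index of the widest gap and the data value at its midpoint. With modes this
# well separated the widest gap lies between the two clusters.
i       <- which.max(spacing)
gap_mid <- (xs[i] + xs[i + 1]) / 2
```

Raw spacing is noisy, and a wide gap can also appear in a sparse tail, so the largest gap alone is not a reliable indicator of an anti-mode in general.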

Main Interface

The package has three top-level commands.

Dimodal runs the modality analysis. It supports print, summary, and plot methods.
Diopt provides a persistent database of options controlling the feature detectors, tests, and display of results.
Ditrack displays the position and probability of features as filter sizes change, to help in selecting the best size for analysis. Its plot method shows the results graphically.

Feature Detectors

The package has five feature detectors. They work with any data, not just the spacing.

find.runs identifies fuzzy runs, sequences of nearly-equal numeric values or of equal discrete values or symbols.
find.peaks identifies local extrema, merging minor peaks into larger ones.
find.flats identifies flat or consistent stretches of values.
find.cpt is a majority voting scheme using external changepoint algorithms to identify a common set of points where the behavior of data changes.
find.level.sections is an inversion of the modehunt changepoint detector and can be added to the voting list.
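
The notion of a run can be illustrated in base R. This is a sketch under stated assumptions: `fuzzy_run_lengths` is a hypothetical helper, not a Dimodal function, and its rule for "nearly equal" (adjacent values differing by no more than a fixed tolerance) is chosen for illustration and may differ from the rule find.runs applies.

```r
# Sketch only: base R, not the find.runs implementation.

# Runs of equal symbols are what rle() reports directly.
symbols     <- c("a", "a", "b", "b", "b", "a")
symbol_runs <- rle(symbols)$lengths

# Hypothetical helper: lengths of runs of nearly-equal numeric values.
# Assumption: a new run starts when adjacent values differ by more than `tol`.
fuzzy_run_lengths <- function(x, tol) {
  breaks <- abs(diff(x)) > tol        # TRUE where a new run begins
  run_id <- cumsum(c(TRUE, breaks))   # label each value with its run number
  rle(run_id)$lengths
}

values     <- c(1.00, 1.01, 0.99, 3.00, 3.02, 3.01, 3.00, 7.50)
value_runs <- fuzzy_run_lengths(values, tol = 0.05)
```

Here `symbol_runs` holds the lengths 2, 3, 1 and `value_runs` the lengths 3, 4, 1.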

Feature Tests

Dimodal includes three groups of tests to evaluate features.

Dipeak.test and Diflat.test are parametric models of the peak and flat distributions after low-pass filtering. Dipeak.critval and Diflat.critval provide critical values of the peak (height) and flat (length) for a significance level.
Dinrun.test and Dirunlen.test are runs-based tests (up, down, equal trends) performed on the signed difference of a signal. They include the Kaplansky-Riordan test of the number of runs and a Markov chain model for the longest run.
Dipermht.test and Diexcurht.test are bootstrap tests simulating a feature from the actual data. They include a permutation test of runs and a general excursion test from the difference of a signal.
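
The statistics behind the runs-based tests can be sketched in base R. This is not the package's implementation: the signal is hypothetical, and the permutation scheme at the end (shuffling the signal and recomputing the longest run) is an assumption of this sketch, not a description of Dipermht.test.

```r
# Sketch only: base R, not the Dimodal tests.
signal <- c(1, 2, 4, 7, 7, 6, 3, 5, 8, 9, 12, 11)   # hypothetical signal

trend <- sign(diff(signal))   # +1 up, -1 down, 0 equal
runs  <- rle(trend)

n_runs      <- length(runs$lengths)   # statistic for a number-of-runs test
longest_run <- max(runs$lengths)      # statistic for a longest-run test

# Assumed permutation scheme: compare the observed longest run with the
# longest runs found after shuffling the signal.
set.seed(42)                          # arbitrary seed, for repeatability only
shuffled_longest <- replicate(2000, {
  max(rle(sign(diff(sample(signal))))$lengths)
})
p_longest <- mean(shuffled_longest >= longest_run)
```

For this signal the trend has 5 runs and the longest, a run of rises, has length 4. The estimated probability `p_longest` depends on the seed and the number of replicates.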

Data

The kirkwood dataset contains the multi-modal distribution of asteroid orbital radii, where modes identify families of asteroids within the main belt and anti-modes identify the Kirkwood gaps cleared out by regular perturbation from Jupiter.

Classes

The return value of each command, feature detector, and test is given an S3 class that supports printing, summarizing, and, where available, plotting. Links to the classes can be found in the return value section of each command.

Utility Functions

The package includes several functions to help work with the results of the analysis.

midquantile uses piecewise linear segments to convert quantiles of discrete or heavily quantized data back to data values.
runs.as.rle converts the find.runs result to the "rle" class.
select.peaks returns just the local maxima from the extrema detected by find.peaks.
center.diw shifts the indices of features in the interval spacing, normally located at the end of the interval, to the center of the interval so that they align with the low-pass features and the actual data.
match.features uses distance and overlap criteria to identify features found in both the low-pass and interval spacing.
shiftID.place moves the results of find.peaks, find.flats, and find.cpt to the original data grid and converts any indices into raw values using the midquantile approximation.
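
For readers unfamiliar with the "rle" class that runs.as.rle targets, the base R sketch below shows what such an object offers. It uses base `rle` on hypothetical data and does not call any Dimodal function.

```r
# Sketch only: what a base R "rle" object provides.
x <- c(5, 5, 5, 2, 2, 9)   # hypothetical data
r <- rle(x)                # run-length encoding with $lengths and $values

r_class   <- class(r)                  # "rle"
run_end   <- cumsum(r$lengths)         # index of the last element of each run
run_start <- run_end - r$lengths + 1   # index of the first element of each run

# inverse.rle() expands the encoding back to the original vector.
restored <- inverse.rle(r)
stopifnot(identical(restored, x))
```

The runs here start at indices 1, 4, and 6 and end at indices 3, 5, and 6.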

Author

Greg Kreider.

The package compiles by default with the PCG random number generator, written by Melissa O'Neill, for sampling during the excursion tests.

The kirkwood dataset is taken from the Lowell Observatory asteroid ephemeris.