
IDmining (version 1.0.7)

MINDID: The (Multipoint) Morisita Index for Intrinsic Dimension Estimation

Description

Estimates the intrinsic dimension of data using the Morisita estimator of intrinsic dimension.

Usage

MINDID(X, scaleQ=1:5, mMin=2, mMax=2)

Arguments

X

An N×E matrix, data.frame or data.table where N is the number of data points and E is the number of variables (or features). Each variable is rescaled to the [0,1] interval by the function.

scaleQ

A vector (at least two values). It contains the values of ℓ^(-1) chosen by the user (by default: scaleQ = 1:5).

mMin

The minimum value of m (by default: mMin = 2).

mMax

The maximum value of m (by default: mMax = 2).

Value

A list of two elements:

  1. a data.frame containing the ln value of the m-Morisita index for each value of ln(δ) and m. The values of ln(δ) are provided with regard to the [0,1] interval.

  2. a data.frame containing the values of S_m and M_m for each value of m.

Details

  1. ℓ is the edge length of the grid cells (or quadrats). Since the variables (and consequently the grid) are rescaled to the [0,1] interval, ℓ is equal to 1 for a grid consisting of only one cell.

  2. ℓ^(-1) is the number of grid cells (or quadrats) along each axis of the Euclidean space in which the data points are embedded.

  3. ℓ^(-1) is equal to Q^(1/E) where Q is the number of grid cells and E is the number of variables (or features).

  4. ℓ^(-1) is directly related to δ (see References).

  5. δ is the diagonal length of the grid cells.
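The relations listed above can be checked numerically with a short base R sketch. It does not call MINDID and is not part of the package. The variable names (E, l, Q, delta, grid_info) are illustrative, and the formula for δ assumes that the diagonal meant here is the Euclidean diagonal of a hypercube with edge length ℓ in E dimensions.

```r
# Illustrative values only (assumptions, not package defaults for E).
E      <- 3      # number of variables (features)
scaleQ <- 1:5    # number of grid cells along each axis, as passed to scaleQ

l     <- 1 / scaleQ    # edge length of a cell on the rescaled [0,1] interval
Q     <- scaleQ^E      # total number of grid cells
delta <- l * sqrt(E)   # assumed: Euclidean diagonal of a hypercube of edge l

# One row per grid resolution.
grid_info <- data.frame(cells_per_axis = scaleQ,
                        l              = l,
                        Q              = Q,
                        delta          = delta,
                        ln_delta       = log(delta))
print(grid_info)
```

With a single cell (first row), the edge length is 1 and Q is 1. Each further row refines the grid, so δ and ln(δ) decrease as the number of cells per axis grows.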

References

J. Golay and M. Kanevski (2015). A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48 (12):4070–4081.

J. Golay, M. Leuenberger and M. Kanevski (2017). Feature selection for regression problems based on the Morisita estimator of intrinsic dimension, Pattern Recognition 70:126–138.

J. Golay and M. Kanevski (2017). Unsupervised feature selection based on the Morisita estimator of intrinsic dimension, Knowledge-Based Systems 135:125–134.

J. Golay, M. Leuenberger and M. Kanevski (2015). Morisita-based feature selection for regression problems. Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges (Belgium).

Examples

# NOT RUN {
sim_dat <- SwissRoll(1000)

scaleQ <- 1:15 # It starts with a grid of 1^E cell (or quadrat).
               # It ends with a grid of 15^E cells (or quadrats).
mMI_ID <- MINDID(sim_dat, scaleQ[5:15])

print(paste("The ID estimate is equal to",round(mMI_ID[[1]][1,3],2)))
# }
