dataSDA (version 0.1.8)

interval_distance: Distance Measures for Interval Data

Description

Functions to compute various distance measures between interval-valued observations.

int_dist_all computes all available distance measures at once.

Usage

int_dist(x, method = "euclidean", gamma = 0.5, q = 1, p = 2, ...)

int_dist_matrix(x, method = "euclidean", gamma = 0.5, q = 1, p = 2, ...)

int_pairwise_dist(x, var_name1, var_name2, method = "euclidean", ...)

int_dist_all(x, gamma = 0.5, q = 1)

Value

A distance matrix (class 'dist') or numeric vector

Arguments

x

interval-valued data with symbolic_tbl class, or an array of dimension [n, p, 2]

method

distance method (default: "euclidean"), one of "GD", "IY", "L1", "L2", "CB", "HD", "EHD", "nEHD", "snEHD", "TD", "WD", "euclidean", "hausdorff", "manhattan", "city_block", "minkowski", "wasserstein", "ichino", "de_carvalho"

gamma

parameter for the Ichino-Yaguchi distance, 0 <= gamma <= 0.5 (default: 0.5)

q

Minkowski exponent used by the Ichino-Yaguchi distance (default: 1)

p

power parameter for Minkowski distance (default: 2)

...

additional parameters

var_name1

first variable name or column location

var_name2

second variable name or column location
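
The array form of x can be assembled from two ordinary matrices. The following is a minimal base R sketch, assuming (as the Examples suggest) that the first slice holds lower bounds and the second slice holds upper bounds. make_interval_array is a hypothetical helper written for illustration and is not part of dataSDA.

```r
# Hypothetical helper (not part of dataSDA): stack lower and upper bound
# matrices into an array of dimension [n, p, 2].
make_interval_array <- function(lower, upper) {
  stopifnot(
    is.matrix(lower), is.matrix(upper),
    identical(dim(lower), dim(upper)),  # same n and p
    all(lower <= upper)                 # every interval must be well-formed
  )
  x <- array(NA_real_, dim = c(nrow(lower), ncol(lower), 2))
  x[, , 1] <- lower  # assumed: slice 1 = lower bounds
  x[, , 2] <- upper  # assumed: slice 2 = upper bounds
  x
}

# 4 concepts, 2 interval-valued variables
lower <- matrix(c(1, 2, 3, 4,  5, 6, 7, 8), nrow = 4)
upper <- matrix(c(3, 5, 6, 7,  8, 9, 10, 12), nrow = 4)
x <- make_interval_array(lower, upper)
dim(x)
```

The bounds check is there because a reversed interval would otherwise pass silently into the distance computation.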

Author

Han-Ming Wu

Details

Available distance methods:

  • GD: Gowda-Diday distance (Gowda & Diday, 1991)

  • IY: Ichino-Yaguchi distance (Ichino, 1988)

  • L1: L1 (midpoint Manhattan) distance

  • L2: L2 (midpoint Euclidean) distance

  • CB: City-Block distance (Souza & de Carvalho, 2004)

  • HD: Hausdorff distance (Chavent & Lechevallier, 2002)

  • EHD: Euclidean Hausdorff distance

  • nEHD: Normalized Euclidean Hausdorff distance

  • snEHD: Span Normalized Euclidean Hausdorff distance

  • TD: Tran-Duckstein distance (Tran & Duckstein, 2002)

  • WD: L2-Wasserstein distance (Verde & Irpino, 2008)

  • euclidean: Euclidean distance on interval centers (same as L2)

  • hausdorff: Hausdorff distance (same as HD)

  • manhattan: Manhattan distance (same as L1)

  • city_block: City-Block distance (same as CB)

  • minkowski: Minkowski distance with parameter p

  • wasserstein: Wasserstein distance (same as WD)

  • ichino: Ichino-Yaguchi distance (simplified version)

  • de_carvalho: De Carvalho distance
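
To make two of the entries above concrete, here is a minimal base R sketch of a midpoint Euclidean distance and a Hausdorff distance for interval data. It uses the textbook definitions only. The function names are hypothetical, and summing the per-variable Hausdorff distances is an assumption: the aggregation and any normalization used inside the package may differ, so results need not match int_dist.

```r
# Hypothetical helpers (not part of dataSDA), textbook definitions only.
# x is an array of dimension [n, p, 2]:
# x[, , 1] = lower bounds, x[, , 2] = upper bounds.

midpoint_euclidean <- function(x) {
  n <- dim(x)[1]
  # n x p matrix of interval midpoints
  centers <- matrix((x[, , 1] + x[, , 2]) / 2, nrow = n)
  stats::dist(centers, method = "euclidean")
}

hausdorff_interval <- function(x) {
  n <- dim(x)[1]
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Hausdorff distance between [a_i, b_i] and [a_j, b_j], per variable:
      # max(|a_i - a_j|, |b_i - b_j|)
      per_var <- pmax(abs(x[i, , 1] - x[j, , 1]),
                      abs(x[i, , 2] - x[j, , 2]))
      d[i, j] <- sum(per_var)  # assumed aggregation: sum over variables
    }
  }
  stats::as.dist(d)
}

# Same toy data as in the Examples: 4 concepts, 3 variables
x <- array(NA_real_, dim = c(4, 3, 2))
x[, , 1] <- matrix(c(1, 2, 3, 4,  5, 6, 7, 8,  9, 10, 11, 12), nrow = 4)
x[, , 2] <- matrix(c(3, 5, 6, 7,  8, 9, 10, 12,  13, 15, 16, 18), nrow = 4)

d_mid <- midpoint_euclidean(x)
d_hd  <- hausdorff_interval(x)
```

Both helpers return objects of class 'dist', the same class the Value section describes.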

References

Gowda, K. C., & Diday, E. (1991). Symbolic clustering using a new dissimilarity measure. Pattern Recognition, 24(6), 567-578.

Ichino, M. (1988). General metrics for mixed features. Systems and Computers in Japan, 19(2), 37-50.

Chavent, M., & Lechevallier, Y. (2002). Dynamical clustering of interval data. In Classification, Clustering and Data Analysis (pp. 53-60). Springer.

Tran, L., & Duckstein, L. (2002). Comparison of fuzzy numbers using a fuzzy distance measure. Fuzzy Sets and Systems, 130, 331-341.

Verde, R., & Irpino, A. (2008). A new interval data distance based on the Wasserstein metric.

Kao, C.-H. et al. (2014). Exploratory data analysis of interval-valued symbolic data with matrix visualization. Computational Statistics & Data Analysis, 79, 14-29.

See Also

int_dist_matrix, int_dist_all, int_pairwise_dist

Examples

# Using symbolic_tbl format
data(mushroom.int)
d1 <- int_dist(mushroom.int[, 3:4], method = "euclidean")
d2 <- int_dist(mushroom.int[, 3:4], method = "hausdorff")
d3 <- int_dist(mushroom.int[, 3:4], method = "GD")

# Using array format: 4 concepts, 3 variables
x <- array(NA, dim = c(4, 3, 2))
x[, , 1] <- matrix(c(1, 2, 3, 4,  5, 6, 7, 8,  9, 10, 11, 12), nrow = 4)
x[, , 2] <- matrix(c(3, 5, 6, 7,  8, 9, 10, 12,  13, 15, 16, 18), nrow = 4)
d4 <- int_dist(x, method = "snEHD")
d5 <- int_dist(x, method = "IY", gamma = 0.3)
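
A result of class 'dist' can be passed directly to base R tools. The sketch below shows that workflow without requiring the package: it builds a stand-in 'dist' object from interval midpoints with stats::dist, which is an assumption made so the snippet runs on its own, and is not the output of int_dist.

```r
# Stand-in for an int_dist() result: midpoint distances from base R only.
x <- array(NA_real_, dim = c(4, 3, 2))
x[, , 1] <- matrix(c(1, 2, 3, 4,  5, 6, 7, 8,  9, 10, 11, 12), nrow = 4)
x[, , 2] <- matrix(c(3, 5, 6, 7,  8, 9, 10, 12,  13, 15, 16, 18), nrow = 4)
centers <- (x[, , 1] + x[, , 2]) / 2
d <- stats::dist(centers)  # class "dist"

# Expand to a full symmetric n x n matrix for inspection
m <- as.matrix(d)

# Hierarchical clustering accepts a "dist" object directly
hc <- stats::hclust(d, method = "average")
groups <- stats::cutree(hc, k = 2)  # cluster label for each concept
```

Any object returned with class 'dist' should be usable in place of d here.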
