sets (version 1.0-18)

similarity: Similarity and Dissimilarity Functions

Description

Similarities and dissimilarities for (generalized) sets.

Usage

set_similarity(x, y, method = "Jaccard")
gset_similarity(x, y, method = "Jaccard")
cset_similarity(x, y, method = "Jaccard")

set_dissimilarity(x, y, method = c("Jaccard", "Manhattan", "Euclidean", "L1", "L2"))
gset_dissimilarity(x, y, method = c("Jaccard", "Manhattan", "Euclidean", "L1", "L2"))
cset_dissimilarity(x, y, method = c("Jaccard", "Manhattan", "Euclidean", "L1", "L2"))

Arguments

x, y

Two (generalized/customizable) sets.

method

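Character string specifying the proximity method: "Jaccard" for the similarity functions; one of "Jaccard", "Manhattan", "Euclidean", "L1", or "L2" for the dissimilarity functions (see Details).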

Value

A numeric value (similarity or dissimilarity, as specified).

Details

For two generalized sets \(X\) and \(Y\), the Jaccard similarity is \(|X \cap Y| / |X \cup Y|\) where \(|\cdot|\) denotes the cardinality for generalized sets (sum of memberships). The Jaccard dissimilarity is 1 minus the similarity.
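As an illustration of the formula above, the following base-R sketch computes the Jaccard similarity of two ordinary fuzzy sets by hand. It does not use the sets package: `jaccard_fuzzy` and `pad_memberships` are hypothetical helpers introduced only for this example, fuzzy sets are represented as named numeric vectors of memberships, and intersection and union are assumed to be the elementwise minimum and maximum of the memberships.

```r
# Hypothetical helper (not part of the sets package): expand a named
# membership vector to the full universe, using 0 for absent elements.
pad_memberships <- function(m, universe) {
  out <- setNames(numeric(length(universe)), universe)
  out[names(m)] <- m
  out
}

# Hypothetical helper: Jaccard similarity |X n Y| / |X u Y| for fuzzy sets,
# ASSUMING min/max as fuzzy intersection/union and cardinality = sum of
# memberships.
jaccard_fuzzy <- function(a, b) {
  universe <- union(names(a), names(b))
  ma <- pad_memberships(a, universe)
  mb <- pad_memberships(b, universe)
  sum(pmin(ma, mb)) / sum(pmax(ma, mb))
}

A <- c(a = 0.3, b = 0.7, c = 0.9)
B <- c(c = 0.2, d = 0.4, e = 0.5)

sim <- jaccard_fuzzy(A, B)  # 0.2 / 2.8, i.e. 1/14
dis <- 1 - sim              # Jaccard dissimilarity, i.e. 13/14
```

Only the element "c" is shared, so the intersection has cardinality min(0.9, 0.2) = 0.2, while the union has cardinality 0.3 + 0.7 + 0.9 + 0.4 + 0.5 = 2.8.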

The L1 (or Manhattan) and L2 (or Euclidean) dissimilarities are defined as follows. For two fuzzy multisets \(A\) and \(B\) on a given universe \(X\) with elements \(x\), let \(M_A(x)\) and \(M_B(x)\) be functions returning the memberships of an element \(x\) in sets \(A\) and \(B\), respectively. The memberships are returned in standard form, i.e. as an infinite vector of decreasing membership values, e.g. \((1, 0.3, 0, 0, \dots)\). Let \(M_A(x)_i\) and \(M_B(x)_i\) denote the \(i\)th components of these membership vectors. Then the L1 distance is defined as:

$$d_1(A, B) = \sum_{x \in X}\sum_{i=1}^{\infty}|M_A(x)_i - M_B(x)_i|$$

and the L2 distance as:

$$d_2(A, B) = \sqrt{\sum_{x \in X}\sum_{i=1}^{\infty}|M_A(x)_i - M_B(x)_i|^2}$$
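The two distance definitions can be traced by hand with a small base-R sketch. It is not the package's implementation: `standard_form` and `lp_fuzzy_multiset` are hypothetical helpers, and a fuzzy multiset is assumed to be stored as a named list with one numeric membership vector per element. Because every membership vector has only finitely many non-zero entries, the infinite sums reduce to sums over vectors padded with zeros to a common length.

```r
# Hypothetical helper: sort memberships in decreasing order and pad with
# zeros to length `len` (a finite stand-in for the infinite standard form).
standard_form <- function(m, len) {
  m <- sort(m, decreasing = TRUE)
  c(m, numeric(len - length(m)))
}

# Hypothetical helper: d_p(A, B) for p = 1 (L1) or p = 2 (L2), with A and B
# given as named lists of membership vectors.
lp_fuzzy_multiset <- function(A, B, p = 1) {
  universe <- union(names(A), names(B))
  total <- 0
  for (x in universe) {
    # An element absent from a multiset has the all-zero membership vector.
    ma <- if (is.null(A[[x]])) numeric(0) else A[[x]]
    mb <- if (is.null(B[[x]])) numeric(0) else B[[x]]
    len <- max(length(ma), length(mb))
    d <- abs(standard_form(ma, len) - standard_form(mb, len))
    total <- total + sum(d^p)
  }
  total^(1 / p)
}

A <- list(a = c(0.3, 0.7), b = 0.1, c = 0.9)
B <- list(c = 0.2, d = c(0.4, 0.5), e = 0.8)

d1 <- lp_fuzzy_multiset(A, B, p = 1)  # 1.0 + 0.1 + 0.7 + 0.9 + 0.8 = 3.5
d2 <- lp_fuzzy_multiset(A, B, p = 2)  # sqrt(2.13)
```

Sorting before subtracting matters: the memberships of "a" are compared as (0.7, 0.3) against (0, 0), not in the order in which they were entered.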

See Also

set.

Examples

# NOT RUN {
A <- set("a", "b", "c")
B <- set("c", "d", "e")
set_similarity(A, B)
set_dissimilarity(A, B)

A <- gset(c("a", "b", "c"), c(0.3, 0.7, 0.9))
B <- gset(c("c", "d", "e"), c(0.2, 0.4, 0.5))
gset_similarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "L1")
gset_dissimilarity(A, B, "L2")

A <- gset(c("a", "b", "c"), list(c(0.3, 0.7), 0.1, 0.9))
B <- gset(c("c", "d", "e"), list(0.2, c(0.4, 0.5), 0.8))
gset_similarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "L1")
gset_dissimilarity(A, B, "L2")
# }
