mlr3measures (version 0.4.0)

jaccard: Jaccard Similarity Index

Description

Measure to compare two or more sets w.r.t. their similarity. For two sets \(A\) and \(B\), it is defined as $$ J(A, B) = \frac{|A \cap B|}{|A \cup B|}. $$ If more than two sets are provided, the mean of all pairwise scores is calculated.
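The definition above can be sketched in base R. This is only an illustration of the formula and of the "mean of all pairwise scores" rule, not the package's actual implementation; the helper names `pair_jaccard` and `mean_pairwise_jaccard` are hypothetical, and the sketch assumes no pair of sets is empty at the same time.

```r
# Jaccard index for a single pair of sets (duplicates are ignored).
pair_jaccard = function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical helper: mean of the scores over all pairs of sets.
mean_pairwise_jaccard = function(sets) {
  pairs = combn(length(sets), 2L)  # one column per pair of indices
  scores = apply(pairs, 2L, function(idx) {
    pair_jaccard(sets[[idx[1L]]], sets[[idx[2L]]])
  })
  mean(scores)
}

sets = list(c("a", "b"), c("b", "c"), c("a", "b", "c"))
mean_pairwise_jaccard(sets)  # (1/3 + 2/3 + 2/3) / 3 = 5/9
```

With exactly two sets there is a single pair, so the mean reduces to the plain index \(J(A, B)\).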

Usage

jaccard(sets, na_value = NaN, ...)

Arguments

sets

(list()) List of character or integer vectors. sets must have at least 2 elements.

na_value

(numeric(1)) Value that should be returned if the measure is not defined for the input (as described in Details). Default is NaN.

...

(any) Additional arguments. Currently ignored.

Value

Performance value as numeric(1).

Meta Information

  • Type: "similarity"

  • Range: \([0, 1]\)

  • Minimize: FALSE

Details

This measure is undefined if two or more sets are empty.
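A short base R sketch of why this case is undefined: when both sets of a pair are empty, the intersection and the union both have size zero, so the ratio is 0 / 0. The wrapper `guarded_jaccard` below is a hypothetical illustration of how a fallback such as `na_value` could be applied; it is an assumption, not the package's code.

```r
empty = character(0)

# Both sets empty: |A ∩ B| = 0 and |A ∪ B| = 0, so the ratio is 0 / 0.
both_empty = length(intersect(empty, empty)) / length(union(empty, empty))
both_empty  # NaN

# Only one set empty: the union is non-empty, so the score is well defined.
one_empty = length(intersect(empty, c("a", "b"))) / length(union(empty, c("a", "b")))
one_empty  # 0

# Hypothetical guard that returns a caller-supplied fallback instead.
guarded_jaccard = function(a, b, na_value = NaN) {
  if (length(a) == 0L && length(b) == 0L) {
    return(na_value)
  }
  length(intersect(a, b)) / length(union(a, b))
}

guarded_jaccard(empty, empty, na_value = 0)  # 0
```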

References

Jaccard, Paul (1901). “Étude comparative de la distribution florale dans une portion des Alpes et du Jura.” Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547-579. 10.5169/SEALS-266450.

Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1-18. 10.1155/2017/7907163.

Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. 10.21105/joss.03010.

See Also

The stabm package implements many more stability measures, with correction for chance included.

Other Similarity Measures: phi()

Examples

library(mlr3measures)

set.seed(1)
# Two random subsets of {a, b, c}: one of size 1 and one of size 2.
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
jaccard(sets)