clevr: Clustering and Link Prediction Evaluation in R

clevr implements functions for evaluating link prediction and clustering algorithms in R. It includes efficient implementations of common performance measures, such as:

  • pairwise precision, recall, F-measure;
  • homogeneity, completeness and V-measure;
  • (adjusted) Rand index;
  • variation of information; and
  • mutual information.

While the current focus is on supervised (a.k.a. external) performance measures, unsupervised (internal) measures are also in scope for future releases.

Installation

You can install the latest release from CRAN by entering:

install.packages("clevr")

The development version can be installed from GitHub using devtools:

# install.packages("devtools")
devtools::install_github("cleanzr/clevr")

Example

clevr includes several functions for converting between clustering representations, such as membership vectors, sets of record pairs, and lists of clusters.

library(clevr)
# A clustering of four records represented as a membership vector
pred_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 1, "Record4" = 2)

# Represent as a set of record pairs that appear in the same cluster
pred_pairs <- membership_to_pairs(pred_membership)
print(pred_pairs)
#>      [,1]      [,2]     
#> [1,] "Record1" "Record2"
#> [2,] "Record1" "Record3"
#> [3,] "Record2" "Record3"

# Represent as a list of record clusters
pred_clusters <- membership_to_clusters(pred_membership)
print(pred_clusters)
#> $`1`
#> [1] "Record1" "Record2" "Record3"
#> 
#> $`2`
#> [1] "Record4"
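
To make the relationship between the three representations concrete, the same conversions can be sketched in base R. This is only an illustration under the assumption that a membership vector is a named vector of cluster labels; it is not clevr's implementation, and the variable names `clusters`, `pairs_per_cluster` and `pairs` are made up for the example.

```r
# Base-R sketch of the conversions (illustrative, not clevr's own code).
pred_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 1, "Record4" = 2)

# Clusters: group the record names by their cluster label.
clusters <- split(names(pred_membership), pred_membership)

# Pairs: within each cluster of size >= 2, enumerate all unordered pairs.
pairs_per_cluster <- lapply(clusters, function(ids) {
  if (length(ids) < 2) return(NULL)  # singleton clusters contribute no pairs
  t(combn(ids, 2))                   # one row per pair
})
pairs <- do.call(rbind, unname(pairs_per_cluster))

print(clusters)
print(pairs)
```

A cluster of size k contributes k(k-1)/2 pairs, so the pairs representation can grow much faster than the membership vector when clusters are large.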

Performance measures are available for evaluating linked pairs:

true_pairs <- rbind(c("Record1", "Record2"), c("Record3", "Record4"))

pr <- precision_pairs(true_pairs, pred_pairs)
print(pr)
#> [1] 0.3333333

re <- recall_pairs(true_pairs, pred_pairs)
print(re)
#> [1] 0.5
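
The values above can be checked by hand. The following base-R sketch counts the pairs shared by both sets, assuming that pairs are unordered and that record identifiers do not contain the `|` character used to build keys. The helper `pair_keys` is hypothetical and is not part of clevr.

```r
# Hand computation of pairwise precision, recall and F1 (illustrative sketch).
true_pairs <- rbind(c("Record1", "Record2"), c("Record3", "Record4"))
pred_pairs <- rbind(c("Record1", "Record2"),
                    c("Record1", "Record3"),
                    c("Record2", "Record3"))

# Hypothetical helper: one order-insensitive key per distinct pair.
# Assumes identifiers never contain "|".
pair_keys <- function(pairs) {
  unique(apply(pairs, 1, function(p) paste(sort(p), collapse = "|")))
}

true_keys <- pair_keys(true_pairs)
pred_keys <- pair_keys(pred_pairs)

# True positives: predicted pairs that are also true pairs.
tp <- length(intersect(true_keys, pred_keys))

precision <- tp / length(pred_keys)  # share of predicted pairs that are correct
recall    <- tp / length(true_keys)  # share of true pairs that were predicted
f1        <- 2 * precision * recall / (precision + recall)
```

Only the pair (Record1, Record2) appears in both sets, so one of the three predicted pairs is correct and one of the two true pairs is recovered.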

and for evaluating clusterings:

true_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 2, "Record4" = 2)

ari <- adj_rand_index(true_membership, pred_membership)
print(ari)
#> [1] 0

vi <- variation_info(true_membership, pred_membership)
print(vi)
#> [1] 0.8239592
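
Both measures can be derived from the contingency table of the two clusterings. The sketch below uses the standard textbook definitions of the adjusted Rand index and the variation of information, with entropies computed using the natural logarithm, which reproduces the values printed above. It is an illustration rather than clevr's implementation, and the helper `entropy` is made up for the example.

```r
# Hand computation of ARI and variation of information (illustrative sketch).
true_membership <- c(1, 1, 2, 2)
pred_membership <- c(1, 1, 1, 2)

# Contingency table: rows are true clusters, columns are predicted clusters.
tab <- table(true_membership, pred_membership)
n <- sum(tab)

# Adjusted Rand index: agreement on pairs, corrected for chance.
sum_cells <- sum(choose(as.vector(tab), 2))
sum_rows  <- sum(choose(rowSums(tab), 2))
sum_cols  <- sum(choose(colSums(tab), 2))
expected  <- sum_rows * sum_cols / choose(n, 2)
# Note: the denominator is zero in degenerate cases (e.g. both clusterings
# place every record in a single cluster); this sketch does not handle that.
ari <- (sum_cells - expected) / ((sum_rows + sum_cols) / 2 - expected)

# Hypothetical helper: Shannon entropy (in nats) of a vector of counts.
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Variation of information: VI = 2 * H(joint) - H(true) - H(pred).
vi <- 2 * entropy(as.vector(tab)) - entropy(rowSums(tab)) - entropy(colSums(tab))
```

Here the number of record pairs on which the two clusterings agree equals the number expected by chance, which is why the adjusted Rand index is exactly 0.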

Monthly Downloads

382

Version

0.1.2

License

GPL-2

Maintainer

Neil Marchant

Last Published

September 16th, 2023

Functions in clevr (0.1.2)

  • homogeneity: Homogeneity Between Clusterings
  • mutual_info: Mutual Information Between Clusterings
  • v_measure: V-measure Between Clusterings
  • variation_info: Variation of Information Between Clusterings
  • balanced_accuracy_pairs: Balanced Accuracy of Linked Pairs
  • clusters_to_membership: Transform Clustering Representations
  • canonicalize_pairs: Canonicalize element pairs
  • contingency_table_clusters: Contingency Table for Clusterings
  • eval_report_clusters: Evaluation Report for Clustering
  • adj_rand_index: Adjusted Rand Index Between Clusterings
  • clevr-package: clevr: Clustering and Link Prediction Evaluation in R
  • contingency_table_pairs: Binary Contingency Table for Linked Pairs
  • accuracy_pairs: Accuracy of Linked Pairs
  • completeness: Completeness Between Clusterings
  • recall_pairs: Recall of Linked Pairs
  • precision_pairs: Precision of Linked Pairs
  • rand_index: Rand Index Between Clusterings
  • fowlkes_mallows: Fowlkes-Mallows Index Between Clusterings
  • f_measure_pairs: F-measure of Linked Pairs
  • eval_report_pairs: Evaluation Report for Linked Pairs
  • specificity_pairs: Specificity of Linked Pairs
  • fowlkes_mallows_pairs: Fowlkes-Mallows Index of Linked Pairs