CASIMiR: Comparing Automated Subject Indexing Methods in R

CASIMiR is a toolbox to facilitate comparative analysis of automated subject indexing methods in R.

Why should you use CASIMiR?

Certainly you can compute your F-score, precision and recall metrics with your favourite metric function in scikit-learn or other ML libraries. But does that really help in understanding the quality of your favourite subject indexing method? If method $A$ scores 0.4 in F-score and method $B$ scores 0.41, does that mean $B$ is better than $A$? Maybe. But there are likely many nuances in the results that you miss by looking only at overall scores. Did you know that the quality of subject suggestions may vary considerably among subject groups? It may also depend strongly on the amount of training material per subject term.

This is where CASIMiR comes in: it helps you perform a detailed drill-down analysis of your results. In addition, CASIMiR offers advanced metric functions, such as the area under the precision-recall curve, NDCG, graded relevance metrics and propensity-scored metrics. Last but not least, CASIMiR allows you to compute metrics with confidence intervals based on bootstrap methods, which helps you estimate the uncertainty in your results due to the possibly limited size of your test sample.

Why R?

Mainly due to the authors' love for R, but here are some reasons that might convince other people:

  • R's user-friendly capabilities for data analysis with the tidyverse packages
  • professional visualisation with ggplot2
  • seamless handling of grouped data structures
  • efficient data wrangling libraries, such as collapse and dplyr, which are the backbone of CASIMiR
  • the wonderful and inclusive R community

Installation instructions

Install a stable development version from GitHub (requires compilation):

remotes::install_github("deutsche-nationalbibliothek/casimir")

Getting Started

Most functions expect at least two inputs: gold_standard and predicted. Both are expected to be data.frames with subject suggestions in a long format.

Example table for gold standard or predictions:

doc_id  label_id
1       A
1       B
2       A
3       C
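
For illustration, here is a minimal sketch of building such a long-format table yourself as a plain data.frame (the values mirror the toy example above; in practice you would load your own gold standard and predictions):

# one row per (document, label) assignment; id columns are treated as identifiers
gold_standard <- data.frame(
  doc_id   = c("1", "1", "2", "3"),
  label_id = c("A", "B", "A", "C")
)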

For ranked retrieval metrics, i.e. metrics that take into account a ranking of the subject suggestions based on some score, the input format additionally expects a score column:

doc_id  label_id  score
1       A         0.73
1       B         0.15
2       A         0.92
3       C         0.34

With a gold standard and a set of predictions in this format, set retrieval metrics can be computed as follows:

res <- compute_set_retrieval_scores(
  gold_standard = dnb_gold_standard,
  predicted = dnb_test_predictions
)

head(res)
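
Ranked retrieval metrics follow the same pattern once the predictions carry a score column. A hedged sketch, assuming compute_ranked_retrieval_scores takes the same gold_standard and predicted inputs and that the packaged dnb_test_predictions contain a score column:

# sketch only: assumes dnb_test_predictions contains doc_id, label_id and score
res_ranked <- compute_ranked_retrieval_scores(
  gold_standard = dnb_gold_standard,
  predicted = dnb_test_predictions
)

head(res_ranked)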

Acknowledgments

This work was created within the DNB AI Project. The project was funded by the Federal Government Commissioner for Culture and the Media as part of the national AI strategy.

Install

install.packages('casimir')

  • Monthly Downloads: 188
  • Version: 0.3.3
  • License: MIT + file LICENSE
  • Maintainer: Maximilian Kähler
  • Last Published: November 17th, 2025

Functions in casimir (0.3.3)

  • compute_ranked_retrieval_scores: Compute ranked retrieval scores
  • create_comparison: Join gold standard and predicted results
  • dnb_gold_standard: DNB gold standard data for computing evaluation metrics
  • compute_set_retrieval_scores: Compute multi-label metrics
  • generate_pr_auc_replica: Compute bootstrap replica of pr auc
  • helper_f_dplyr: Calculate bootstrapping results for one sample
  • dnb_test_predictions: DNB test predictions for computing evaluation metrics
  • helper_f: Calculate bootstrapping results for one sample
  • generate_replicate_results: Compute bootstrapping results
  • join_propensity_scores: Join propensity scores
  • lrap_score: Helper function for document-wise computation of ranked retrieval scores
  • options: casimir Options
  • pr_curve_post_processing: Postprocessing of pr curve data
  • ndcg_score: Helper function for document-wise computation of ranked retrieval scores
  • summarise_intermediate_results: Compute the mean of intermediate results
  • find_ps_rprec_deno: Compute the denominator for R-precision
  • option_params: Declaration of options to be used as identical function arguments
  • summarise_intermediate_results_dplyr: Compute the mean of intermediate results
  • process_cost_fp: Process cost for false positives
  • rename_metrics: Rename metrics
  • set_grouping_var: Set grouping variables
  • set_ps_flags: Set flags for propensity scores
  • apply_threshold: Filter predictions based on score and rank
  • compute_intermediate_results: Compute intermediate set retrieval results per group
  • check_repair_relevance_pred: Check for inconsistent relevance values
  • casimir-package: casimir: Comparing Automated Subject Indexing Methods in R
  • compute_intermediate_results_rr: Compute intermediate ranked retrieval results per group
  • check_id_vars: Coerce id columns to character
  • check_repair_relevance_compare: Check for inconsistent relevance values
  • check_id_vars_col: Coerce column to character
  • boot_worker_fn: Compute bootstrap replica of pr auc
  • compute_pr_auc: Compute area under precision-recall curve
  • dcg_score: Helper function for document-wise computation of ranked retrieval scores
  • dnb_label_distribution: DNB label distribution for computing propensity scored metrics
  • compute_propensity_scores: Compute inverse propensity scores
  • create_rank_col: Create a rank column
  • compute_pr_auc_from_curve: Compute area under precision-recall curve
  • compute_pr_curve: Compute precision-recall curve