
casimir (version 0.3.3)

compute_intermediate_results: Compute intermediate set retrieval results per group

Description

Compute intermediate set retrieval results per group, such as the number of gold-standard and predicted labels, the number of true positives, false positives and false negatives, precision, R-precision, recall and F1 score.
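The counts and ratios listed above can be sketched in base R for a table shaped like gold_vs_pred (one row per doc_id/label_id pair with logical "gold" and "suggested" columns). This is a minimal illustration under stated assumptions, not casimir's implementation, which uses collapse or dplyr internals. The helper intermediate_sketch() and the toy data are hypothetical, R-precision is omitted, and the NA/0 conventions for empty groups are assumptions.

```r
# Hypothetical comparison table in the shape described under "gold_vs_pred".
gold_vs_pred <- data.frame(
  doc_id    = c("A", "A", "A", "B", "B"),
  label_id  = c("a", "b", "d", "a", "e"),
  gold      = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  suggested = c(TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Hypothetical helper: per-group counts and set-based metrics.
intermediate_sketch <- function(df, group) {
  per_group <- split(df, df[[group]])
  rows <- lapply(names(per_group), function(g) {
    d <- per_group[[g]]
    tp <- sum(d$gold & d$suggested)    # gold and predicted
    fp <- sum(!d$gold & d$suggested)   # predicted but not gold
    fn <- sum(d$gold & !d$suggested)   # gold but not predicted
    n_gold <- sum(d$gold)
    n_suggested <- sum(d$suggested)
    # Assumed conventions: NA when the denominator is 0, F1 = 0 when tp = 0.
    prec <- if (n_suggested > 0) tp / n_suggested else NA_real_
    rec  <- if (n_gold > 0) tp / n_gold else NA_real_
    f1 <- if (is.na(prec) || is.na(rec)) NA_real_
          else if (tp == 0) 0
          else 2 * prec * rec / (prec + rec)
    data.frame(group = g, n_gold = n_gold, n_suggested = n_suggested,
               tp = tp, fp = fp, fn = fn, prec = prec, rec = rec, f1 = f1)
  })
  do.call(rbind, rows)
}

res <- intermediate_sketch(gold_vs_pred, "doc_id")
res
```

For this toy data, document A has one true positive, one false positive and one false negative (precision, recall and F1 all 0.5), while document B has recall 1 and precision 0.5.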

Usage

compute_intermediate_results(
  gold_vs_pred,
  grouping_var,
  propensity_scored = FALSE,
  cost_fp = NULL,
  drop_empty_groups = options::opt("drop_empty_groups"),
  check_group_names = options::opt("check_group_names")
)

compute_intermediate_results_dplyr(
  gold_vs_pred,
  grouping_var,
  propensity_scored = FALSE,
  cost_fp = NULL
)

Value

A list of two elements:

  • results_table A data.frame with columns "n_gold", "n_suggested", "tp", "fp", "fn", "prec", "rprec", "rec", "f1".

  • grouping_var The input vector grouping_var.

Arguments

gold_vs_pred

A data.frame with logical columns "suggested" and "gold", as produced by create_comparison().

grouping_var

A character vector of grouping variables that must be present in gold_vs_pred (the dplyr variant requires rlang symbols instead).

propensity_scored

Logical, whether to use propensity scores as weights.
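As a rough illustration of weighting by propensity scores, the sketch below computes a propensity-weighted recall in base R, where each label contributes its inverse propensity 1/p_l instead of 1. The label names and weight values are made up, and this is one common formulation from the extreme multi-label literature, not necessarily the exact weighting casimir applies when propensity_scored = TRUE.

```r
# Made-up inverse-propensity weights (1 / p_l) for four hypothetical labels.
inv_propensity <- c(a = 1.2, b = 2.5, d = 4.0, e = 3.0)

# Gold standard and predictions for one document, indexed by label.
gold      <- c(a = TRUE, b = TRUE,  d = FALSE, e = FALSE)
suggested <- c(a = TRUE, b = FALSE, d = TRUE,  e = TRUE)

w <- inv_propensity[names(gold)]

# Weighted counts: each label counts with its weight instead of 1.
tp_w <- sum(w[gold & suggested])    # only "a": 1.2
fn_w <- sum(w[gold & !suggested])   # only "b": 2.5
rec_w <- tp_w / (tp_w + fn_w)       # 1.2 / 3.7
```

Rare labels (small p_l, large weight) thus dominate the weighted score, which is the usual motivation for propensity scoring.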

cost_fp

A numeric value greater than 0; defaults to NULL.

drop_empty_groups

Should empty levels of factor variables be dropped in the grouped set retrieval computation? (Defaults to TRUE; can be overridden via the option 'casimir.drop_empty_groups' or the environment variable 'R_CASIMIR_DROP_EMPTY_GROUPS'.)
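The effect of keeping or dropping empty factor levels can be seen with plain base R: when a grouping factor has a level with no rows, per-group aggregation still returns an entry for that level unless it is dropped. The factor doc_group below is hypothetical and only mirrors the idea; casimir itself groups with collapse or dplyr internals.

```r
# Hypothetical grouping factor with an unused level "z".
doc_group <- factor(c("x", "x", "y"), levels = c("x", "y", "z"))
tp <- c(1, 0, 1)

kept    <- tapply(tp, doc_group, sum)              # "z" appears, filled with NA
dropped <- tapply(tp, droplevels(doc_group), sum)  # "z" is gone
```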

check_group_names

Replace dots in grouping columns. Disable for faster computation if you can ensure that none of the columns used for grouping ("doc_id", "label_id", "doc_groups", "label_groups") contain dots. (Defaults to TRUE; can be overridden via the option 'casimir.check_group_names' or the environment variable 'R_CASIMIR_CHECK_GROUP_NAMES'.)

Functions

  • compute_intermediate_results_dplyr(): Variant with dplyr based internals rather than collapse internals.

Examples


library(casimir)

gold <- tibble::tribble(
  ~doc_id, ~label_id,
  "A", "a",
  "A", "b",
  "A", "c",
  "B", "a",
  "B", "d",
  "C", "a",
  "C", "b",
  "C", "d",
  "C", "f"
)

pred <- tibble::tribble(
  ~doc_id, ~label_id,
  "A", "a",
  "A", "d",
  "A", "f",
  "B", "a",
  "B", "e",
  "C", "f"
)

gold_vs_pred <- create_comparison(gold, pred)

compute_intermediate_results(gold_vs_pred, "doc_id")
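To cross-check the example by hand, the same gold and pred data can be compared in base R without casimir. The sketch assumes that the comparison amounts to a full outer join on doc_id and label_id with missing flags set to FALSE; that is an assumption about create_comparison(), not something documented on this page. The names gold_df, pred_df and cmp are hypothetical.

```r
gold_df <- data.frame(
  doc_id   = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  label_id = c("a", "b", "c", "a", "d", "a", "b", "d", "f")
)
pred_df <- data.frame(
  doc_id   = c("A", "A", "A", "B", "B", "C"),
  label_id = c("a", "d", "f", "a", "e", "f")
)
gold_df$gold <- TRUE
pred_df$suggested <- TRUE

# Full outer join; pairs missing on one side get NA, which we treat as FALSE.
cmp <- merge(gold_df, pred_df, by = c("doc_id", "label_id"), all = TRUE)
cmp$gold[is.na(cmp$gold)] <- FALSE
cmp$suggested[is.na(cmp$suggested)] <- FALSE

# Per-document counts and metrics (documents A, B, C).
tp <- tapply(cmp$gold & cmp$suggested, cmp$doc_id, sum)   # 1, 1, 1
fp <- tapply(!cmp$gold & cmp$suggested, cmp$doc_id, sum)  # 2, 1, 0
fn <- tapply(cmp$gold & !cmp$suggested, cmp$doc_id, sum)  # 2, 1, 3
prec <- tp / (tp + fp)
rec  <- tp / (tp + fn)
f1   <- 2 * prec * rec / (prec + rec)
```

These hand-computed values can be compared with the tp, fp, fn and f1 columns of results_table from the call above.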
