summarise_intermediate_results_dplyr

Compute the mean of intermediate results created by
<code>compute_intermediate_results</code>. This variant uses dplyr-based
internals rather than collapse-based internals.
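
As a rough illustration of what averaging intermediate results per group can look like with dplyr, the sketch below takes the mean of the four metric columns within each group. It is not the package's actual implementation: the helper name `sketch_group_means`, the toy values, and the grouping column `stratum` are assumptions made for this example, and propensity-score weighting is left out entirely.

```r
library(dplyr)

# Hypothetical helper, not part of casimir: averages the metric columns
# of results_table within the groups named in grouping_var.
sketch_group_means <- function(intermediate_results) {
  metric_cols <- c("prec", "rprec", "rec", "f1")
  intermediate_results$results_table %>%
    group_by(across(all_of(intermediate_results$grouping_var))) %>%
    summarise(across(all_of(metric_cols), mean), .groups = "drop")
}

# Toy input; the grouping column "stratum" and all values are invented.
toy <- list(
  results_table = data.frame(
    stratum = c("a", "a", "b"),
    prec    = c(1.0, 0.5, 0.0),
    rprec   = c(1.0, 0.5, 0.0),
    rec     = c(0.5, 0.5, 1.0),
    f1      = c(0.6, 0.4, 0.0)
  ),
  grouping_var = "stratum"
)

# One row per stratum, one mean per metric column.
group_means <- sketch_group_means(toy)
print(group_means)
```

Selecting the grouping columns with `all_of()` keeps the sketch usable with any character vector in `grouping_var`, as long as the named columns exist in `results_table`.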

Evaluate automatic subject indexing methods. The package focuses on efficient
computation of set retrieval and ranked retrieval metrics across multiple
dimensions of a dataset, e.g. document strata or subsets of the label set.
It can also compute bootstrap confidence intervals for all major metrics,
with support for parallel computation and propensity-scored variants of
standard metrics.

Maximilian Kähler

casimir

Comparing Automated Subject Indexing Methods in R

Markus Schumacher

Deutsche Nationalbibliothek 


Arguments


<dl>

<dt>intermediate_results</dt>
<dd>As produced by
<code>compute_intermediate_results</code>. This requires a list containing:<ul>
<li><code>results_table</code> A data.frame with columns <code>"prec",
 "rprec", "rec", "f1"</code>.</li>
<li><code>grouping_var</code> A character vector of variables to group by.</li>
</ul></dd>


<dt>propensity_scored</dt>
<dd>Logical, whether to use propensity scores as
weights.</dd>


<dt>label_distribution</dt>
<dd>Expects a data.frame with columns <code>"label_id",
 "label_freq", "n_docs"</code>. <code>label_freq</code> corresponds to the number of
occurrences a label has in the gold standard. <code>n_docs</code> corresponds to
the total number of documents in the gold standard.</dd>

</dl>
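
The argument descriptions above can be made concrete with a small sketch of the expected input shapes. The column names come from the descriptions; every value, the grouping column `stratum`, and the shape checks are assumptions for illustration only. The final call is left commented out because it needs the package installed, and it spells out all three arguments since their defaults are not stated here.

```r
# Toy inputs; column names follow the argument descriptions,
# all values and the grouping column "stratum" are invented.
intermediate_results <- list(
  results_table = data.frame(
    stratum = c("a", "a", "b"),
    prec    = c(1.0, 0.5, 0.0),
    rprec   = c(1.0, 0.5, 0.0),
    rec     = c(0.5, 0.5, 1.0),
    f1      = c(0.6, 0.4, 0.0)
  ),
  grouping_var = "stratum"
)

label_distribution <- data.frame(
  label_id   = c("l1", "l2", "l3"),
  label_freq = c(2L, 1L, 1L),  # occurrences of each label in the gold standard
  n_docs     = 3L              # total number of gold standard documents
)

# Basic shape checks (assumed, not taken from the package).
results_table <- intermediate_results$results_table
stopifnot(
  all(c("prec", "rprec", "rec", "f1") %in% names(results_table)),
  all(intermediate_results$grouping_var %in% names(results_table)),
  all(c("label_id", "label_freq", "n_docs") %in% names(label_distribution)),
  # n_docs is a total, so it is assumed to be the same in every row
  length(unique(label_distribution$n_docs)) == 1
)

# With the package loaded, the call would look roughly like this (not run):
# summarise_intermediate_results_dplyr(
#   intermediate_results = intermediate_results,
#   propensity_scored    = TRUE,
#   label_distribution   = label_distribution
# )
```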
