most_similar

Select the most similar texts to a specific text

Carry out comparative authorship analysis of disputed and undisputed texts within the Likelihood Ratio Framework for expressing evidence in forensic science. This package contains implementations of well-known algorithms for comparative authorship analysis, such as Smith and Aldridge's (2011) Cosine Delta <doi:10.1080/09296174.2011.533591> or Koppel and Winter's (2014) Impostors Method <doi:10.1002/asi.22954>, as well as functions to measure their performance and to calibrate their outputs into Log-Likelihood Ratios.

Andrea Nini

idiolect

Forensic Authorship Analysis

David van Leeuwen

most_similar function

<dl><dt>sample</dt>
<dd>A single row of a <code>quanteda</code> dfm representing the sample to match.</dd>
<dt>pool</dt>
<dd>A dfm containing all candidate samples from which the n most similar are selected.</dd>
<dt>coefficient</dt>
<dd>The similarity coefficient to use: one of "minmax", "cosine", or "Phi".</dd>
<dt>n</dt>
<dd>The number of rows to extract from the pool of candidate samples.</dd></dl>
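The selection these arguments describe can be sketched outside R. The Python below is only an illustration, not the package's implementation: `most_similar_sketch`, the plain-list feature vectors, and the dictionary standing in for the pool dfm are all hypothetical. The "cosine" and "minmax" coefficients use their standard textbook definitions, which may differ in detail from what the package computes, and "Phi" is omitted.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def minmax(x, y):
    """Min-max (Ruzicka) similarity: sum of minima over sum of maxima."""
    denom = sum(max(a, b) for a, b in zip(x, y))
    return sum(min(a, b) for a, b in zip(x, y)) / denom if denom else 0.0

# Hypothetical lookup table mirroring the `coefficient` argument.
COEFFICIENTS = {"cosine": cosine, "minmax": minmax}

def most_similar_sketch(sample, pool, coefficient="minmax", n=1):
    """Return the names of the n pool rows most similar to `sample`."""
    score = COEFFICIENTS[coefficient]
    ranked = sorted(pool, key=lambda name: score(sample, pool[name]), reverse=True)
    return ranked[:n]

# Toy feature counts standing in for dfm rows (assumed data).
sample = [3, 0, 1, 2]
pool = {
    "text_a": [3, 0, 1, 2],
    "text_b": [0, 4, 0, 1],
    "text_c": [2, 1, 1, 2],
}

top_two = most_similar_sketch(sample, pool, coefficient="minmax", n=2)
print(top_two)
```

With these toy counts, both coefficients rank `text_a` first and `text_c` second, because `text_b` shares almost no feature mass with the sample.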

