tokenize_sents

This function turns a corpus of texts into a <code>quanteda</code> tokens object of sentences.

Carry out comparative authorship analysis of disputed and undisputed texts within the Likelihood Ratio Framework for expressing evidence in forensic science. This package contains implementations of well-known algorithms for comparative authorship analysis, such as Smith and Aldridge's (2011) Cosine Delta <doi:10.1080/09296174.2011.533591> or Koppel and Winter's (2014) Impostors Method <doi:10.1002/asi.22954>, as well as functions to measure their performance and to calibrate their outputs into Log-Likelihood Ratios.

Andrea Nini

idiolect

Forensic Authorship Analysis

David van Leeuwen

Arguments

<dl>
<dt>corpus</dt>
<dd>A <code>quanteda</code> corpus object, typically the output of the <code>create_corpus()</code> function or the output of <code>contentmask()</code>.</dd>
<dt>model</dt>
<dd>The spaCy model to use. The default is <code>"en_core_web_sm"</code>.</dd>
</dl>


Examples
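
The call below is a minimal sketch of how the function might be used, not an example taken from the package. It assumes that <code>idiolect</code>, <code>quanteda</code>, and a working spaCy installation with the "en_core_web_sm" model are available. The toy texts and the object names <code>toy</code> and <code>sents</code> are made up for illustration; the argument names follow the Arguments list.

```r
# Sketch only: assumes idiolect, quanteda, and a spaCy installation with the
# "en_core_web_sm" model are available on this machine.
library(quanteda)
library(idiolect)

# Toy corpus built directly with quanteda (made-up text, for illustration only).
toy <- corpus(c(
  doc1 = "The cat sat on the mat. It did not move.",
  doc2 = "This is a second text. It has two sentences."
))

# `corpus` is passed first; `model` is named explicitly and set to the
# documented default.
sents <- tokenize_sents(toy, model = "en_core_web_sm")

# Per the description, the result should be a quanteda tokens object of sentences.
print(sents)
```

Because sentence splitting is delegated to the spaCy model, the exact sentence boundaries may differ between models and model versions.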