data_corpus_ms2020sample

A corpus of 100 speeches from the Maerz & Schneider (2020) corpus,
balanced across regime types (50 autocracies, 50 democracies). This sample
is included in the package for demos and testing. The full corpus of 4,740
speeches is available in the package's pkgdown examples folder.


Tools for AI-assisted qualitative data coding using large language
models ('LLMs') via the 'ellmer' package, supporting providers including
'OpenAI', 'Anthropic', 'Google', 'Azure', and local models via 'Ollama'.
Provides a 'codebook'-based workflow for defining coding instructions and
applying them to texts, images, and other data. Includes built-in 'codebooks'
for common applications such as sentiment analysis and policy coding, and
functions for creating custom 'codebooks' for specific research questions.
Supports systematic replication across models and settings, computing
inter-coder reliability statistics including Krippendorff's alpha
(Krippendorff 2019, <doi:10.4135/9781071878781>) and Fleiss' kappa
(Fleiss 1971, <doi:10.1037/h0031619>), as well as gold-standard validation
metrics including accuracy, precision, recall, and F1 scores following
Sokolova and Lapalme (2009, <doi:10.1016/j.ipm.2009.03.002>). Provides audit
trail functionality for documenting coding workflows following Lincoln and
Guba's (1985, ISBN:0803924313) framework for establishing trustworthiness
in qualitative research.

Seraphine F. Maerz

quallmer

Qualitative Analysis with Large Language Models

Kenneth Benoit

data_corpus_ms2020sample function

A corpus object.
The corpus consists of 100 speeches randomly sampled from 40 heads of
government across 27 countries, balanced by regime type. The corpus
contains the following document-level variables:
<dl>
<dt>speaker</dt>
<dd>Character. Name of the head of government.</dd>
<dt>country</dt>
<dd>Character. Country name.</dd>
<dt>regime</dt>
<dd>Factor. Regime type: "Democracy" or "Autocracy".</dd>
<dt>score</dt>
<dd>Numeric. Original dictionary-based liberal-illiberal score.</dd>
<dt>date</dt>
<dd>Date. Date of the speech.</dd>
<dt>title</dt>
<dd>Character. Title of the speech.</dd>
</dl>
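
The following is a minimal sketch of how the sample might be loaded and inspected. It assumes that the object is a 'quanteda' corpus (the documentation only says "a corpus object"), that both 'quallmer' and 'quanteda' are installed, and that the document-level variables carry the names listed above.

```r
# Assumption: data_corpus_ms2020sample is a quanteda corpus shipped with quallmer.
library(quanteda)

# Load the dataset from the package without attaching quallmer itself
data("data_corpus_ms2020sample", package = "quallmer")

# Number of documents (documented as 100 speeches)
ndoc(data_corpus_ms2020sample)

# Extract the document-level variables as a data frame
dv <- docvars(data_corpus_ms2020sample)
str(dv)

# Check the balance across regime types
table(dv$regime)

# Mean dictionary-based liberal-illiberal score by regime type
aggregate(score ~ regime, data = dv, FUN = mean)
```

If the object turns out not to be a 'quanteda' corpus, `ndoc()` and `docvars()` would need to be replaced by the accessors of whichever corpus class is actually used.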

Format

Sample corpus of political speeches from Maerz & Schneider (2020) - data_corpus_ms2020sample


data_corpus_ms2020sample: Sample corpus of political speeches from Maerz & Schneider (2020)
