A corpus of sentences sampled from publicly available party manifestos from the United Kingdom's 2010 election. Each sentence has been rated on whether it pertains to immigration and, if so, on its position toward open immigration policy, expressed as the mean score of crowd coders on a scale of -1 (favours open immigration policy), 0 (neutral), or 1 (anti-immigration).
The sentences were sampled from the corpus used in Benoit et al. (2016) doi:10.1017/S0003055416000058, which contains more information on the crowd-sourced annotation approach.
data_corpus_manifsentsUK2010sample
A corpus object. The corpus consists of 155 sentences randomly sampled from the party manifestos, with an attempt to balance the sentences according to their categorisation as pertaining to immigration or not, as well as by party. The corpus contains the following document-level variables:
factor; abbreviation of the party that wrote the manifesto.
factor; party that wrote the manifesto.
integer; 4-digit year of the election.
factor; indicates whether the majority of crowd workers labelled a sentence as referring to immigration or not. The variable has missing values (NA) for all non-annotated manifestos.
numeric; the direction of statements coded as "Immigration" based on the aggregated crowd codings. The variable is the mean of the scores assigned by workers who coded a sentence and who allocated the sentence to the "Immigration" category. The variable ranges from -1 ("Favorable and open immigration policy") to +1 ("Negative and closed immigration policy").
integer; the number of coders who
contributed to the mean score immigration_mean.
integer; a thresholded version of immigration_mean
coded as -1 (pro-immigration, mean < -0.5), 0 (neutral, -0.5 <= mean <= 0.5),
or 1 (anti-immigration, mean > 0.5). Set to NA for non-immigration sentences.
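The thresholding rule described above can be sketched in base R as follows. This is only an illustration of the stated cut-offs, not the code used to build the corpus: the helper name threshold_immigration() and the example scores are hypothetical.

```r
# Hypothetical helper (not part of the package): map a mean crowd score
# to -1 (pro-immigration), 0 (neutral), or 1 (anti-immigration).
threshold_immigration <- function(mean_score) {
  as.integer(
    ifelse(mean_score < -0.5, -1L,          # mean < -0.5
           ifelse(mean_score > 0.5, 1L,     # mean > 0.5
                  0L))                      # -0.5 <= mean <= 0.5
  )
}

# Made-up example scores; NA stands for a non-immigration sentence
# and is propagated as NA.
scores <- c(-1, -0.5, 0, 0.5, 0.75, NA)
threshold_immigration(scores)
```

Scores that fall exactly on -0.5 or 0.5 land in the neutral category, matching the inclusive bounds given in the variable description.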
Benoit, K., Conway, D., Lauderdale, B.E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review, 110(2), 278-295. doi:10.1017/S0003055416000058
if (requireNamespace("quanteda", quietly = TRUE)) {
# Inspect the corpus
summary(data_corpus_manifsentsUK2010sample)
}