multicast downloads the Multi-CAST annotation data from the servers of
the University of Bamberg and outputs them as a
data.table. As the Multi-CAST collection is
amenable to extension by additional data sets and annotation schemes,
multicast takes an optional argument to select earlier versions of the
annotation data to ensure scientific accountability and reproducibility.
multicast(vkey, legacy.colnames = FALSE)
vkey: A numeric or character vector of length 1 specifying the
requested version of the annotation values. Must be one of the four-digit
version keys in the first column of mc_index, or empty. If
empty or no value is supplied, multicast automatically retrieves the
most recent version of the annotations. See the examples below for an
illustration.
legacy.colnames: If TRUE, renames the text and
gword columns to what they were called prior to version 1.1.0 of the
package (i.e. file, word). This option will be removed in the
future.
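As a rough illustration of the rules for vkey described above, the following sketch checks a key before it is passed on. The helper check_vkey and the vector of known keys are hypothetical and not part of multicastR; in practice the valid keys are those listed in the first column of mc_index.

```r
# Hypothetical helper, not part of multicastR: coerce a version key to its
# four-digit character form and check it against a set of known keys.
check_vkey <- function(vkey = NULL, known_keys) {
  # An empty key stands for "most recent version"; pass it through as NULL.
  if (is.null(vkey) || length(vkey) == 0 || identical(vkey, "")) {
    return(NULL)
  }
  if (length(vkey) != 1) {
    stop("vkey must be a vector of length 1")
  }
  key <- as.character(vkey)
  if (!grepl("^[0-9]{4}$", key)) {
    stop("vkey must be a four-digit version key")
  }
  if (!key %in% as.character(known_keys)) {
    stop("unknown version key: ", key)
  }
  key
}

# Placeholder keys for illustration only; the real keys are listed in the
# first column of mc_index.
known <- c("1905", "1908")

check_vkey(1905, known)    # numeric input
check_vkey("1905", known)  # character input
check_vkey(NULL, known)    # empty: most recent version
```

Numeric and character keys are normalized to the same string, which mirrors the documented behaviour that multicast(1905) and multicast("1905") request the same version.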
A data.table with eleven columns:
[, 1] corpus: The name of the corpus.
[, 2] text: The title of the text. If legacy.colnames is TRUE, this column is named file instead.
[, 3] uid: The utterance identifier. Uniquely identifies an utterance within a text.
[, 4] gword: Grammatical words. The tokenized utterances in the object language. If legacy.colnames is TRUE, this column is named word instead.
[, 5] gloss: Morphological glosses following the Leipzig Glossing Rules.
[, 6] graid: Annotations using the GRAID scheme (Haig & Schnell 2014).
[, 7] gform: The form symbol of a GRAID gloss.
[, 8] ganim: The person-animacy symbol of a GRAID gloss.
[, 9] gfunc: The function symbol of a GRAID gloss.
[, 10] refind: Referent tracking using the RefIND scheme (Schiborr et al. 2018).
[, 11] reflex: The information status of newly introduced referents, using a simplified version of the RefLex scheme (Riester & Baumann 2017).
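To make the column layout and the effect of legacy.colnames more concrete, here is a minimal sketch on a toy table. The table is a base R data.frame with invented values rather than real Multi-CAST data, and the helper legacy_names is hypothetical; it only imitates the renaming described above and is not the package's own implementation.

```r
# Toy table with the eleven documented columns. All values are invented
# for illustration and do not come from the Multi-CAST data.
mc <- data.frame(
  corpus = c("demo", "demo"),
  text   = c("story1", "story1"),
  uid    = c("0001", "0001"),
  gword  = c("she", "left"),
  gloss  = c("3SG", "leave.PST"),
  graid  = c("pro.h:s", "v:pred"),
  gform  = c("pro", "v"),
  ganim  = c("h", ""),
  gfunc  = c("s", "pred"),
  refind = c("001", ""),
  reflex = c("new", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper imitating the documented effect of
# legacy.colnames = TRUE: text becomes file, gword becomes word.
legacy_names <- function(x) {
  names(x)[names(x) == "text"]  <- "file"
  names(x)[names(x) == "gword"] <- "word"
  x
}

names(mc)[c(2, 4)]                # current column names
names(legacy_names(mc))[c(2, 4)]  # pre-1.1.0 column names
```

Only columns 2 and 4 are affected; the remaining nine columns keep their names in both variants.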
The Multi-CAST annotation data accessed by the
multicast method is published under a Creative Commons
Attribution 4.0 International (CC-BY 4.0) licence
(https://creativecommons.org/licenses/by/4.0/). Please refer to
the collection documentation for information on how to give proper credit
to its contributors.
Data from the Multi-CAST collection should be cited as:
Haig, Geoffrey & Schnell, Stefan (eds.). 2015. Multi-CAST: Multilingual Corpus of Annotated Spoken Texts. (https://multicast.aspra.uni-bamberg.de/) (Accessed date.)
If for some reason you need to cite this package on its
own, please refer to citation("multicastR").
Riester, Arndt & Baumann, Stefan. 2017. The RefLex scheme -- Annotation guidelines. SinSpeC: Working papers of the SFB 732 14. (https://dx.doi.org/10.18419/opus-9011)
Schiborr, Nils N. & Schnell, Stefan & Thiele, Hanna. 2018. RefIND -- Referent Indexing in Natural-language Discourse: Annotation guidelines. Version 1.1. (https://multicast.aspra.uni-bamberg.de/#annotations)
# NOT RUN {
# retrieve and print the most recent version of the
# Multi-CAST annotations
multicast()
# retrieve and print the version of the annotation data
# published in May 2019
multicast(1905) # or: multicast("1905")
# }