multicast downloads the Multi-CAST annotation data from the servers of the University of Bamberg and outputs them as a data.table. As the Multi-CAST collection is amenable to extension by additional data sets and annotation schemes, multicast takes an optional argument vkey to select earlier versions of the annotation data to ensure scientific accountability and reproducibility.
multicast(vkey)
vkey
A numeric or character vector of length 1 specifying the requested version of the annotation data. Must be one of the four-digit version keys in the first column of mc_index, or empty. If empty or no value is supplied, the most recent version of the annotations is retrieved automatically. See the examples below for an illustration.
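The rules for vkey stated above can be sketched as a small check. This is only an illustration under stated assumptions: normalize_vkey and known_keys are hypothetical names that are not part of multicastR, and the vector of known keys holds just the one key used in the examples rather than the real contents of mc_index.

```r
# Hypothetical helper, not part of multicastR: normalise a version key
# and check it against a vector of known four-digit keys.
normalize_vkey <- function(vkey = NULL, known_keys) {
  # Empty or missing input stands for "most recent version"
  if (is.null(vkey) || length(vkey) == 0L || identical(vkey, "")) {
    return(NULL)
  }
  if (length(vkey) != 1L) {
    stop("vkey must be of length 1")
  }
  # Numeric and character input are treated alike
  key <- as.character(vkey)
  if (!grepl("^[0-9]{4}$", key)) {
    stop("vkey must be a four-digit version key")
  }
  if (!key %in% known_keys) {
    stop("unknown version key: ", key)
  }
  key
}

# Assumption: a stand-in for the first column of mc_index
known_keys <- c("1905")

normalize_vkey(1905, known_keys)    # numeric input
normalize_vkey("1905", known_keys)  # character input
normalize_vkey(NULL, known_keys)    # empty: most recent version
```

Returning NULL for empty input is a design choice of this sketch; how multicast itself handles the empty case internally is not specified here.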
A data.table with eleven columns:
[, 1] corpus
The name of the corpus.
[, 2] text
The title of the text. If legacy.colnames is TRUE, this column is named file instead.
[, 3] uid
The utterance identifier. Uniquely identifies an utterance within a text.
[, 4] gword
Grammatical words. The tokenized utterances in the object language. If legacy.colnames is TRUE, this column is named word instead.
[, 5] gloss
Morphological glosses following the Leipzig Glossing Rules.
[, 6] graid
Annotations using the GRAID scheme (Haig & Schnell 2014).
[, 7] gform
The form symbol of a GRAID gloss.
[, 8] ganim
The person-animacy symbol of a GRAID gloss.
[, 9] gfunc
The function symbol of a GRAID gloss.
[, 10] refind
Referent tracking using the RefIND scheme (Schiborr et al. 2018).
[, 11] isnref
Annotations of the information status of newly introduced referents with ISNRef, a simplified version of the RefLex scheme (Riester & Baumann 2017).
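A minimal sketch of working with a table of this shape follows. It does not call multicast, so it runs without network access: the table mc is a toy stand-in whose values are all invented placeholders, and only the eleven column names are taken from the list above. The renaming step mimics the legacy column names described for text and gword; it is not the package's own implementation.

```r
library(data.table)

# Toy stand-in for the output of multicast(); all values are invented
mc <- data.table(
  corpus = c("demo", "demo", "demo"),
  text   = c("story1", "story1", "story2"),
  uid    = c("0001", "0001", "0001"),
  gword  = c("w1", "w2", "w3"),
  gloss  = c("g1", "g2", "g3"),
  graid  = c("a1", "a2", "a3"),
  gform  = c("f1", "f2", "f1"),
  ganim  = c("", "", ""),
  gfunc  = c("x", "y", "x"),
  refind = c("", "", ""),
  isnref = c("", "", "")
)

# Count the grammatical words (rows) per text
tokens_per_text <- mc[, .N, by = .(corpus, text)]
print(tokens_per_text)

# Mimic the legacy column names on a copy, leaving mc untouched
legacy <- copy(mc)
setnames(legacy, c("text", "gword"), c("file", "word"))
print(names(legacy))
```

Working on a copy matters here because setnames modifies a data.table by reference.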
The Multi-CAST annotation data accessed by the multicast method are published under a Creative Commons Attribution 4.0 International (CC-BY 4.0) licence
(https://creativecommons.org/licenses/by/4.0/). Please refer to
the collection documentation for information on how to give proper credit
to its contributors.
Data from the Multi-CAST collection should be cited as:
Haig, Geoffrey & Schnell, Stefan (eds.). 2015. Multi-CAST: Multilingual Corpus of Annotated Spoken Texts. (https://multicast.aspra.uni-bamberg.de/) (Accessed date.)
If you need to cite this package specifically, please refer to citation("multicastR").
Riester, Arndt & Baumann, Stefan. 2017. The RefLex scheme -- Annotation guidelines. SinSpeC: Working papers of the SFB 732 14. (https://dx.doi.org/10.18419/opus-9011)
Schiborr, Nils N. & Schnell, Stefan & Thiele, Hanna. 2018. RefIND -- Referent Indexing in Natural-language Discourse: Annotation guidelines. Version 1.1. (https://multicast.aspra.uni-bamberg.de/#annotations)
# NOT RUN {
# retrieve and print the most recent version of the
# Multi-CAST annotations
multicast()
# retrieve the version of the annotation data published
# in May 2019
multicast(1905) # or: multicast("1905")
# }