multicast downloads the Multi-CAST annotation data from the servers of the Language Archive Cologne (LAC) and outputs it as a data.table. The Multi-CAST collection is amenable to extension by additional data sets and annotation schemes. In the spirit of scientific accountability and reproducibility, multicast takes an optional argument to select earlier versions of the annotation data.
multicast(vkey)
vkey
A numeric or character vector of length 1 specifying the requested version of the annotation values. Must be one of the four-digit version keys in the first column of mcindex, or empty. If empty, multicast defaults to the most recent version of the annotations.
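The rule for vkey described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name normalize_vkey and the known_keys vector are hypothetical and not part of multicastR. In practice the valid keys are those listed in the first column of mcindex. The sketch also assumes that keys sort chronologically, which the example key 1606 (June 2016) suggests but does not guarantee.

```r
# Hypothetical helper, not part of multicastR: normalize a version key
# to a four-digit character string and check it against known keys.
normalize_vkey <- function(vkey, known_keys) {
  # an empty key means "use the most recent version";
  # assumes that the largest key is the most recent one
  if (missing(vkey) || length(vkey) == 0 || identical(vkey, "")) {
    return(max(known_keys))
  }
  stopifnot(length(vkey) == 1)
  # numeric and character input are treated alike
  key <- as.character(vkey)
  if (!grepl("^[0-9]{4}$", key) || !(key %in% known_keys)) {
    stop("unknown version key: ", key)
  }
  key
}

# placeholder keys for illustration only; consult mcindex for the real list
known_keys <- c("1505", "1606")

normalize_vkey(1606, known_keys)          # numeric input
normalize_vkey("1606", known_keys)        # character input
normalize_vkey(known_keys = known_keys)   # empty: falls back to the largest key
```

Converting the key to a character string first means that multicast(1606) and multicast("1606") can be handled by the same check.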
A data.table with eleven columns:
[, 1] corpus
The name of the corpus.
[, 2] file
The title of the file. A single long corpus text may be split into multiple files.
[, 3] uid
The utterance identifier. Uniquely identifies an utterance within a file.
[, 4] word
Grammatical words. The tokenized utterances in the object language.
[, 5] gloss
Morphological glosses following the Leipzig Glossing Rules.
[, 6] graid
Annotations using the GRAID scheme (Haig & Schnell 2014).
[, 7] gform
The form symbol of a GRAID gloss.
[, 8] ganim
The person-animacy symbol of a GRAID gloss.
[, 9] gfunc
The function symbol of a GRAID gloss.
[, 10] refind
Referent tracking using the RefIND scheme (Schiborr et al. 2018).
[, 11] reflex
The information status of newly introduced referents, using a simplified version of the RefLex scheme (Riester & Baumann 2017).
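To make the column layout above more concrete, here is a small sketch that builds a toy table with the same eleven columns and runs two simple summaries on it. The rows are made up for illustration and are not taken from the Multi-CAST data. The sketch uses a base R data.frame so that it runs without a network connection, whereas multicast itself returns a data.table.

```r
# Made-up rows that mimic the column layout described above;
# the values are illustrative and not taken from the Multi-CAST data.
toy <- data.frame(
  corpus = "demo",
  file   = "demo_text",
  uid    = c("0001", "0001", "0002"),
  word   = c("w1", "w2", "w3"),
  gloss  = c("g1", "g2", "g3"),
  graid  = c("np.h:s", "v:pred", "pro.h:p"),
  gform  = c("np", "v", "pro"),
  ganim  = c("h", "", "h"),
  gfunc  = c("s", "pred", "p"),
  refind = c("001", "", "001"),
  reflex = c("new", "", ""),
  stringsAsFactors = FALSE
)

# keep only rows whose person-animacy symbol marks a human referent
human <- toy[toy$ganim == "h", ]

# count those rows by GRAID function symbol
table(human$gfunc)

# number of distinct utterance identifiers per file
tapply(toy$uid, toy$file, function(x) length(unique(x)))
```

Because gform, ganim, and gfunc hold the separate parts of each GRAID gloss, filters like the one above do not need to parse the graid column.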
The Multi-CAST annotation data accessed by the multicast method is published under a Creative Commons Attribution 4.0 International (CC-BY 4.0) licence (https://creativecommons.org/licenses/by-sa/4.0/). Please refer to the collection documentation for information on how to give proper credit to its contributors.
Data from the Multi-CAST collection should be cited as:
Haig, Geoffrey & Schnell, Stefan (eds.). 2018[2015]. Multi-CAST: Multilingual Corpus of Annotated Spoken Texts. (https://lac.uni-koeln.de/en/multicast/) (Accessed date.)
If for some reason you need to cite this package on its own, please refer to citation("multicastR").
Riester, Arndt & Baumann, Stefan. 2017. The RefLex scheme -- Annotation guidelines. (SinSpeC: Working papers of the SFB 732, 14.) Stuttgart: University of Stuttgart. (http://elib.uni-stuttgart.de/handle/11682/9028) (Accessed 2018-03-14.)
Schiborr, Nils N. & Schnell, Stefan & Thiele, Hanna. 2018. RefIND -- Referent Indexing in Natural-language Discourse: Annotation guidelines. Version 1.0. Unpublished Manuscript. Bamberg / Melbourne: University of Bamberg / University of Melbourne.
# NOT RUN {
# retrieve and print the most recent version of the
# Multi-CAST annotations
multicast()
# retrieve and print the version of the annotation data
# published in June 2016
multicast(1606) # or: multicast("1606")
# }