cleanNLP (version 1.10.0)

to_CoNNL: Returns a CoNLL-U Document

Description

Given an annotation object, this function returns a CoNLL-U document. The CoNLL-U format is close to that of a delimited file, but includes blank lines to signal breaks between sentences. The result is returned as a string object that can be saved to disk using the function writeLines. Note that CoNLL-U has no way of distinguishing documents, so usually only one document is written to a single file; the examples show how to write each document to its own file. Also note that this is a lossy procedure that depends on the annotations available: only tokenization, lemmatization, part of speech tags, and dependencies are saved.
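
To make the file layout concrete, here is a minimal base R sketch of the format itself. It is not how to_CoNNL is implemented: the tokens data frame and its column names are hypothetical and chosen only for illustration. It assumes the standard ten tab-separated CoNLL-U fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with "_" standing in for fields that are not filled.

```r
# Hypothetical token table; the column names are assumptions made for this
# sketch and are not the layout cleanNLP uses internally.
tokens <- data.frame(
  sid    = c(1, 1, 2, 2),
  tid    = c(1, 2, 1, 2),
  word   = c("Dogs", "bark", "Cats", "purr"),
  lemma  = c("dog", "bark", "cat", "purr"),
  upos   = c("NOUN", "VERB", "NOUN", "VERB"),
  head   = c(2, 0, 2, 0),
  deprel = c("nsubj", "root", "nsubj", "root"),
  stringsAsFactors = FALSE
)

# One tab-separated line per token; "_" marks the fields left unspecified
# here (XPOS, FEATS, DEPS, MISC).
rows <- paste(tokens$tid, tokens$word, tokens$lemma, tokens$upos, "_", "_",
              tokens$head, tokens$deprel, "_", "_", sep = "\t")

# Append an empty string after each sentence's tokens so that writeLines()
# emits the blank line that closes a sentence.
by_sentence <- split(rows, tokens$sid)
conll_lines <- unlist(lapply(by_sentence, function(x) c(x, "")),
                      use.names = FALSE)

writeLines(conll_lines)
```

Nothing in these lines records which document a sentence came from, which is why one file per document is the usual convention.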

Usage

to_CoNNL(anno)

Arguments

anno

annotation object to convert

Value

a string object containing the CoNLL-U document

Examples

# NOT RUN {
for (i in get_document(obama)$id) {
  anno <- extract_documents(obama, i)
  conll <- to_CoNNL(anno)
  writeLines(conll, sprintf("%02d.conll", i))
}
# }