CoNLLTextDocument(con, encoding = "unknown", meta = list())
See scan() for details on the con and encoding arguments.
CoNLLTextDocument() assumes a fixed set of 3
columns giving, respectively, the word token and its POS and chunk
tags. The lines are read from the given connection and split into fields
using scan(). From this, a suitable representation of
the provided information is obtained, and returned as a CoNLL text
document object inheriting from classes "CoNLLTextDocument" and
"TextDocument".
There are methods for class "CoNLLTextDocument" and the generics
words(), sents(), tagged_words(), tagged_sents(), and chunked_sents()
(as well as as.character()), which should be used to access the text
in such text document objects.
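As a minimal sketch of the reading and access steps described above, the following writes a tiny three-column sample to a temporary file and reads it back. The sample tokens, tags and file are invented for illustration, and the blank line between the two groups is assumed to act as the sentence separator, as in CoNLL-2000 style data.

```r
library(NLP)

# Hypothetical sample data: word token, POS tag, chunk tag per line.
# The blank line is assumed to separate sentences.
conll_lines <- c("He PRP B-NP",
                 "runs VBZ B-VP",
                 ". . O",
                 "",
                 "She PRP B-NP",
                 "sleeps VBZ B-VP",
                 ". . O")
path <- tempfile(fileext = ".txt")
writeLines(conll_lines, path)

# Read the file into a CoNLL text document object.
doc <- CoNLLTextDocument(path)

# Access the text through the accessor generics, not the object internals.
w  <- words(doc)          # word tokens
s  <- sents(doc)          # tokens grouped by sentence
tw <- tagged_words(doc)   # tokens paired with their POS tags
cs <- chunked_sents(doc)  # sentences with chunk structure

print(w)
print(tw)
```

Using the accessors keeps calling code independent of how the document object stores its columns internally.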
The methods for generics
tagged_words() and
tagged_sents()
provide a mechanism for mapping POS tags via the map argument;
see section Details in the help page for
tagged_words() for more information.
The POS tagset used will be inferred from the POS_tagset
metadata element of the CoNLL-style text document.
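The tag mapping can be sketched as follows, assuming that map may be given as a function applied to the vector of POS tags (the help page for tagged_words() lists the accepted forms). The sample data, the to_coarse() helper and its lookup table are hypothetical, and the POS_tagset value "en-ptb" is only an illustrative metadata entry; a function mapper is assumed not to need it.

```r
library(NLP)

# Hypothetical sample data in the three-column layout.
path <- tempfile(fileext = ".txt")
writeLines(c("He PRP B-NP", "runs VBZ B-VP", ". . O"), path)

# Record the tagset in the document metadata (illustrative value).
doc <- CoNLLTextDocument(path, meta = list(POS_tagset = "en-ptb"))

# Hypothetical mapper: coarse tags for known fine tags, others unchanged.
lookup <- c(PRP = "PRON", VBZ = "VERB")
to_coarse <- function(pos) {
  unname(ifelse(pos %in% names(lookup), lookup[pos], pos))
}

# Tagged words with the original tags and with the mapped tags.
print(tagged_words(doc))
print(tagged_words(doc, map = to_coarse))
```

A function mapper is the most explicit choice for a sketch; when map is instead organized by tagset, the POS_tagset metadata element is what selects the applicable mapping.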
See TextDocument for basic information on the text document
infrastructure employed by package NLP.