CoNLLTextDocument(con, encoding = "unknown", meta = list())
See scan() for details on the con and encoding arguments.
The returned object inherits from classes "CoNLLTextDocument" and
"TextDocument".
CoNLLTextDocument()
assumes a fixed set of 3 columns giving, respectively, the word token
and its POS and chunk tags. The lines are read from the given connection
and split into fields using scan(). From this, a suitable representation
of the provided information is obtained and returned as a CoNLL text
document object inheriting from classes "CoNLLTextDocument" and
"TextDocument".
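The field splitting described above can be illustrated with base R alone. The following is a minimal sketch, not the package's actual implementation: the sample lines and the field names word, POS and chunk_tag are assumptions chosen for illustration.

```r
# Hypothetical sample in the 3-column layout: word token, POS tag, chunk tag.
sample_lines <- c("He PRP B-NP",
                  "reckons VBZ B-VP",
                  "the DT B-NP",
                  "deficit NN I-NP",
                  ". . O")

# Read whitespace-separated fields into a list of three character vectors.
# quote = "" keeps quote characters in tokens from being treated specially.
con <- textConnection(sample_lines)
fields <- scan(con,
               what = list(word = "", POS = "", chunk_tag = ""),
               quote = "", quiet = TRUE)
close(con)

# One vector per column, in input order.
str(fields)
```

Each element of the resulting list holds one column, so fields$word gives the tokens and fields$POS the matching tags.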
There are methods for generics words(), sents(), tagged_words(),
tagged_sents(), and chunked_sents() (as well as as.character()) and
class "CoNLLTextDocument", which should be used to access the text in
such text document objects.
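A short usage sketch of these accessors follows. It assumes the package providing CoNLLTextDocument() is NLP and is installed; the sample data and the temporary file are made up for illustration.

```r
library(NLP)

# Hypothetical one-sentence sample: word token, POS tag, chunk tag per line.
sample_lines <- c("He PRP B-NP",
                  "reckons VBZ B-VP",
                  "the DT B-NP",
                  "deficit NN I-NP",
                  ". . O")

# Write the sample to a temporary file and pass its path as 'con'.
tf <- tempfile()
writeLines(sample_lines, tf)
d <- CoNLLTextDocument(tf)
unlink(tf)

# Access the text through the generics rather than the object's internals.
w  <- words(d)          # word tokens
s  <- sents(d)          # tokens grouped by sentence
tw <- tagged_words(d)   # tokens paired with POS tags
ts <- tagged_sents(d)   # tagged tokens grouped by sentence
cs <- chunked_sents(d)  # sentences with chunk structure

print(w)
print(tw)
```

Going through the generics keeps the code independent of how the document object stores its content.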
See TextDocument for basic information on the text document
infrastructure employed by package NLP.