words(x, ...)
sents(x, ...)
paras(x, ...)
tagged_words(x, ...)
tagged_sents(x, ...)
tagged_paras(x, ...)
chunked_sents(x, ...)
parsed_sents(x, ...)
parsed_paras(x, ...)

For words(), a character vector with the word tokens in the
document.
For sents(), a list of character vectors with the word tokens
in each sentence.
For paras(), a list of lists of character vectors with the word
tokens in each sentence, grouped according to the paragraphs.
For tagged_words(), a character vector with the POS tagged word
tokens in the document (i.e., the word tokens and their POS tags,
separated by /).
For tagged_sents(), a list of character vectors with the POS
tagged word tokens in each sentence.
For tagged_paras(), a list of lists of character vectors with
the POS tagged word tokens in each sentence, grouped according to the
paragraphs.
For chunked_sents(), a list of (flat) Tree
objects giving the chunk trees for each sentence in the document.
For parsed_sents(), a list of Tree
objects giving the parse trees for each sentence in the document.
For parsed_paras(), a list of lists of Tree
objects giving the parse trees for each sentence in the document,
grouped according to the paragraphs in the document.
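
The return shapes listed above can be inspected on a small annotated document. The following is a minimal sketch, assuming that package NLP is installed and that it ships the sample annotated document texts/stanford.rds (an assumption not stated on this page). paras() and the other paragraph viewers are left out because they need paragraph annotations, which this sample may not carry.

```r
library(NLP)

## Assumption: NLP ships a pre-built annotated document as texts/stanford.rds.
d <- readRDS(system.file("texts", "stanford.rds", package = "NLP"))

w  <- words(d)         # character vector with the word tokens
s  <- sents(d)         # list with one character vector per sentence
tw <- tagged_words(d)  # word tokens with their POS tags
ps <- parsed_sents(d)  # list with one Tree object per sentence

str(w)
lengths(s)
head(tw)
class(ps[[1]])
```
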
Methods for the POS tagged viewers (tagged_words(), tagged_sents() and
tagged_paras()) can optionally provide a mechanism for mapping
the POS tags via a map argument. This can be a function, a
named character vector (with names and elements the tags to map from
and to, respectively), or a named list of such named character
vectors, with names corresponding to POS tagsets (see
Universal_POS_tags_map for an example). If a list, the
map used will be the element with name matching the POS tagset used
(this information is typically determined from the text document
metadata; see the help pages for text document extension classes
implementing this mechanism for details).

TextDocument for basic information on the text document
infrastructure employed by package
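
The three forms accepted by the map argument (a function, a named character vector, or a named list of such vectors keyed by POS tagset) can be illustrated in base R. This is only a sketch of the described semantics, not the package's implementation: apply_pos_map() and its tagset argument are hypothetical, the three-entry map is toy data, and leaving unmapped tags unchanged is an assumption.

```r
## Hypothetical helper (not part of the package): resolve and apply a POS tag map.
apply_pos_map <- function(tags, map, tagset = NULL) {
    ## A named list holds one map per tagset: pick the element matching 'tagset'.
    if (is.list(map)) {
        stopifnot(!is.null(tagset))
        map <- map[[tagset]]
    }
    ## No map available for this tagset: return the tags as they are.
    if (is.null(map)) return(tags)
    ## A function is applied to the tags directly.
    if (is.function(map)) return(map(tags))
    ## Named character vector: names are the tags to map from,
    ## elements are the tags to map to.
    mapped <- unname(map[tags])
    ## Assumption: tags without an entry in the map are left unchanged.
    unmapped <- is.na(mapped)
    mapped[unmapped] <- tags[unmapped]
    mapped
}

tags <- c("NNP", "VBZ", "JJ", "XYZ")
## Toy excerpt of a Penn Treebank to universal tag mapping.
ptb_to_universal <- c(NNP = "NOUN", VBZ = "VERB", JJ = "ADJ")

r_vec  <- apply_pos_map(tags, ptb_to_universal)
r_list <- apply_pos_map(tags, list("en-ptb" = ptb_to_universal), tagset = "en-ptb")
r_fun  <- apply_pos_map(tags, tolower)
```

With the list form, the lookup by tagset name is what allows a single map object to serve documents tagged with different tagsets.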