features: Extract Annotation Features

Description

Conveniently extract features from annotations and annotated plain text documents.

Usage

features(x, type = NULL, simplify = TRUE)

Arguments

x

an object inheriting from class "Annotation" or "AnnotatedPlainTextDocument".

type

a character vector of annotation types used to select annotations, or NULL (default) to use all annotations. When selecting, the elements of type are partially matched against the annotation types.

simplify

a logical indicating whether to simplify feature values to a vector.

Details

features() gathers all feature tag-value pairs in the selected annotations into a data frame with one variable per tag found, holding the values for that tag (using a NULL value for annotations that have no value for the tag). In general, variables will be lists of extracted values. By default, variables where all elements are length one atomic vectors are simplified into an atomic vector of values. The values for specific tags can be extracted by suitably subscripting the obtained data frame.
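
As a minimal sketch of the behavior described above, the following builds a small Annotation object by hand and extracts its word features. The text offsets, tags, and values are made up for illustration and do not come from a real annotator; the sketch assumes the NLP package is installed and that Annotation() accepts a list of named feature lists.

```r
library(NLP)

## Hypothetical toy annotations for the text "Hello world":
## one sentence annotation and two word annotations.
a <- Annotation(
    id = 1:3,
    type = c("sentence", "word", "word"),
    start = c(1L, 1L, 7L),
    end = c(11L, 5L, 11L),
    features = list(
        list(),                             # sentence: no features
        list(POS = "UH", lemma = "hello"),  # first word: two tags
        list(POS = "NN")                    # second word: no lemma
    )
)

## Select the word annotations only; "w" would match partially as well.
feats <- features(a, "word")

## Every word has a length one POS value, so this variable can be simplified.
feats$POS

## One word lacks a lemma, so this variable stays a list with a NULL entry.
feats$lemma

## Keep every variable as a list of extracted values.
feats_raw <- features(a, "word", simplify = FALSE)
```

Subscripting with `$` or `[[` on the result, as done here, is the usual way to pull out the values for a single tag.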

Examples


# NOT RUN {
## Use a pre-built annotated plain text document,
## see ? AnnotatedPlainTextDocument.
doc <- readRDS(system.file("texts", "stanford.rds", package = "NLP"))
## Extract features of all *word* annotations in doc:
x <- features(doc, "word")
## Could also have abbreviated "word" to "w".
x
## Only lemmas:
x$lemma
## Words together with lemmas:
paste(words(doc), x$lemma, sep = "/")
# }
