Annotator: Annotator objects

Description

Create annotator objects.

Usage

Annotator(f, description = NULL, classes = NULL)

Arguments

f

an annotator function, which must have formals s and a giving, respectively, the string with the natural language text to annotate and an annotation object to start from, and which must return an annotation object with the computed annotations.

description

a character string describing the annotator, or NULL (default).

classes

a character vector or NULL (default) giving classes to be used for the created annotator object in addition to "Annotator".
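The formals requirement for f can be checked with base R alone. The helper has_annotator_formals() below is hypothetical and not part of NLP. It assumes that the requirement means the formal names are exactly s and a, in that order. Annotator() performs its own check, so this is only an illustration of what qualifies.

```r
## Hypothetical helper (not part of NLP): TRUE if f has exactly the
## formals s and a, in that order. Default values do not matter.
has_annotator_formals <- function(f) {
    identical(names(formals(f)), c("s", "a"))
}

## Qualifies: formals s and a (a may have a default).
ok <- function(s, a = NULL) a
## Does not qualify: the formals have the wrong names.
bad <- function(text, ann) ann

has_annotator_formals(ok)   # TRUE
has_annotator_formals(bad)  # FALSE
```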

Value

An annotator object inheriting from the given classes and class "Annotator".

Details

Annotator() checks that the given annotator function has the required formals s and a. It returns an annotator object that inherits from the given classes and "Annotator" and that stores the given description (currently as an attribute) for use by the print() method for such objects.
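A minimal base-R sketch of a constructor that follows the behaviour described above may make it more concrete. make_annotator_sketch() is hypothetical and is not the NLP source code. The attribute name "description" and the class order c(classes, "Annotator") are assumptions drawn from the description.

```r
## Hypothetical sketch of the behaviour described above (NOT NLP's code).
make_annotator_sketch <- function(f, description = NULL, classes = NULL) {
    ## Require exactly the formals s and a.
    if (!identical(names(formals(f)), c("s", "a")))
        stop("annotator functions must have formals 's' and 'a'")
    ## Keep the description as an attribute (attribute name assumed).
    attr(f, "description") <- description
    ## Given classes first, then "Annotator" (without duplicating it).
    class(f) <- unique(c(classes, "Annotator"))
    f
}

ann <- make_annotator_sketch(function(s, a) a,
                             description = "A do-nothing annotator.",
                             classes = "My_Annotator")
class(ann)                # "My_Annotator" "Annotator"
attr(ann, "description")  # "A do-nothing annotator."
```

A class attribute does not change how a closure is called, so the resulting object can still be invoked as ann(s, a).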

Examples


library("NLP")

## Use blankline_tokenizer() for a simple paragraph token annotator:
para_token_annotator <-
Annotator(function(s, a = Annotation()) {
              spans <- blankline_tokenizer(s)
              n <- length(spans)
              ## Need n consecutive ids, starting with the next "free"
              ## one:
              from <- next_id(a$id)
              Annotation(seq(from = from, length.out = n),
                         rep.int("paragraph", n),
                         spans$start,
                         spans$end)
          },
          "A paragraph token annotator based on blankline_tokenizer().")
para_token_annotator
## Alternatively, use Simple_Para_Token_Annotator().

## A simple text with two paragraphs:
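## (the paragraphs are separated by a blank line, which is what
## blankline_tokenizer() splits on):
s <- String(paste("First sentence.  Second sentence.  ",
                  "Second paragraph.  ",
                  sep = "\n\n"))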
a <- annotate(s, para_token_annotator)
## Annotations for paragraph tokens.
a
## Extract paragraph tokens.
s[a]
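The example allocates n consecutive ids starting with the next "free" one. Below is a base-R sketch of that id allocation. next_free_id() is a hypothetical stand-in for next_id(). It assumes that the next free id is one more than the largest existing id, or 1 when there are no annotations yet; consult the NLP documentation for the exact behaviour of next_id().

```r
## Hypothetical stand-in for next_id(): one more than the largest id,
## or 1L when there are no ids yet.
next_free_id <- function(id) {
    if (length(id) == 0L) 1L else max(id) + 1L
}

existing <- c(1L, 2L, 5L)  # ids already used in an annotation
n <- 3L                    # number of new annotations to add
from <- next_free_id(existing)
new_ids <- seq(from = from, length.out = n)
new_ids                    # 6 7 8
next_free_id(integer())    # 1
```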