NLP (version 0.1-6)

Annotator: Annotator objects

Description

Create annotator objects.

Usage

Annotator(f, description = NULL, classes = NULL)

Arguments

f
an annotator function, which must have formals s and a giving, respectively, the string with the natural language text to annotate and an annotation object to start from, and return an annotation object with the computed annotations.
description
a character string describing the annotator, or NULL (default).
classes
a character vector or NULL (default) giving classes to be used for the created annotator object in addition to "Annotator".

Value

  • An annotator object inheriting from the given classes and class "Annotator".

Details

Annotator() checks that the given annotator function has the appropriate formals, and returns an annotator object which inherits from the given classes and "Annotator", and contains the given description (currently, as an attribute) to be used in the print() method for such objects.
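
As a rough illustration of the checks described above, the following sketch builds an annotator from a do-nothing function and then passes a function with the wrong formals. The names identity_fun, bad_fun and the class "My_Annotator" are made up for this example. The error message itself is not shown, since its wording may differ between package versions.

```r
library(NLP)

## A do-nothing annotator function with the required formals 's' and 'a'.
## (identity_fun is a hypothetical name used only for illustration.)
identity_fun <- function(s, a = Annotation()) a

## "My_Annotator" is a hypothetical extra class; "Annotator" is always added.
ann <- Annotator(identity_fun, classes = "My_Annotator")
inherits(ann, "Annotator")
inherits(ann, "My_Annotator")

## A function without formals 's' and 'a' should be rejected with an error.
bad_fun <- function(text) text
res <- tryCatch(Annotator(bad_fun), error = function(e) e)
inherits(res, "error")
```

Dispatching on an extra class such as "My_Annotator" is one way to attach custom methods to a family of annotators while keeping the generic "Annotator" behaviour.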

See Also

Simple annotator generators for creating simple annotator objects based on functions performing simple basic NLP tasks.

Package StanfordCoreNLP, available from the repository at http://datacube.wu.ac.at, which provides generators for annotator pipelines based on the Stanford CoreNLP tools.

Examples

library(NLP)

## Use blankline_tokenizer() for a simple paragraph token annotator:
para_token_annotator <-
Annotator(function(s, a = Annotation()) {
              spans <- blankline_tokenizer(s)
              n <- length(spans)
              ## Need n consecutive ids, starting with the next "free"
              ## one:
              from <- next_id(a$id)
              Annotation(seq(from = from, length.out = n),
                         rep.int("paragraph", n),
                         spans$start,
                         spans$end)
          },
          "A paragraph token annotator based on blankline_tokenizer().")
para_token_annotator
## Alternatively, use Simple_Para_Token_Annotator().

## A simple text with two paragraphs:
s <- String(paste("First sentence.  Second sentence.  ",
                  "Second paragraph.  ",
                  sep = "\n\n"))
a <- annotate(s, para_token_annotator)
## Annotations for paragraph tokens.
a
## Extract paragraph tokens.
s[a]
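
The alternative mentioned in the comment above can be sketched as follows, assuming Simple_Para_Token_Annotator() accepts a paragraph tokenizer function such as blankline_tokenizer() as its first argument. The variable names are illustrative, and the text puts a blank line ("\n\n") between the two paragraphs so that the tokenizer has something to split on.

```r
library(NLP)

## Build the paragraph annotator from the generator instead of by hand.
simple_para_annotator <- Simple_Para_Token_Annotator(blankline_tokenizer)

## Two paragraphs separated by a blank line.
s2 <- String(paste("First sentence.  Second sentence.",
                   "Second paragraph.",
                   sep = "\n\n"))
a2 <- annotate(s2, simple_para_annotator)

## Paragraph annotations and the text they cover.
a2
s2[a2]
```

The generator takes care of id assignment and the "paragraph" type, so the hand-written version is mainly useful when the annotator needs to do something the generator does not cover.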
