
NLPclient (version 1.0)

StanfordCoreNLP_Pipeline: Stanford CoreNLP annotator pipeline

Description

Create a Stanford CoreNLP annotator pipeline.

Usage

StanfordCoreNLP_Pipeline(annotators = c("pos", "lemma"),
  language = "en", control = list(), port = 9000L,
  host = "localhost")

Arguments

annotators

a character string specifying the annotators to be used in addition to ‘ssplit’ (sentence token annotation) and ‘tokenize’ (word token annotation), with elements "pos" (POS tagging), "lemma" (lemmatizing), "ner" (named entity recognition), "regexner" (rule-based named entity recognition over token sequences using Java regular expressions), "parse" (constituency parsing), "depparse" (dependency parsing), "sentiment" (sentiment analysis), "coref" (coreference resolution), "dcoref" (deterministic coreference resolution), "cleanxml" (clean XML tags), or "relation" (relation extraction), or unique abbreviations thereof. Ignored for languages other than English.
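For illustration, and assuming a CoreNLP server is reachable on the default host and port, a pipeline that additionally performs named entity recognition could be requested as in the following sketch:

p <- StanfordCoreNLP_Pipeline(c("pos", "lemma", "ner"))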

language

a character string giving the ISO-639 code of the language being processed by the annotator pipeline.

control

a named list (default: an empty list) of annotator control options, with the names giving the option names. See https://stanfordnlp.github.io/CoreNLP/annotators.html for the available control options.
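As an illustration of passing control options by name, the sketch below uses the "pos.model" property from the CoreNLP annotator documentation linked above; the model path is purely hypothetical:

p <- StanfordCoreNLP_Pipeline(annotators = "pos",
                              control = list("pos.model" = "path/to/your-model.tagger"))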

port

an integer giving the port on which the CoreNLP server listens (default: 9000L).

host

a character string giving the hostname of the server.
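To connect to a CoreNLP server on another machine, both port and host can be overridden; the hostname below is hypothetical:

p <- StanfordCoreNLP_Pipeline(port = 9000L, host = "corenlp.example.org")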

Value

An Annotator object (see package NLP) providing the annotator pipeline.

See Also

https://stanfordnlp.github.io/CoreNLP/ for more information about the Stanford CoreNLP tools.

Examples

## Not run automatically: the examples below require a running Stanford CoreNLP server.
require("NLP")
s <- as.String(paste("Stanford University is located in California.",
                     "It is a great university."))
s

## Annotators: ssplit, tokenize:
if ( ping_nlp_client() == "pong" ) {
p <- StanfordCoreNLP_Pipeline(NULL)
a <- p(s)
a

## Annotators: ssplit, tokenize, pos, lemma (default):
p <- StanfordCoreNLP_Pipeline()
a <- p(s)
a

## Equivalently:
annotate(s, p)

## Annotators: ssplit, tokenize, parse:
p <- StanfordCoreNLP_Pipeline("parse")
a <- p(s)
a

## Respective formatted parse trees using Penn Treebank notation
## (see <https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html>):
ptexts <- sapply(subset(a, type == "sentence")$features, `[[`, "parse")
ptexts

## Read into NLP Tree objects.
ptrees <- lapply(ptexts, Tree_parse)
ptrees

## Basic dependencies:
depends <- lapply(subset(a, type == "sentence")$features, `[[`,
                  "basic-dependencies")
depends
## Note that the non-zero ids (gid for governor and did for dependent)
## refer to word token positions within the respective sentences, and
## not to the ids of these tokens in the annotation: the two can easily be
## matched using the sentence constituents features:
lapply(subset(a, type == "sentence")$features, `[[`, "constituents")

## (Similarly for sentence ids used in dcoref document features.)
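## A sketch of such a matching for the first sentence (this assumes the
## dependency columns are named gid and did as described above; a gid of
## 0 marks the artificial root and is mapped to NA):
ids1 <- lapply(subset(a, type == "sentence")$features, `[[`, "constituents")[[1L]]
d1 <- depends[[1L]]
data.frame(d1,
           gov_id = ids1[ifelse(d1$gid > 0, d1$gid, NA)],
           dep_id = ids1[d1$did])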

## Note also that the dependencies are returned as a data frame
## inheriting from class "Stanford_typed_dependencies", which has print
## and format methods for obtaining the usual formatting.
depends[[1L]]
## Use as.data.frame() to strip this class:
as.data.frame(depends[[1L]])
}
