This function must be run before annotating text with the coreNLP backend. It sets the properties for the coreNLP engine and loads the model files using the rJava interface. See Details for more information about the speed codes.
init_coreNLP(language, speed = 2, lib_location = NULL, mem = "12g",
verbose = FALSE)
a character vector describing the desired language; should be one of: "ar", "de", "en", "es", "fr", or "zh".
integer code. Sets which annotators should be loaded, based on how long they take to load and run. Speed 0 is the fastest and speed 3 is the slowest. See Details for a full description of the levels.
a string giving the location of the CoreNLP java files. This should point to a directory which contains, for example, the file "stanford-corenlp-*.jar", where "*" is the version number. If missing, the function will try to find the library in the environment variable CORENLP_HOME, and otherwise will fail. (Java model only)
a string giving the amount of memory to be assigned to the rJava engine. For example, "6g" assigns 6 gigabytes of memory. At least 2 gigabytes are recommended for running the CoreNLP package. On a 32-bit machine, where this is not possible, setting "1800m" may also work. This option only has an effect the first time init_backend is called for the coreNLP backend, and it will have no effect if the Java engine has already been started by another process.
boolean. Should messages from the pipeline be written to the console or suppressed?
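As an illustration of the mem argument above, the following sketch checks that a memory string has the form the rJava engine expects before initialization. valid_mem is a hypothetical helper written for this example; it is not part of the package.

```r
# Hypothetical helper (not part of the package): check that a memory
# string has the "<number>g" or "<number>m" form expected by `mem`.
valid_mem <- function(x) grepl("^[0-9]+[mg]$", x)

valid_mem("12g")    # TRUE: 12 gigabytes
valid_mem("1800m")  # TRUE: 1800 megabytes, suitable for 32-bit machines
valid_mem("12 GB")  # FALSE: spaces and "GB" are not accepted
```

A validated string could then be passed through in a call such as init_coreNLP("en", speed = 2, mem = "6g").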
Currently available speed codes are integers from 0 to 3. Setting the speed above 2 has no additional effect on the German and Spanish models, and setting it above 1 has no effect on the French model. The available speed codes are:
"0" runs just the tokenizer, sentence splitter, and part of speech tagger. Extremely fast.
"1" includes the dependency parsers and, for English, the sentiment tagger. Often 20-30x slower than speed 0.
"2" adds the named entity annotator to the parser and sentiment tagger (when available). For English models, it also includes the mentions and natlog annotators. Usually no more than twice as slow as speed 1.
"3" add the coreference resolution annotator to the speed 2 annotators. Depending on the corpus, this takes about 2-4x longer than the speed 2 annotators
We suggest starting at speed 2, downgrading to 0 if your corpus is particularly large, or upgrading to 3 if you can tolerate the slowdown. If your text is not formal written text (e.g., tweets or text messages), the speed 0 annotators should still work well, but anything beyond that may be difficult. Semi-formal text such as e-mails or transcribed speech is generally okay to run at all of the levels.
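The guidance above can be sketched as a small decision rule. suggest_speed and its document-count thresholds are illustrative assumptions for this example only; they are not part of the package.

```r
# Hypothetical helper: choose a starting speed code from corpus size and
# formality, following the advice above (thresholds are made up).
suggest_speed <- function(n_docs, formal = TRUE) {
  if (!formal) return(0L)        # informal text: tokenizer/POS level only
  if (n_docs > 50000) return(0L) # particularly large corpus: fastest code
  if (n_docs > 5000) return(2L)  # the suggested starting point
  3L                             # small corpora can afford coreference
}

suggest_speed(100)                  # 3
suggest_speed(10000)                # 2
suggest_speed(200, formal = FALSE)  # 0
```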
# NOT RUN {
init_coreNLP("en")
# }