cleanNLP (version 1.10.0)

run_annotators: Run the annotation pipeline on a set of documents

Description

Runs the cleanNLP annotators over a given corpus of text using the R, Java, or Python backend. Which annotators to run, and how to run them, is determined by first calling one of init_tokenizers, init_spaCy, or init_coreNLP.

Usage

run_annotators(input, file = NULL, output_dir = NULL, load = TRUE,
  keep = TRUE, as_strings = FALSE, doc_id_offset = 0L, backend = NULL,
  meta = NULL)

Arguments

input

either a vector of file names to parse, or a character vector with one document in each element. Specify the latter with the as_strings flag.

file

character. Location to store a compressed R object containing the results. If NULL, the default, no such compressed object will be stored.

output_dir

path to the directory where the raw output should be stored. Will be created if it does not exist. Files currently in this location will be overwritten. If NULL, the default, a temporary directory is used. Not to be confused with file: this location stores the raw csv files rather than a compressed dataset.

load

logical. Once parsed, should the data be read into R as an annotation object?

keep

logical. Once parsed, should the files be kept on disk in output_dir?

as_strings

logical. Is the data given to input the actual document text rather than file names?

doc_id_offset

integer. The first document id to use. Defaults to 0.

backend

which backend to use. Will default to the last model to be initialized.

meta

an optional data frame to bind to the document table

Value

If load is TRUE, an object of class annotation. Otherwise, a character vector giving the output location of the files.

References

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.

Examples

# NOT RUN {
annotation <- run_annotators("path/to/corpus/directory")
# }
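
The arguments described above can be combined as in the following sketch. It assumes the R-only backend set up by init_tokenizers is available on your system; the two example documents and the output directory name are hypothetical and stand in for your own corpus and paths.

```r
# NOT RUN {
library(cleanNLP)

# Assumption: the R-only tokenizers backend is installed; it needs
# neither Java nor Python.
init_tokenizers()

# Hypothetical two-document corpus, passed as text rather than file names.
docs <- c("The quick brown fox jumps over the lazy dog.",
          "Annotation pipelines turn raw text into tables.")

# as_strings = TRUE marks `input` as document text. With the default
# load = TRUE, the result is an object of class annotation.
anno <- run_annotators(docs, as_strings = TRUE)

# Store the raw csv files in a chosen directory and skip loading them
# into R. With load = FALSE, the return value is a character vector
# giving the output location instead of an annotation object.
out_dir <- file.path(tempdir(), "cleanNLP_output")  # hypothetical location
out <- run_annotators(docs, as_strings = TRUE, output_dir = out_dir,
                      load = FALSE, keep = TRUE)
# }
```

When load = FALSE, keep should stay TRUE; otherwise the parsed files would neither be read into R nor kept on disk.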