run_annotators
Run the annotation pipeline on a set of documents
Runs the clean_nlp annotators over a given corpus of text
using either the R, Java, or Python backend. The details for
which annotators to run and how to run them are specified
by using one of: init_tokenizers, init_spaCy, or init_coreNLP.
Usage
run_annotators(input, file = NULL, output_dir = NULL, load = TRUE,
keep = TRUE, as_strings = FALSE, doc_id_offset = 0L, backend = NULL,
meta = NULL)
Arguments
- input
either a vector of file names to parse, or a character vector with one document in each element. Specify the latter with the as_strings flag.
- file
character. Location to store a compressed R object containing the results. If NULL, the default, no such compressed object will be stored.
- output_dir
path to the directory where the raw output should be stored. Will be created if it does not exist. Files currently in this location will be overwritten. If NULL, the default, it uses a temporary directory. Not to be confused with file; this location stores the raw csv files rather than a compressed dataset.
- load
logical. Once parsed, should the data be read into R as an annotation object?
- keep
logical. Once parsed, should the files be kept on disk in output_dir?
- as_strings
logical. Is the data given to input the actual document text rather than file names?
- doc_id_offset
integer. The first document id to use. Defaults to 0.
- backend
which backend to use. Will default to the last model to be initialized.
- meta
an optional data frame to bind to the document table
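The following is a minimal sketch of how the as_strings, doc_id_offset, and meta arguments might be combined. It assumes a version of cleanNLP that exports run_annotators, init_tokenizers, and get_document, and it assumes that meta needs one row per document. The sample texts and the meta columns are made up for illustration.

```r
# Two in-memory documents, one per element (hypothetical sample text)
docs <- c("The quick brown fox jumps over the lazy dog.",
          "A second, much shorter document.")

# Hypothetical metadata, assumed to need one row per document
meta <- data.frame(author = c("A. Writer", "B. Writer"),
                   year = c(2016L, 2017L),
                   stringsAsFactors = FALSE)

# Only call the package if this version actually exports the functions used
has_pkg <- requireNamespace("cleanNLP", quietly = TRUE) &&
  all(c("init_tokenizers", "run_annotators", "get_document") %in%
        getNamespaceExports("cleanNLP"))

if (has_pkg) {
  cleanNLP::init_tokenizers()               # initialize the R backend
  anno <- cleanNLP::run_annotators(docs,
                                   as_strings = TRUE,     # input is text, not file names
                                   doc_id_offset = 100L,  # first document id to use
                                   meta = meta)
  print(cleanNLP::get_document(anno))       # document table with meta bound to it
}
```

Without as_strings = TRUE, the elements of docs would be treated as file names.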
Value
If load is true, an object of class annotation. Otherwise, a character vector giving the output location of the files.
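As a rough illustration of the second case, the sketch below sets load = FALSE so that the call returns the output location instead of an annotation object. The output directory name is hypothetical, and the sketch again assumes a cleanNLP version that exports run_annotators and init_tokenizers.

```r
docs <- c("A short first document.", "A second document.")

# Hypothetical location for the raw csv output; created if it does not exist
out_dir <- file.path(tempdir(), "cleanNLP_raw")

has_pkg <- requireNamespace("cleanNLP", quietly = TRUE) &&
  all(c("init_tokenizers", "run_annotators") %in%
        getNamespaceExports("cleanNLP"))

if (has_pkg) {
  cleanNLP::init_tokenizers()
  out <- cleanNLP::run_annotators(docs,
                                  as_strings = TRUE,
                                  output_dir = out_dir,
                                  load = FALSE,   # return the location, not an annotation
                                  keep = TRUE)    # leave the raw files on disk
  print(out)                   # character vector giving the output location
  print(list.files(out_dir))   # raw files written by the backend
}
```

Keeping keep = TRUE matters here: with load = FALSE the files on disk are the only copy of the results.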
References
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.
Examples
# NOT RUN {
annotation <- run_annotators("path/to/corpus/directory")
# }
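A self-contained variant that passes a vector of file names, as the input argument describes, could look like the sketch below. The corpus directory and file contents are made up, and get_token is assumed to be exported by the same cleanNLP version as run_annotators.

```r
# Write a tiny hypothetical corpus to a temporary directory
corpus_dir <- file.path(tempdir(), "corpus_example")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("The first file holds one document.",
           file.path(corpus_dir, "doc1.txt"))
writeLines("The second file holds another.",
           file.path(corpus_dir, "doc2.txt"))

# Vector of file names to parse
paths <- list.files(corpus_dir, pattern = "\\.txt$", full.names = TRUE)

has_pkg <- requireNamespace("cleanNLP", quietly = TRUE) &&
  all(c("init_tokenizers", "run_annotators", "get_token") %in%
        getNamespaceExports("cleanNLP"))

if (has_pkg) {
  cleanNLP::init_tokenizers()
  annotation <- cleanNLP::run_annotators(paths)   # as_strings = FALSE by default
  print(head(cleanNLP::get_token(annotation)))    # first rows of the token table
}
```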