Runs the cleanNLP annotators over a corpus of text using the R (tokenizers), Python (spaCy), or Java (CoreNLP) backend. Which annotators to run, and how to run them, is set beforehand by calling one of init_tokenizers, init_spaCy, or init_coreNLP.
run_annotators(input, file = NULL, output_dir = NULL, load = TRUE,
keep = TRUE, as_strings = FALSE, doc_id_offset = 0L, backend = NULL,
meta = NULL)
input: either a vector of file names to parse, or a character vector with one document in each element. To pass document text directly, set as_strings = TRUE.
file: character. Location to store a compressed R object containing the results. If NULL, the default, no such compressed object is stored.
output_dir: path to the directory where the raw output should be stored. It is created if it does not exist, and files already in this location will be overwritten. If NULL, the default, a temporary directory is used. Not to be confused with file: this location stores the raw csv files rather than a compressed dataset.
load: logical. Once parsed, should the data be read into R as an annotation object?
keep: logical. Once parsed, should the files be kept on disk in output_dir?
as_strings: logical. Is the data given to input the actual document text rather than file names?
doc_id_offset: integer. The first document id to use. Defaults to 0.
backend: which backend to use. Defaults to the backend that was initialized most recently.
meta: an optional data frame to bind to the document table.
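When a large corpus is annotated in several calls, doc_id_offset can keep document ids from colliding across batches. The base-R sketch below computes an offset for each batch. The helper names batch_offsets() and batch_ids() are hypothetical, and the sketch assumes that ids within one call are assigned consecutively starting at doc_id_offset, which this page does not state explicitly.

```r
# Hypothetical helpers (not part of cleanNLP). Assumption: run_annotators()
# numbers the documents of one call consecutively, starting at doc_id_offset.

# Offset for each batch so that ids continue where the previous batch stopped.
batch_offsets <- function(batch_sizes, start = 0L) {
  sizes <- as.integer(batch_sizes)
  if (length(sizes) == 0L) return(integer(0))
  # Batch k starts after all documents in batches 1..(k - 1).
  start + c(0L, cumsum(sizes)[-length(sizes)])
}

# Expected document ids for each batch under the same assumption.
batch_ids <- function(batch_sizes, start = 0L) {
  sizes <- as.integer(batch_sizes)
  offsets <- batch_offsets(sizes, start)
  Map(function(off, n) off + seq_len(n) - 1L, offsets, sizes)
}

# Example: a corpus split into batches of 3, 2 and 4 documents.
sizes <- c(3L, 2L, 4L)
offsets <- batch_offsets(sizes)   # 0 3 5
ids <- batch_ids(sizes)
print(offsets)
print(unlist(ids))                # 0 through 8, no repeats
```

Each offset would then be passed as doc_id_offset in the corresponding run_annotators() call, for example offsets[i] for the i-th batch.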
If load is TRUE, an object of class annotation. Otherwise, a character vector giving the output location of the files.
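Because the return type depends on load, downstream code may need to handle both cases. A minimal base-R sketch, assuming that with load = FALSE the returned character value is the output_dir path and that the raw files there end in .csv, as the description of output_dir suggests. The helper raw_output_files() is hypothetical, and the files are only there to find when keep = TRUE.

```r
# Hypothetical helper (not part of cleanNLP): given the value returned by
# run_annotators(), list the raw csv files when they were left on disk.
raw_output_files <- function(result) {
  if (is.character(result)) {
    # load = FALSE: assumed to be the output_dir path holding the csv files.
    list.files(result, pattern = "\\.csv$", full.names = TRUE)
  } else if (inherits(result, "annotation")) {
    # load = TRUE: the data is already in R as an annotation object.
    character(0)
  } else {
    stop("unexpected result of class ", paste(class(result), collapse = "/"))
  }
}

# Simulate both return types without running any annotators.
out_dir <- file.path(tempdir(), "annotation_output")
dir.create(out_dir, showWarnings = FALSE)
invisible(file.create(file.path(out_dir, "token.csv")))  # stand-in raw file
from_path <- raw_output_files(out_dir)
from_object <- raw_output_files(structure(list(), class = "annotation"))
```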
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.
# NOT RUN {
annotation <- run_annotators("path/to/corpus/directory")
# }
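The example above reads documents from disk. For text already held in memory, the same call can be made with as_strings = TRUE after initializing a backend. This sketch assumes a cleanNLP release that still exports init_tokenizers() and run_annotators(), and that init_tokenizers() can be called with its default arguments; when those functions are not available it does nothing.

```r
# Two documents supplied directly as strings rather than as file names.
texts <- c("The first document is here.", "A second, shorter one.")

annotation <- NULL
has_api <- requireNamespace("cleanNLP", quietly = TRUE) &&
  all(c("init_tokenizers", "run_annotators") %in%
        getNamespaceExports("cleanNLP"))

if (has_api) {
  # Select the R (tokenizers) backend; assumed to work with its defaults.
  cleanNLP::init_tokenizers()
  # as_strings = TRUE: each element of texts is document text, not a path.
  # load = TRUE (the default) returns an object of class "annotation".
  annotation <- cleanNLP::run_annotators(texts, as_strings = TRUE)
}
```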