cleanNLP (version 1.10.0)

get_document: Access document meta data from an annotation object

Description

Access document meta data from an annotation object

Usage

get_document(annotation)

Arguments

annotation

an annotation object

Value

Returns an object of class c("tbl_df", "tbl", "data.frame") containing one row for every document in the corpus.

The returned data frame includes at least the following columns:

  • "id" - integer. Id of the source document.

  • "time" - date time. The time at which the parser was run on the text.

  • "version" - character. Version of the CoreNLP library used to parse the text.

  • "language" - character. Language of the text, in ISO 639-1 format.

  • "uri" - character. Description of the raw text location. Set to NA if the text was parsed from an in-memory character vector.

Other application-specific columns may be included as additional variables.
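
As a rough illustration of working with a table of this shape, the sketch below builds a hypothetical stand-in data frame with the documented columns (all values are made up and do not come from a real annotation) and then selects the documents whose "uri" is NA. With a real annotation object, the table would come from get_document(annotation) instead.

```r
# Hypothetical stand-in for the result of get_document(); every value
# below is invented for illustration only.
doc <- data.frame(
  id = 1:2,
  time = as.POSIXct(c("2017-01-01 12:00:00", "2017-01-01 12:00:05"), tz = "UTC"),
  version = c("3.7.0", "3.7.0"),
  language = c("en", "en"),
  uri = c(NA_character_, "speech.txt"),
  stringsAsFactors = FALSE
)

# Confirm that the documented columns are present
required <- c("id", "time", "version", "language", "uri")
stopifnot(all(required %in% names(doc)))

# Documents parsed from an in-memory character vector have a missing uri
in_memory <- doc[is.na(doc$uri), ]
print(in_memory$id)
```

Checking for the required column names before use is a cheap guard, since the table may also carry additional application-specific columns.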

References

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.

Examples

# NOT RUN {
library(cleanNLP)

# Load the example annotation object
data(obama)

# Extract the document-level metadata table
get_document(obama)
# }
