get_coreference: Access coreferences from an annotation object

Description

Coreferences are collections of expressions that all represent the same person, entity, or thing. For example, in the text "Lauren loves dogs. She would walk them all day.", there is a coreference consisting of the token "Lauren" in the first sentence and the token "She" in the second sentence. The output of this function contains one row for each mention of an entity; mentions of the same entity share the same value of the rid key, which links them together.
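As a minimal sketch of how the rid key links mentions, the base-R snippet below groups rows into chains. It runs on hand-built toy rows that imitate a few columns of the output for the example sentence pair; the second chain ("dogs"/"them") and all id values are invented for illustration and are not real annotator output.

```r
# Hand-built toy rows shaped like a subset of get_coreference() output for
# "Lauren loves dogs. She would walk them all day." The second chain
# ("dogs"/"them") and all id values are invented for illustration.
coref <- data.frame(
  id      = c(1L, 1L, 1L, 1L),
  rid     = c(1L, 1L, 2L, 2L),
  mid     = c(1L, 2L, 3L, 4L),
  mention = c("Lauren", "She", "dogs", "them"),
  sid     = c(1L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Rows that share an (id, rid) pair are mentions of one entity; split()
# gathers them into one character vector per chain, named "id.rid".
chains <- split(coref$mention, list(coref$id, coref$rid), drop = TRUE)
print(chains)
```

Grouping on id as well as rid is a precaution, assuming rid values may repeat across documents.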

Usage

get_coreference(annotation)

Arguments

annotation

an annotation object

Value

Returns an object of class c("tbl_df", "tbl", "data.frame") containing one row for every coreference mention in the corpus.

The returned data frame includes at least the following columns:

"id" - integer. Id of the source document.
"rid" - integer. Relation ID; mentions that share an rid refer to the same entity.
"mid" - integer. Mention ID; unique to each coreference mention within a document.
"mention" - character. The mention as raw words from the text.
"mention_type" - character. One of "LIST", "NOMINAL", "PRONOMINAL", or "PROPER".
"number" - character. One of "PLURAL", "SINGULAR", or "UNKNOWN".
"gender" - character. One of "FEMALE", "MALE", "NEUTRAL", or "UNKNOWN".
"animacy" - character. One of "ANIMATE", "INANIMATE", or "UNKNOWN".
"sid" - integer. Sentence id of the mention.
"tid" - integer. Token id at the start of the mention.
"tid_end" - integer. Token id at the end of the mention.
"tid_head" - integer. Token id of the head of the mention.
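Because the result is a data.frame, the columns can be combined with ordinary base-R tools. The sketch below uses mention_type to pick a display name for each chain and attaches it to the pronouns in that chain. The toy rows are hand-built, and the rule "use the first PROPER mention in the chain" is an assumption chosen for illustration, not something this function does itself.

```r
# Hand-built toy rows imitating a few output columns (values are invented).
coref <- data.frame(
  id           = c(1L, 1L, 1L, 1L),
  rid          = c(1L, 1L, 2L, 2L),
  mention      = c("Lauren", "She", "dogs", "them"),
  mention_type = c("PROPER", "PRONOMINAL", "NOMINAL", "PRONOMINAL"),
  stringsAsFactors = FALSE
)

# Assumed rule: the first PROPER mention in each (id, rid) chain names it.
proper <- coref[coref$mention_type == "PROPER", c("id", "rid", "mention")]
proper <- proper[!duplicated(proper[c("id", "rid")]), ]
names(proper)[names(proper) == "mention"] <- "entity"

# Left join onto the pronouns; chains without a PROPER mention get NA.
pronouns <- coref[coref$mention_type == "PRONOMINAL", ]
resolved <- merge(pronouns, proper, by = c("id", "rid"), all.x = TRUE)
print(resolved[c("mention", "entity")])
```

Here "She" resolves to "Lauren", while "them" stays NA because its chain contains no PROPER mention.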

References

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.

Recasens, Marta, Marie-Catherine de Marneffe, and Christopher Potts. 2013. The Life and Death of Discourse Entities: Identifying Singleton Mentions. In: Proceedings of NAACL 2013.

Lee, Heeyoung, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2013. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics 39(4).

Lee, Heeyoung, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2011. Stanford's Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task. In: Proceedings of the CoNLL-2011 Shared Task.

Raghunathan, Karthik, Heeyoung Lee, Sudarshan Rangarajan, Nathanael Chambers, Mihai Surdeanu, Dan Jurafsky, and Christopher Manning. 2010. A Multi-Pass Sieve for Coreference Resolution. EMNLP-2010, Boston, USA.

Examples


library(cleanNLP)
data(obama)

# how often are references made to males versus females in each speech?
coref <- get_coreference(obama)
table(coref$gender, coref$id)
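Raw counts are hard to compare between speeches of different lengths. A minimal base-R sketch of converting such a table into per-speech shares with prop.table() follows; it uses a small hand-built stand-in for coref (the values are invented) so that it runs without the obama data.

```r
# Invented stand-in for the id and gender columns of get_coreference() output.
coref <- data.frame(
  id     = c(1L, 1L, 1L, 2L, 2L),
  gender = c("MALE", "MALE", "FEMALE", "MALE", "UNKNOWN"),
  stringsAsFactors = FALSE
)

# Counts of gender (rows) by document (columns), as in the example above.
counts <- table(coref$gender, coref$id)

# margin = 2 divides each column by its total, so each document sums to 1.
shares <- prop.table(counts, margin = 2)
print(round(shares, 2))
```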
