
rsyntax (version 0.1.0)

spacy_split_conjunctions: Split conjunctions in spacy tokens

Description

Splitting conjunctions is complicated for two reasons: it requires recursion (conjunctions can be nested inside other conjunctions), and it has to deal with argument drop. In the sentence "Bob ate bread and cheese", we cannot simply split the sentence into "Bob ate bread" and "cheese". The implicit arguments must be copied to each part, giving "Bob ate bread" and "Bob ate cheese".
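
The argument copying described above can be illustrated with a minimal base R sketch. This is not the rsyntax implementation: the toy data frame is a hand-written dependency parse (not actual spaCy output), and split_flat_conjunction is a hypothetical helper that only handles a flat conjunction of single-word conjuncts, without the recursion that nested conjunctions require.

```r
# Hand-written toy parse of "Bob ate bread and cheese" (an assumption for
# illustration; column names and labels are not taken from rsyntax)
toy <- data.frame(
  id       = 1:5,
  token    = c("Bob", "ate", "bread", "and", "cheese"),
  parent   = c(2, NA, 2, 3, 3),
  relation = c("nsubj", "ROOT", "dobj", "cc", "conj"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one sentence per conjunct, copying the
# shared arguments ("Bob ate") into every branch
split_flat_conjunction <- function(tree) {
  conj  <- tree[tree$relation == "conj", ]
  coord <- tree$id[tree$relation == "cc"]
  # Tokens shared by all branches: everything except coordinators and conjuncts
  base <- tree[!tree$id %in% c(coord, conj$id), ]
  branches <- paste(base$token, collapse = " ")
  for (i in seq_len(nrow(conj))) {
    branch <- base
    # Put the conjunct in the position of the token it is coordinated with
    branch$token[branch$id == conj$parent[i]] <- conj$token[i]
    branches <- c(branches, paste(branch$token, collapse = " "))
  }
  branches
}

split_flat_conjunction(toy)
# "Bob ate bread" "Bob ate cheese"
```

The sketch returns plain strings to keep the idea visible, whereas the function documented here returns the tokenIndex with the split branches.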

Usage

spacy_split_conjunctions(tokens)

Arguments

tokens

A tokenIndex based on texts parsed with spacy_parse (with dependency = TRUE).

Value

The tokenIndex, with conjunctions split into separate, isolated branches.

Details

Note that this function is mainly provided for demonstration purposes. The goal of the rsyntax package is to provide the tools to query and reshape dependency trees, and (at least for now) we want to keep applications such as this function separate. This specific implementation is also not perfect. For complex sentences, other forms of text simplification would ideally be performed first (e.g., isolating relative clauses).

Examples

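# NOT RUN {
library(rsyntax)

# Select a single document from the tokens_spacy example data
tokens = tokens_spacy[tokens_spacy$doc_id == 'text5',]

# Split the conjunctions and plot the resulting dependency tree
tokens %>%
   spacy_split_conjunctions() %>%
   plot_tree()
# }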
