textshape (version 1.6.0)

golden_rules: Sentence Boundary Disambiguation Edge Cases

Description

A lightly filtered version of Dias's (2015) sentence boundary disambiguation edge cases. The data set is nested: the Outcome column is a list whose elements hold the desired sentence splits for each text. The non-ASCII cases and the spaced-ellipsis examples have been removed.

Usage

data(golden_rules)

Format

A data frame with 45 rows and 3 variables.

Details

  • Rule. The name of the rule being tested

  • Text. The text to be split into sentences

  • Outcome. The desired sentence splits for the text, stored as a nested list
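
The three columns can be used to score a sentence splitter against the desired splits. The sketch below is illustrative only: toy is a hypothetical two-row stand-in that mimics the documented layout (Rule, Text, Outcome) so the code runs without the package, its rule names are placeholders, and naive_split is an assumed toy splitter, not the one used by textshape. It also assumes each Outcome element is a character vector.

```r
# Hypothetical stand-in with the documented columns; not rows copied from golden_rules.
toy <- data.frame(
  Rule = c("example rule A", "example rule B"),
  Text = c("Hello World. My name is Jonas.",
           "What is your name? My name is Jonas."),
  stringsAsFactors = FALSE
)

# Outcome is a list column: one character vector of desired sentences per row.
toy$Outcome <- list(
  c("Hello World.", "My name is Jonas."),
  c("What is your name?", "My name is Jonas.")
)

# Assumed toy splitter: break on whitespace that follows '.', '?' or '!'.
naive_split <- function(x) {
  strsplit(x, "(?<=[.?!])\\s+", perl = TRUE)[[1]]
}

# Compare the splitter's result with the desired splits, row by row.
toy$Passed <- mapply(
  function(txt, want) identical(naive_split(txt), want),
  toy$Text, toy$Outcome,
  USE.NAMES = FALSE
)

print(toy[, c("Rule", "Passed")])
```

With the package installed, the same comparison should carry over by loading the data with data(golden_rules) and substituting golden_rules for toy; a splitter this naive would be expected to fail many of the edge cases.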

References

Dias, Kevin S. 2015. Golden Rules (English). Retrieved: https://s3.amazonaws.com/tm-town-nlp-resources/golden_rules.txt