sentences(x)

The result is a data frame with columns parent, index, and text, and one row for each sentence. The parent value is the integer index of the parent text in x; the index value is the integer index of the sentence within its parent; the text value is the text of the sentence, a value of type text.

sentences splits text at the sentence boundaries defined by Unicode Standard Annex #29 (http://unicode.org/reports/tr29/#Sentence_Boundaries). These boundaries handle Unicode correctly and give reasonable behavior across a variety of languages. Unfortunately, the UAX 29 sentence-breaking rules do not handle abbreviations correctly. For example, the text "I saw Mr. Jones today." gets split into two sentences, with a break after "Mr.". Future versions of the sentences function may add special rules for abbreviations like "Mr.", "Dr.", and so on.
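One possible shape for such an abbreviation rule is a post-processing pass that glues a piece back onto the next one whenever it ends in a known abbreviation. The sketch below is illustrative only: merge_abbreviations and its abbrevs list are hypothetical, not part of this package, and the input is a hand-written character vector standing in for the output of a UAX 29 splitter.

```r
# Hypothetical post-processing step (NOT part of the corpus package):
# rejoin adjacent pieces when a piece ends in a known abbreviation.
merge_abbreviations <- function(pieces, abbrevs = c("Mr.", "Mrs.", "Dr.")) {
  out <- character(0)
  carry <- ""
  for (p in pieces) {
    current <- paste0(carry, p)
    # Last whitespace-delimited word of the piece, ignoring trailing spaces
    words <- strsplit(trimws(current, which = "right"), "\\s+")[[1]]
    last_word <- if (length(words) > 0) words[length(words)] else ""
    if (last_word %in% abbrevs) {
      carry <- current          # hold this piece and glue it to the next one
    } else {
      out <- c(out, current)
      carry <- ""
    }
  }
  if (nzchar(carry)) out <- c(out, carry)  # input ended on an abbreviation
  out
}

# Pieces as a UAX 29 splitter would break them, written by hand here
merge_abbreviations(c("I saw Mr. ", "Jones today."))
```

A fixed word list cannot tell a mid-sentence "Dr." from one that genuinely ends a sentence, so a rule like this trades one class of errors for another.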
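See also: tokens.

# The UAX 29 rules break after the abbreviation "Mr.", giving two sentences
sentences("I saw Mr. Jones today.")

# Several input texts; the parent column records which element each sentence came from
sentences(c("What. Are. You. Doing????",
            "She asked 'do you really mean that?' and I said 'yes.'"))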
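As a rough illustration of the parent/index/text layout described at the top of this page, the base R sketch below builds such a data frame from a list of per-text sentence vectors. sentence_table is a hypothetical helper, not this package's implementation, and it uses plain character vectors rather than the text type. In practice the per-text vectors would come from a UAX 29 splitter (for instance stringi::stri_split_boundaries(x, type = "sentence"), which relies on ICU's sentence break iterator); here they are written by hand so the sketch runs without extra packages.

```r
# Hypothetical helper (NOT part of the corpus package): lay out per-text
# sentence vectors as a data frame with parent, index, and text columns.
sentence_table <- function(pieces) {
  counts <- lengths(pieces)                    # sentences per parent text
  data.frame(
    parent = rep(seq_along(pieces), counts),   # which element of x
    index  = sequence(counts),                 # position within that element
    text   = unlist(pieces, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hand-written splits standing in for a UAX 29 splitter's output
pieces <- list(
  c("I saw Mr. ", "Jones today."),
  c("What. ", "Are. ", "You. ", "Doing????")
)
sentence_table(pieces)
```

sequence(counts) restarts the count at 1 for each parent, which is what makes index a within-text position rather than a global row number.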