
textpress (version 1.1.0)

nlp_split_sentences: Split Text into Sentences

Description

This function splits the text column of a data frame into individual sentences, processing each unit identified by the by columns separately. It avoids treating the period in a known abbreviation as a sentence boundary.
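The abbreviation handling described above can be illustrated with a minimal base-R sketch. This is not textpress's implementation: the helper split_sentences_sketch() and its short default abbreviation list are hypothetical. It assumes a sentence ends in ".", "!" or "?" followed by whitespace. It temporarily masks the periods inside each listed abbreviation so they cannot end a sentence, splits the text, and then restores the periods.

```r
# Hypothetical helper, not part of textpress: split one string into
# sentences without breaking after the listed abbreviations.
# The default list is an illustrative assumption, not textpress::abbreviations.
split_sentences_sketch <- function(text, abbreviations = c("Dr.", "Mr.", "e.g.")) {
  placeholder <- "\001"  # control character assumed absent from the text
  masked <- text
  for (abbr in abbreviations) {
    # Replace the periods inside each abbreviation with the placeholder
    # (simple fixed-string match, so no word-boundary checks).
    masked <- gsub(abbr, gsub(".", placeholder, abbr, fixed = TRUE),
                   masked, fixed = TRUE)
  }
  # Split on whitespace that follows sentence-ending punctuation.
  pieces <- strsplit(masked, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
  # Put the masked periods back.
  gsub(placeholder, ".", pieces, fixed = TRUE)
}

split_sentences_sketch("Dr. Smith arrived. He was late!")
```

With an empty abbreviations vector (character(0)), the same call would also split after "Dr.", producing three pieces instead of two.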

Usage

nlp_split_sentences(
  corpus,
  by = c("doc_id"),
  abbreviations = textpress::abbreviations
)

Value

A data.table containing the by columns plus sentence_id, text, start, and end.
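To make the start and end columns concrete, the following sketch builds a similar table for a single string. It assumes, without confirmation from this page, that start and end are 1-based character positions in the original text. It also uses a naive punctuation rule with no abbreviation handling. sentence_table_sketch() is a hypothetical helper, not a textpress function.

```r
# Hypothetical sketch: one row per sentence with character offsets.
# Assumes start/end are 1-based positions into the input string.
sentence_table_sketch <- function(text) {
  # A sentence starts at a non-space character and runs up to and
  # including any trailing ., ! or ? characters (naive rule).
  m <- gregexpr("\\S[^.!?]*[.!?]*", text, perl = TRUE)[[1]]
  if (m[1] == -1L) {
    # No match: return an empty table with the same columns.
    return(data.frame(sentence_id = integer(0), text = character(0),
                      start = integer(0), end = integer(0)))
  }
  starts <- as.integer(m)
  ends <- starts + attr(m, "match.length") - 1L
  data.frame(
    sentence_id = seq_along(starts),
    text = substring(text, starts, ends),
    start = starts,
    end = ends,
    stringsAsFactors = FALSE
  )
}

txt <- "Hello world. This is an example. No, this is a party!"
sentence_table_sketch(txt)
```

Under this assumption, substring(txt, start, end) recovers each sentence exactly, which makes the offsets useful for mapping sentences back to the source text.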

Arguments

corpus

A data frame or data.table containing a text column and the identifiers specified in by.

by

A character vector of column names used as unique identifiers. The last column determines the unit of text that is split (e.g., with by = c("doc_id", "para_id"), each paragraph is split into sentences separately).
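The role of by can be sketched in base R as splitting each unit's text independently and copying the identifier columns onto every resulting sentence. The helper split_by_units_sketch() is hypothetical. It assumes each row of corpus holds the text of one unit and that sentence_id restarts at 1 within each unit; this page confirms neither assumption. It also uses a naive punctuation rule without abbreviation handling.

```r
# Hypothetical sketch, not textpress: split each unit (one row per unit)
# separately and carry the `by` identifier columns onto each sentence.
split_by_units_sketch <- function(corpus, by = "doc_id") {
  rows <- lapply(seq_len(nrow(corpus)), function(i) {
    # Naive split on whitespace after ., ! or ?
    sentences <- strsplit(corpus$text[i], "(?<=[.!?])\\s+", perl = TRUE)[[1]]
    # Repeat this unit's identifier columns once per sentence.
    out <- corpus[rep(i, length(sentences)), by, drop = FALSE]
    out$sentence_id <- seq_along(sentences)  # assumed to restart per unit
    out$text <- sentences
    out
  })
  result <- do.call(rbind, rows)
  rownames(result) <- NULL
  result
}

corpus <- data.frame(
  doc_id = c("1", "1", "2"),
  para_id = c("1", "2", "1"),
  text = c("First para. Two sentences.", "Second para.", "Other doc! Yes."),
  stringsAsFactors = FALSE
)
split_by_units_sketch(corpus, by = c("doc_id", "para_id"))
```

Because each paragraph is processed on its own, no sentence can span two rows of the input.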

abbreviations

A character vector of abbreviations whose periods should not be treated as sentence boundaries. Defaults to textpress::abbreviations.

Examples

library(textpress)

corpus <- data.frame(doc_id = c('1'),
                     text = c("Hello world. This is an example. No, this is a party!"))
sentences <- nlp_split_sentences(corpus)
