
textpress (version 1.1.0)

nlp_split_paragraphs: Split Text into Paragraphs

Description

Splits the text in the `text` column of a data frame into individual paragraphs, using a regular-expression paragraph delimiter (`paragraph_delim`).
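To show the general technique, here is a minimal base-R sketch of a regex-based paragraph split. It is not textpress's implementation. The function name `split_paragraphs_sketch` is hypothetical, the output is a plain data.frame rather than a data.table, and dropping empty pieces (for example, those produced by a leading newline) is an assumption about reasonable behavior.

```r
# Hypothetical sketch: NOT the textpress implementation.
split_paragraphs_sketch <- function(corpus, paragraph_delim = "\\n+") {
  # Split each document's text on the delimiter regex
  pieces <- strsplit(corpus$text, paragraph_delim)
  # Assumption: discard empty strings left by leading/trailing delimiters
  pieces <- lapply(pieces, function(p) p[nzchar(p)])
  n <- lengths(pieces)
  data.frame(
    doc_id       = rep(corpus$doc_id, n),        # one doc_id per paragraph
    paragraph_id = unlist(lapply(n, seq_len)),   # 1, 2, ... within each doc
    text         = unlist(pieces),
    stringsAsFactors = FALSE
  )
}

corpus <- data.frame(doc_id = c("1", "2"),
                     text = c("Hello world.\n\nMind your business!",
                              "This is an example.\n\nThis is a party!"),
                     stringsAsFactors = FALSE)
res <- split_paragraphs_sketch(corpus)
res
```

Each input document contributes one row per paragraph, with `paragraph_id` restarting at 1 for every `doc_id`.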

Usage

nlp_split_paragraphs(corpus, paragraph_delim = "\\n+")

Value

A data.table with columns: `doc_id`, `paragraph_id`, and `text`. Each row represents a paragraph, along with its associated document and paragraph identifiers.

Arguments

corpus

A data frame or data.table containing a `text` column and one or more identifier columns (e.g., `doc_id`).

paragraph_delim

A regular expression pattern used to split text into paragraphs. The default, `"\\n+"`, treats any run of one or more newline characters as a paragraph break.
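The choice of pattern matters when text is hard-wrapped. The base-R sketch below compares the default pattern with a stricter `"\\n{2,}"` pattern (an illustrative alternative, not a package default) using `strsplit()`. Assuming `nlp_split_paragraphs()` applies `paragraph_delim` as an ordinary regex split, the same contrast would carry over to its output.

```r
txt <- "Line one\nline two\n\nSecond paragraph"

# Default-style pattern: every run of newlines starts a new paragraph,
# so single line wraps are also treated as breaks
by_any_newline <- strsplit(txt, "\\n+")[[1]]

# Stricter pattern: only blank lines (two or more newlines) separate paragraphs
by_blank_line <- strsplit(txt, "\\n{2,}")[[1]]

by_any_newline  # "Line one" "line two" "Second paragraph"
by_blank_line   # "Line one\nline two" "Second paragraph"
```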

Examples

library(textpress)

corpus <- data.frame(doc_id = c('1', '2'),
                     text = c("Hello world.\n\nMind your business!",
                              "This is an example.\n\nThis is a party!"))
# Split each document on the default delimiter (one or more newlines)
paragraphs <- nlp_split_paragraphs(corpus)

