textpress (version 1.0.0)

nlp_build_chunks: Build Chunks for NLP Analysis

Description

Divides the text in a data frame into chunks of a specified size for NLP analysis. Each chunk is returned together with its surrounding context, whose extent is set by context_size.

Usage

nlp_build_chunks(tif, text_hierarchy, chunk_size, context_size)

Value

A data.table containing the text chunks and their respective contexts.

Arguments

tif

A data.table containing the text to be chunked.

text_hierarchy

A character vector specifying the columns used for grouping and chunking.

chunk_size

An integer specifying the size of each chunk.

context_size

An integer specifying the size of the context around each chunk.
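
To make the interaction between chunk_size and context_size concrete, the following base R sketch shows one plausible reading of the two arguments. It is not the textpress implementation. It assumes that both sizes are counted in sentences and that context extends symmetrically on either side of a chunk; neither assumption is confirmed by this page. The names sentences, chunk_id, and sketch_chunks are hypothetical.

```r
# Illustrative sketch only; NOT how textpress implements nlp_build_chunks().
# Assumption: chunk_size and context_size are both measured in sentences.
sentences <- c("S1.", "S2.", "S3.", "S4.", "S5.")
chunk_size <- 2L
context_size <- 1L

# Assign each sentence to a chunk of `chunk_size` consecutive sentences.
chunk_id <- ceiling(seq_along(sentences) / chunk_size)

sketch_chunks <- lapply(unique(chunk_id), function(k) {
  idx <- which(chunk_id == k)
  # Widen the window by `context_size` sentences on each side,
  # clipped to the bounds of the text.
  lo <- max(1L, min(idx) - context_size)
  hi <- min(length(sentences), max(idx) + context_size)
  data.frame(chunk_id = k,
             chunk = paste(sentences[idx], collapse = " "),
             chunk_plus_context = paste(sentences[lo:hi], collapse = " "),
             stringsAsFactors = FALSE)
})

# Combine the per-chunk rows into a single data frame.
sketch_chunks <- do.call(rbind, sketch_chunks)
print(sketch_chunks)
```

Under these assumptions, five sentences with chunk_size = 2 yield three chunks, and the middle chunk ("S3. S4.") carries one extra sentence of context on each side. The real function additionally groups by the columns named in text_hierarchy, which this sketch omits.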

Examples

library(textpress)

# Create a data frame with one row per sentence
tif <- data.frame(doc_id = c('1', '1', '2'),
                 sentence_id = c('1', '2', '1'),
                 text = c("Hello world.",
                          "This is an example.",
                          "This is a party!"))

chunks <- nlp_build_chunks(tif,
                           chunk_size = 2,
                           context_size = 1,
                           text_hierarchy = c('doc_id', 'sentence_id'))