
rainette (version 0.1.2)

split_segments: Split a character string or corpus into segments

Description

Split a character string or corpus into segments, taking punctuation into account where possible.

Usage

split_segments(
  obj,
  segment_size = 40,
  segment_size_window = NULL,
  force_single_core = FALSE
)

# S3 method for character
split_segments(
  obj,
  segment_size = 40,
  segment_size_window = NULL,
  force_single_core = FALSE
)

# S3 method for Corpus
split_segments(
  obj,
  segment_size = 40,
  segment_size_window = NULL,
  force_single_core = FALSE
)

# S3 method for corpus
split_segments(
  obj,
  segment_size = 40,
  segment_size_window = NULL,
  force_single_core = FALSE
)

Arguments

obj

character string, quanteda or tm corpus object

segment_size

segment size (in words)

segment_size_window

window around segment size to look for best splitting point

force_single_core

if TRUE, don't use multithreading, even on a large corpus

Value

If obj is a tm or quanteda corpus object, the result is a quanteda corpus.

Details

By default, if the corpus is large (more than 10,000,000 characters), multithreading is used for segment splitting.
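
To illustrate how segment_size and segment_size_window might interact, here is a minimal base R sketch. It is not rainette's implementation: split_sketch, its window argument, the default window value and the punctuation set are all assumptions made for this illustration. It aims for segment_size words per segment and, when a word ending in punctuation lies within window words of that target, cuts there instead.

```r
# Hypothetical helper, NOT part of rainette: split a string into segments of
# roughly `segment_size` words, preferring to cut after a word that ends with
# punctuation when one lies within `window` words of the target position.
split_sketch <- function(text, segment_size = 40, window = NULL) {
  # Arbitrary default chosen for this sketch only; rainette's own default
  # for segment_size_window may be different.
  if (is.null(window)) window <- max(1, floor(segment_size / 4))

  words <- strsplit(trimws(text), "\\s+")[[1]]
  n <- length(words)
  segments <- character(0)
  start <- 1

  while (start <= n) {
    target <- start + segment_size - 1
    if (target >= n) {
      # Not enough words left for a full segment: take the remainder
      cut <- n
    } else {
      # Candidate cut positions around the target
      lo <- max(start, target - window)
      hi <- min(n, target + window)
      candidates <- lo:hi
      # Assumed punctuation set for this sketch
      is_punct <- grepl("[.!?;:,]$", words[candidates])
      if (any(is_punct)) {
        # Cut after the punctuated word closest to the target
        punct_pos <- candidates[is_punct]
        cut <- punct_pos[which.min(abs(punct_pos - target))]
      } else {
        # No punctuation in the window: cut exactly at the target
        cut <- target
      }
    }
    segments <- c(segments, paste(words[start:cut], collapse = " "))
    start <- cut + 1
  }
  segments
}

txt <- "one two three. four five six seven, eight nine ten."
split_sketch(txt, segment_size = 4, window = 1)
```

With a target of 4 words and a window of 1, the first cut moves back one word to land after "three." rather than splitting mid-sentence. A wider window gives more freedom to follow punctuation at the cost of less uniform segment lengths.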

Examples

# NOT RUN {
library(quanteda)
library(rainette)
# data_corpus_inaugural is a sample corpus shipped with quanteda
split_segments(data_corpus_inaugural)
# }
