tok (version 0.2.0)

pre_tokenizer_whitespace: This pre-tokenizer simply splits using the following regex: \w+|[^\w\s]+

Description

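This pre-tokenizer splits input using the regex \w+|[^\w\s]+. Each run of word characters becomes one piece, each run of characters that are neither word characters nor whitespace (punctuation, for example) becomes another, and whitespace itself is discarded.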
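
To make the splitting rule concrete, here is a minimal base R sketch that applies the same regex to a string. It does not call tok at all: `split_like_whitespace` is a hypothetical helper introduced only for illustration, and it uses R's PCRE engine, so results may differ from the actual pre-tokenizer for non-ASCII text.

```r
# The regex quoted in the description, escaped for an R string literal.
pattern <- "\\w+|[^\\w\\s]+"

# Hypothetical helper (not part of tok): return every match of `pattern`
# in `text`, in order of appearance. Whitespace never matches, so it is dropped.
split_like_whitespace <- function(text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

pieces <- split_like_whitespace("Hello, world!")
print(pieces)
```

For the input "Hello, world!" the helper returns the four pieces "Hello", ",", "world" and "!": the comma and the exclamation mark are split off from the adjacent words, and the space is not kept.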

Super class

tok::tok_pre_tokenizer -> tok_pre_tokenizer_whitespace

Methods


Method new()

Initializes the whitespace pre-tokenizer.

Usage

pre_tokenizer_whitespace$new()


Method clone()

The objects of this class are cloneable with this method.

Usage

pre_tokenizer_whitespace$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

See Also

Other pre_tokenizer: pre_tokenizer, pre_tokenizer_byte_level