tokenize_text_by_language: Language-aware tokenizer used across embedders and keyword search
Description
Language-aware tokenizer shared by the embedders and keyword search. Splits the input text into tokens according to the selected language.
Usage
tokenize_text_by_language(text, language = "en", remove_stopwords = FALSE)
Value
Character vector of tokens
Arguments
- text
Input text to tokenize.
- language
Language code, either "en" or "ml". Defaults to "en".
- remove_stopwords
Logical. If TRUE, remove English stopwords; this applies only when language is "en". Defaults to FALSE.
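Examples
The package source is not shown here, so the following is only a minimal sketch of the interface described above, not the actual implementation. `sketch_tokenize` and its short stopword list are hypothetical stand-ins. Splitting on non-letter characters and lowercasing English tokens are both assumptions, as is reading "ml" as Malayalam (its ISO 639-1 code).

```r
# Hypothetical stand-in for tokenize_text_by_language(); illustrative only.
sketch_tokenize <- function(text, language = "en", remove_stopwords = FALSE) {
  # Accept only the two documented language codes
  language <- match.arg(language, c("en", "ml"))

  # Assumption: split on runs of characters that are not letters,
  # combining marks, or digits (Unicode-aware, so non-Latin scripts survive)
  tokens <- unlist(strsplit(text, "[^\\p{L}\\p{M}\\p{N}]+", perl = TRUE))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by leading separators

  if (language == "en") {
    tokens <- tolower(tokens)  # assumption: English tokens are lowercased
    if (remove_stopwords) {
      # Tiny illustrative list; a real stopword list would be much longer
      stopwords_en <- c("the", "a", "an", "and", "of", "is", "on")
      tokens <- tokens[!tokens %in% stopwords_en]
    }
  }
  tokens
}

sketch_tokenize("The cat sat on the mat")
sketch_tokenize("The cat sat on the mat", remove_stopwords = TRUE)
```

With the package loaded, the corresponding call would be tokenize_text_by_language("The cat sat on the mat", language = "en", remove_stopwords = TRUE). Its exact output depends on the package's own tokenization rules and stopword list, which may differ from this sketch.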