
VectrixDB (version 1.1.2)

tokenize_text_by_language: Language-aware tokenizer used across embedders and keyword search

Description

Splits text into tokens according to the given language. The same tokenizer is used across the embedders and keyword search.

Usage

tokenize_text_by_language(text, language = "en", remove_stopwords = FALSE)

Value

Character vector of tokens

Arguments

text

Input text to tokenize.

language

Language code, either "en" or "ml" (default "en").

remove_stopwords

Logical. If TRUE, English stopwords are removed. This only takes effect when language is "en". Defaults to FALSE.
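To illustrate how the language and remove_stopwords arguments interact, here is a hypothetical, self-contained sketch in base R. The names toy_tokenize and toy_stopwords are illustrative only and are not part of VectrixDB. The splitting rule (lowercase, then split on runs of non-alphanumeric characters) and the tiny stopword list are assumptions, so the package's actual tokens may differ.

```r
# Hypothetical sketch: NOT VectrixDB's implementation.
# Assumes tokens are lowercase runs of letters/digits and uses a tiny
# illustrative stopword list; the package's real rules may differ.
toy_stopwords <- c("a", "an", "and", "the", "of", "to", "is", "in")

toy_tokenize <- function(text, language = "en", remove_stopwords = FALSE) {
  # Lowercase, then split on any run of characters that are not letters or digits
  tokens <- unlist(strsplit(tolower(text), "[^[:alnum:]]+"))
  # strsplit() yields "" when the text starts with a separator; drop empties
  tokens <- tokens[nzchar(tokens)]
  # Mirror the documented behaviour: stopwords are only removed for "en"
  if (remove_stopwords && identical(language, "en")) {
    tokens <- tokens[!tokens %in% toy_stopwords]
  }
  tokens
}

toy_tokenize("The cat and the hat!", remove_stopwords = TRUE)
#> [1] "cat" "hat"
```

With language = "ml", the same call keeps "the" and "and", because in this sketch (as in the documented argument) stopword removal applies only to English.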