TextAnalyzer: text analyzer for search indexing

Description

Provides configurable text processing pipelines:
- Tokenization
- Lowercasing
- Stopword removal
- Stemming
- Synonym expansion
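The stages above can be sketched in a few lines of base R. The function below, its stopword set, and the naive suffix stripper are illustrative stand-ins for the class's configurable pipeline, not the package's actual stopword list or stemmer:

```r
# Toy end-to-end pipeline: tokenize, lowercase, drop stopwords, "stem".
# Assumptions: a tiny hard-coded stopword set and a crude suffix rule
# stand in for ENGLISH_STOPWORDS and a real stemmer.
toy_analyze <- function(text,
                        stopwords = c("the", "are", "is", "a"),
                        token_pattern = "[a-zA-Z0-9]+") {
  # Tokenization: extract every run of characters matching the pattern
  tokens <- regmatches(text, gregexpr(token_pattern, text))[[1]]
  # Lowercasing
  tokens <- tolower(tokens)
  # Stopword removal
  tokens <- tokens[!tokens %in% stopwords]
  # Very naive stemming: strip a few common English suffixes
  tokens <- sub("(ing|es|s)$", "", tokens)
  tokens
}

toy_analyze("The quick brown foxes are jumping")
# c("quick", "brown", "fox", "jump")
```

Synonym expansion is omitted here; in the real class it is driven by the `synonyms` field documented below.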
Public fields

lowercase
  Whether to convert text to lowercase.
remove_stopwords
  Whether to remove stopwords.
stopwords
  Set of stopwords.
stemmer
  Stemmer object.
synonyms
  Named list of synonyms.
min_token_length
  Minimum token length.
max_token_length
  Maximum token length.
token_pattern
  Regular expression pattern for tokens.
Methods

Public methods:
  TextAnalyzer$new()
  TextAnalyzer$analyze()
  TextAnalyzer$analyze_query()
  TextAnalyzer$clone()
Method new()

Create a new TextAnalyzer object.

Usage

TextAnalyzer$new(
  lowercase = TRUE,
  remove_stopwords = FALSE,
  stopwords = NULL,
  use_stemmer = FALSE,
  synonyms = NULL,
  min_token_length = 1,
  max_token_length = 100,
  token_pattern = "[a-zA-Z0-9]+"
)
Arguments

lowercase
  Lowercase the text (default: TRUE).
remove_stopwords
  Remove stopwords (default: FALSE).
stopwords
  Custom stopword set (default: ENGLISH_STOPWORDS).
use_stemmer
  Apply stemming (default: FALSE).
synonyms
  Named list of synonyms.
min_token_length
  Minimum token length (default: 1).
max_token_length
  Maximum token length (default: 100).
token_pattern
  Regex pattern for tokens (default: "[a-zA-Z0-9]+").
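As a usage sketch, an analyzer configured for aggressive normalization might be created as follows; the synonym mapping shown is illustrative, not a value from the package:

```r
# Assumes the package exporting TextAnalyzer is loaded.
analyzer <- TextAnalyzer$new(
  lowercase = TRUE,
  remove_stopwords = TRUE,  # with stopwords = NULL, ENGLISH_STOPWORDS is used
  use_stemmer = TRUE,
  min_token_length = 2,
  synonyms = list(quick = c("fast", "rapid"))  # illustrative mapping
)
```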
Method analyze()

Analyze text and return its tokens.

Usage

TextAnalyzer$analyze(text)

Arguments

text
  Input text.

Returns

A character vector of tokens.
Method analyze_query()

Analyze a query string.

Usage

TextAnalyzer$analyze_query(query)

Arguments

query
  Query text.
Method clone()

The objects of this class are cloneable with this method.

Usage

TextAnalyzer$clone(deep = FALSE)

Arguments

deep
  Whether to make a deep clone.
Examples

if (FALSE) {
  analyzer <- TextAnalyzer$english()
  tokens <- analyzer$analyze("The quick brown foxes are jumping")
  # c("quick", "brown", "fox", "jump")
}