TextAnalyzer: text analyzer for search indexing

Description

Provides configurable text processing pipelines:
- Tokenization
- Lowercasing
- Stopword removal
- Stemming
- Synonym expansion
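The stages above can be sketched in a few lines of base R. The function below, its stopword set, and the naive suffix stripper are illustrative stand-ins for the class's configurable pipeline, not the package's actual stopword list or stemmer:

```r
# Toy end-to-end pipeline: tokenize, lowercase, drop stopwords, "stem".
# Assumptions: a tiny hard-coded stopword set and a crude suffix rule
# stand in for ENGLISH_STOPWORDS and a real stemmer.
toy_analyze <- function(text,
                        stopwords = c("the", "are", "is", "a"),
                        token_pattern = "[a-zA-Z0-9]+") {
  # Tokenization: extract every run of characters matching the pattern
  tokens <- regmatches(text, gregexpr(token_pattern, text))[[1]]
  # Lowercasing
  tokens <- tolower(tokens)
  # Stopword removal
  tokens <- tokens[!tokens %in% stopwords]
  # Very naive stemming: strip a few common English suffixes
  tokens <- sub("(ing|es|s)$", "", tokens)
  tokens
}

toy_analyze("The quick brown foxes are jumping")
# c("quick", "brown", "fox", "jump")
```

Synonym expansion is omitted here; in the real class it is driven by the `synonyms` field documented below.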
Public fields

lowercase
  Whether to convert text to lowercase.
remove_stopwords
  Whether to remove stopwords.
stopwords
  Set of stopwords.
stemmer
  Stemmer object.
synonyms
  Named list of synonyms.
min_token_length
  Minimum token length.
max_token_length
  Maximum token length.
token_pattern
  Regular expression pattern for tokens.
Methods

Public methods:
  TextAnalyzer$new()
  TextAnalyzer$analyze()
  TextAnalyzer$analyze_query()
  TextAnalyzer$clone()
Method new()

Create a new TextAnalyzer object.

Usage

TextAnalyzer$new(
  lowercase = TRUE,
  remove_stopwords = FALSE,
  stopwords = NULL,
  use_stemmer = FALSE,
  synonyms = NULL,
  min_token_length = 1,
  max_token_length = 100,
  token_pattern = "[a-zA-Z0-9]+"
)
Arguments

lowercase
  Lowercase the text (default: TRUE).
remove_stopwords
  Remove stopwords (default: FALSE).
stopwords
  Custom stopword set (default: ENGLISH_STOPWORDS).
use_stemmer
  Apply stemming (default: FALSE).
synonyms
  Named list of synonyms.
min_token_length
  Minimum token length (default: 1).
max_token_length
  Maximum token length (default: 100).
token_pattern
  Regex pattern for tokens (default: "[a-zA-Z0-9]+").
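As a usage sketch, an analyzer configured for aggressive normalization might be created as follows; the synonym mapping shown is illustrative, not a value from the package:

```r
# Assumes the package exporting TextAnalyzer is loaded.
analyzer <- TextAnalyzer$new(
  lowercase = TRUE,
  remove_stopwords = TRUE,  # with stopwords = NULL, ENGLISH_STOPWORDS is used
  use_stemmer = TRUE,
  min_token_length = 2,
  synonyms = list(quick = c("fast", "rapid"))  # illustrative mapping
)
```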
Method analyze()

Analyze text and return its tokens.

Usage

TextAnalyzer$analyze(text)

Arguments

text
  Input text.

Returns

A character vector of tokens.
Method analyze_query()

Analyze a query string.

Usage

TextAnalyzer$analyze_query(query)

Arguments

query
  Query text.
Method clone()

The objects of this class are cloneable with this method.

Usage

TextAnalyzer$clone(deep = FALSE)

Arguments

deep
  Whether to make a deep clone.
Examples

if (FALSE) {
  analyzer <- TextAnalyzer$english()
  tokens <- analyzer$analyze("The quick brown foxes are jumping")
  # c("quick", "brown", "fox", "jump")
}