This function preprocesses text data using vectorized operations, working through the documents in chunks for better performance.
vec_preprocess(
  text_data,
  text_column = "abstract",
  remove_stopwords = TRUE,
  custom_stopwords = NULL,
  min_word_length = 3,
  max_word_length = 50,
  chunk_size = 100
)
Returns a data frame with the processed text.
text_data: A data frame containing text data.
text_column: Name of the column containing the text to process.
remove_stopwords: Logical. If TRUE, removes stopwords.
custom_stopwords: Character vector of additional stopwords to remove.
min_word_length: Minimum word length to keep.
max_word_length: Maximum word length to keep.
chunk_size: Number of documents to process in each chunk.
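To make the arguments concrete, here is a minimal base-R sketch of a chunked, vectorized preprocessing pipeline that takes the same parameters. It is an illustration under stated assumptions, not the actual vec_preprocess() implementation: the function name sketch_preprocess, the tiny built-in stopword list, the letters-only tokenization, and the output column processed_text are all hypothetical choices made for this example.

```r
# Hypothetical sketch; NOT the package's vec_preprocess() implementation.
sketch_preprocess <- function(text_data,
                              text_column = "abstract",
                              remove_stopwords = TRUE,
                              custom_stopwords = NULL,
                              min_word_length = 3,
                              max_word_length = 50,
                              chunk_size = 100) {
  stopifnot(chunk_size >= 1)
  # Assumption: a tiny illustrative stopword list; a real one would be larger.
  stopword_list <- c("the", "and", "for", "with", "this", "that", "are", "was")
  stopword_list <- unique(c(stopword_list, tolower(custom_stopwords)))

  texts <- as.character(text_data[[text_column]])
  texts[is.na(texts)] <- ""            # treat missing text as empty
  n <- length(texts)
  processed <- character(n)

  # Group document indices into consecutive chunks of at most chunk_size.
  chunks <- split(seq_len(n), ceiling(seq_len(n) / chunk_size))

  for (idx in chunks) {
    # Vectorized over the whole chunk: lowercase, keep letters only (assumption).
    chunk <- trimws(gsub("[^a-z]+", " ", tolower(texts[idx])))
    tokens <- strsplit(chunk, " ", fixed = TRUE)

    # Flatten all tokens in the chunk, remembering which document each came from.
    doc_id <- rep(seq_along(tokens), lengths(tokens))
    words <- unlist(tokens, use.names = FALSE)

    # One vectorized filter over every token in the chunk.
    len <- nchar(words)
    keep <- nzchar(words) & len >= min_word_length & len <= max_word_length
    if (remove_stopwords) keep <- keep & !(words %in% stopword_list)

    # Reassemble each document from its kept words ("" if none survive).
    kept <- split(words[keep], factor(doc_id[keep], levels = seq_along(tokens)))
    processed[idx] <- unname(vapply(kept, paste, character(1), collapse = " "))
  }

  text_data$processed_text <- processed  # hypothetical output column name
  text_data
}

df <- data.frame(
  abstract = c("The cat sat on the mat with a hat",
               "An ENORMOUS, well-documented example!",
               NA),
  stringsAsFactors = FALSE
)
res <- sketch_preprocess(df, chunk_size = 2)
print(res$processed_text)
```

Chunking bounds how many tokens are materialized at once, while the length and stopword filters run as single vectorized comparisons over every token in a chunk rather than looping word by word.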