This function clusters documents using K-means based on their TF-IDF vectors.
cluster_docs(
text_data,
text_column = "abstract",
n_clusters = 5,
min_term_freq = 2,
max_doc_freq = 0.9,
random_seed = 42
)

Returns a data frame containing the original data plus cluster assignments.

text_data: A data frame containing text data.
text_column: Name of the column containing the text to analyze.
n_clusters: Number of clusters to create.
min_term_freq: Minimum frequency for a term to be included.
max_doc_freq: Maximum document frequency (as a proportion of documents) for a term to be included.
random_seed: Seed for random number generation (for reproducibility).
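
To make the role of each argument concrete, here is a minimal base-R sketch of a TF-IDF plus k-means pipeline with the same signature. It is not the implementation of cluster_docs: the name cluster_docs_sketch, the tokenizer, the output column name cluster, the toy docs data frame, and the reading of min_term_freq as a total count across all documents are assumptions made for illustration.

```r
# Hypothetical sketch, not the actual cluster_docs() implementation.
cluster_docs_sketch <- function(text_data, text_column = "abstract",
                                n_clusters = 5, min_term_freq = 2,
                                max_doc_freq = 0.9, random_seed = 42) {
  # Tokenize: lowercase, then split on anything that is not a letter.
  tokens <- strsplit(tolower(text_data[[text_column]]), "[^a-z]+")
  tokens <- lapply(tokens, function(x) x[nzchar(x)])
  vocab <- sort(unique(unlist(tokens)))

  # Document-term count matrix (rows = documents, columns = terms).
  dtm <- do.call(rbind, lapply(tokens, function(x) {
    as.numeric(table(factor(x, levels = vocab)))
  }))
  colnames(dtm) <- vocab

  # Term filters. Assumption: min_term_freq is a total count over all
  # documents; max_doc_freq is the share of documents containing the term.
  term_freq <- colSums(dtm)
  doc_freq <- colSums(dtm > 0) / nrow(dtm)
  keep <- term_freq >= min_term_freq & doc_freq <= max_doc_freq
  dtm <- dtm[, keep, drop = FALSE]

  # TF-IDF: term share within each document times log inverse document frequency.
  tf <- dtm / pmax(rowSums(dtm), 1)
  idf <- log(nrow(dtm) / colSums(dtm > 0))
  tfidf <- sweep(tf, 2, idf, `*`)

  # K-means with a fixed seed so repeated calls give the same assignment.
  set.seed(random_seed)
  fit <- kmeans(tfidf, centers = n_clusters, nstart = 10)

  text_data$cluster <- fit$cluster
  text_data
}

# Toy input: six short "abstracts" on two unrelated topics.
docs <- data.frame(
  id = 1:6,
  abstract = c(
    "gene expression in cancer cells",
    "cancer cells and gene mutation",
    "gene mutation alters expression",
    "neural network training with gradient descent",
    "gradient descent for network optimization",
    "training a neural network"
  ),
  stringsAsFactors = FALSE
)

result <- cluster_docs_sketch(docs, n_clusters = 2)
print(result[, c("id", "cluster")])
```

Because the seed is set inside the function, calling it twice on the same input yields the same cluster column. Note that k-means needs at least n_clusters distinct TF-IDF rows, so the default of 5 clusters will fail on very small or heavily filtered inputs.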