SportMiner (version 0.1.0)

sm_select_optimal_k: Select Optimal Number of Topics

Description

Fits a topic model for each candidate value of k (number of topics) and computes topic coherence for each fit. Returns the k with the highest coherence score, together with a plot comparing coherence across the tested values.
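The selection rule described above can be sketched in base R. This is only an illustration of "score each k, keep the maximum", not SportMiner's implementation: select_k() and toy_coherence() are hypothetical names, and the toy scorer stands in for fitting a model and measuring its coherence.

```r
# Hypothetical stand-in for a coherence calculation. A real workflow would
# fit a topic model with k topics and score it; these values are made up
# so that the score peaks at k = 10.
toy_coherence <- function(k) -abs(k - 10) / 10

# Hypothetical helper (not part of SportMiner): score every candidate k
# and return the best one alongside the full comparison table.
select_k <- function(k_range, score_fn) {
  scores <- vapply(k_range, score_fn, numeric(1))
  results <- data.frame(k = k_range, coherence = scores)
  list(
    optimal_k = results$k[which.max(results$coherence)],
    results = results
  )
}

sel <- select_k(seq(2, 20, by = 2), toy_coherence)
sel$optimal_k  # 10 with this toy scorer
```

Note that which.max() returns the first maximum, so in this sketch ties are resolved in favor of the smaller k when k_range is sorted in increasing order.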

Usage

sm_select_optimal_k(
  dtm,
  k_range = seq(2, 20, by = 2),
  method = "gibbs",
  seed = 1729,
  iter = 500,
  burnin = 100,
  plot = TRUE
)

Value

A list containing:

optimal_k

The k value with the highest coherence score

results

Data frame with k and coherence for each tested value

plot

A ggplot object showing coherence vs k

Arguments

dtm

A DocumentTermMatrix object.

k_range

Vector of k values to test. Default is seq(2, 20, by = 2).

method

Topic modeling method. Options: "gibbs" or "vem". Default is "gibbs".

seed

Random seed for reproducibility. Default is 1729.

iter

Number of Gibbs iterations (if method = "gibbs"). Default is 500.

burnin

Number of burn-in iterations (if method = "gibbs"). Default is 100.

plot

Logical indicating whether to display the coherence plot. Default is TRUE.
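To make the relationship between method, seed, iter, and burnin concrete, the sketch below builds a sampler control list in which iter and burnin apply only to Gibbs sampling. It assumes the settings are eventually handed to a fitter such as topicmodels::LDA(), which this page does not confirm; build_lda_control() is a hypothetical helper, not a SportMiner function.

```r
# Hypothetical helper (not part of SportMiner): collect the settings that
# are relevant for the chosen method. iter and burnin are dropped for
# "vem" because they only describe Gibbs sampling.
build_lda_control <- function(method = "gibbs", seed = 1729,
                              iter = 500, burnin = 100) {
  method <- match.arg(tolower(method), c("gibbs", "vem"))
  if (method == "gibbs") {
    list(seed = seed, iter = iter, burnin = burnin)
  } else {
    list(seed = seed)
  }
}

ctrl_gibbs <- build_lda_control("gibbs")
ctrl_vem <- build_lda_control("vem")

# Assumption: with the topicmodels package, such a list could be passed as
#   topicmodels::LDA(dtm, k = 10, method = "Gibbs", control = ctrl_gibbs)
names(ctrl_gibbs)
names(ctrl_vem)
```

Fixing the seed makes repeated runs comparable, which matters here because coherence scores for neighboring k values can differ by small amounts.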

Examples

if (FALSE) {
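# Not run: requires a document-term matrix from sm_create_dtm();
# processed_data is assumed to come from an earlier preprocessing step.
dtm <- sm_create_dtm(processed_data)

# Compare four candidate topic counts
k_selection <- sm_select_optimal_k(dtm, k_range = c(5, 10, 15, 20))

# k with the highest coherence, then the full comparison table
print(k_selection$optimal_k)
print(k_selection$results)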
}