
SportMiner (version 0.1.0)

sm_compare_models: Compare Multiple Topic Models

Description

Fits three topic models to the same document-term matrix: LDA (Latent Dirichlet Allocation), STM (Structural Topic Model), and CTM (Correlated Topic Model). Computes semantic coherence and exclusivity for each model and recommends the model that performs best on these metrics.
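Semantic coherence rewards topics whose top words tend to appear in the same documents. As a rough illustration, the base R sketch below computes the document co-occurrence score of Mimno et al. (2011), the definition used by stm::semanticCoherence(), for one topic's top words. It is an assumption about how the metric is defined, not SportMiner's internal code; semantic_coherence() and the toy matrix are hypothetical.

```r
# Sketch of UMass-style semantic coherence for a single topic (hypothetical helper).
# `x` is a dense document-term matrix (documents in rows, terms in columns),
# e.g. as.matrix(dtm); `top_terms` are the topic's top words, most probable first.
semantic_coherence <- function(x, top_terms) {
  stopifnot(length(top_terms) >= 2)
  b <- x[, top_terms, drop = FALSE] > 0      # TRUE where a document contains the term
  score <- 0
  for (m in 2:length(top_terms)) {
    for (l in 1:(m - 1)) {
      d_l  <- sum(b[, l])                    # documents containing the earlier word
      d_ml <- sum(b[, m] & b[, l])           # documents containing both words
      score <- score + log((d_ml + 1) / d_l)
    }
  }
  score
}

# Toy term counts for four documents (illustrative data only)
x <- matrix(c(2, 1, 0,
              1, 0, 1,
              0, 3, 1,
              1, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("goal", "match", "race")))

semantic_coherence(x, c("goal", "match"))  # goal/match share 2 of goal's 3 documents
semantic_coherence(x, c("goal", "race"))   # goal/race share only 1 document
```

Higher (less negative) scores mean the top words co-occur more often, so "goal" and "match" form the more coherent pair here.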

Usage

sm_compare_models(
  dtm,
  k = 10,
  metadata = NULL,
  prevalence = NULL,
  seed = 1729,
  lda_method = "gibbs",
  verbose = TRUE
)

Value

A list containing:

models

List of the fitted models, with elements lda, stm, and ctm.

metrics

Data frame comparing coherence and exclusivity across the three models.

recommendation

Character string naming the recommended model.
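Exclusivity rewards topics whose top words are rare in other topics. One common definition, used by stm::exclusivity(), sums FREX scores over each topic's top M words, where FREX is a weighted harmonic mean of a word's exclusivity rank and frequency rank within the topic (weight 0.7 on exclusivity). The base R sketch below reimplements that idea from a topic-word probability matrix. Whether SportMiner computes exclusivity exactly this way is an assumption, and frex_exclusivity() is a hypothetical helper.

```r
# Sketch of FREX-based exclusivity (hypothetical helper, modeled on stm::exclusivity()).
# `beta` is a K x V matrix of topic-word probabilities (each row sums to 1).
frex_exclusivity <- function(beta, M = 10, frexw = 0.7) {
  tbeta <- t(beta)                              # V x K: terms in rows, topics in columns
  mat <- tbeta / rowSums(tbeta)                 # share of each term's mass in each topic
  ex <- apply(mat, 2, rank) / nrow(mat)         # empirical CDF of exclusivity per topic
  fr <- apply(tbeta, 2, rank) / nrow(mat)       # empirical CDF of frequency per topic
  frex <- 1 / (frexw / ex + (1 - frexw) / fr)   # weighted harmonic mean of the two
  M <- min(M, nrow(tbeta))
  # Sum FREX over each topic's M most probable words
  sapply(seq_len(ncol(tbeta)), function(k) {
    top <- order(tbeta[, k], decreasing = TRUE)[seq_len(M)]
    sum(frex[top, k])
  })
}

# Two toy topics over four terms (illustrative data only)
beta <- rbind(c(0.6, 0.2, 0.1, 0.1),
              c(0.4, 0.1, 0.4, 0.1))
colnames(beta) <- c("goal", "match", "race", "team")

excl <- frex_exclusivity(beta, M = 2)
excl
```

The second topic puts much of its mass on "goal", which the first topic also favors, so its top words are less exclusive and it scores lower.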

Arguments

dtm

A DocumentTermMatrix object (tm package), such as the output of sm_create_dtm().

k

Number of topics to extract. Default is 10.

metadata

Optional data frame with document-level covariates for STM. Must have the same number of rows as dtm. Default is NULL.

prevalence

Optional one-sided formula (for example, ~ year) giving the STM prevalence specification, with covariates taken from metadata. Default is NULL.

seed

Random seed for reproducibility. Default is 1729.

lda_method

Estimation method for LDA: "gibbs" (Gibbs sampling) or "vem" (variational expectation-maximization). Default is "gibbs".

verbose

Logical indicating whether to print progress messages. Default is TRUE.
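When metadata and prevalence are supplied, the covariates have to line up with the documents in dtm. The sketch below builds a small covariate table and a one-sided prevalence formula, then checks both against a stand-in matrix before the call. The column names sport and year, the toy matrix, and check_stm_inputs() are hypothetical; the sm_compare_models() call is wrapped in if (FALSE), as in the package's own example, because it needs a real DocumentTermMatrix.

```r
# Hypothetical document-level covariates: one row per document.
meta <- data.frame(
  sport = c("football", "tennis", "football", "cycling"),
  year  = c(2019, 2020, 2021, 2021)
)

# One-sided formula: topic prevalence varies with sport and year.
prev <- ~ sport + year

# Hypothetical pre-flight check, not part of SportMiner.
check_stm_inputs <- function(dtm, metadata, prevalence) {
  if (nrow(metadata) != nrow(dtm)) {
    stop("metadata has ", nrow(metadata), " rows but dtm has ", nrow(dtm), " documents")
  }
  missing_vars <- setdiff(all.vars(prevalence), names(metadata))
  if (length(missing_vars) > 0) {
    stop("prevalence uses columns not in metadata: ", paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in for a 4-document DTM (nrow() also works on a tm DocumentTermMatrix)
toy_dtm <- matrix(1:12, nrow = 4)
check_stm_inputs(toy_dtm, meta, prev)

if (FALSE) {
  comparison <- sm_compare_models(dtm, k = 10, metadata = meta, prevalence = prev)
}
```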

Examples

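if (FALSE) {
library(SportMiner)
# Requires a document-term matrix from sm_create_dtm()
dtm <- sm_create_dtm(processed_data)
# Fit LDA, STM, and CTM with 10 topics each and compare them
comparison <- sm_compare_models(dtm, k = 10)
# Coherence and exclusivity for each model
print(comparison$metrics)
# Name of the recommended model
print(comparison$recommendation)
# The fitted models themselves are stored in comparison$models
stm_fit <- comparison$models$stm
}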
