stm_tidiers: Tidiers for Structural Topic Models from the stm package

Description

Tidy topic models fit by the stm package. The arguments and return values are similar to lda_tidiers.

Usage

# S3 method for STM
tidy(x, matrix = c("beta", "gamma", "theta"), log = FALSE,
  document_names = NULL, ...)
# S3 method for STM
augment(x, data, ...)
# S3 method for STM
glance(x, ...)

Arguments

x

An STM fitted model object, created by stm.

matrix

Which matrix to tidy: beta (per-term-per-topic, the default) or gamma/theta (per-document-per-topic). The stm package calls the per-document-per-topic matrix theta, but other topic modeling packages call it gamma.
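To make the per-term-per-topic layout concrete, here is a minimal base R sketch of reshaping a small topic-by-term matrix into one row per topic-term pair, with the topic, term, and beta columns used in the Examples. The matrix values and terms are made up for illustration, and this is not the package's implementation.

```r
# Toy topic-by-term probabilities (made-up numbers: 2 topics x 3 terms).
beta <- matrix(c(0.5, 0.3, 0.2,
                 0.1, 0.1, 0.8),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("liberty", "union", "tax")))

# Long format: one row per topic-term pair.
# as.vector() walks the matrix column by column, so topic cycles fastest.
tidy_beta <- data.frame(
  topic = rep(seq_len(nrow(beta)), times = ncol(beta)),
  term  = rep(colnames(beta), each = nrow(beta)),
  beta  = as.vector(beta)
)

tidy_beta
```

The per-document-per-topic case has the same shape, with a document column in place of term and gamma in place of beta.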

log

Whether beta/gamma/theta should be on a log scale. Defaults to FALSE.

document_names

Optional vector of document names for use with per-document-per-topic tidying

...

Extra arguments, not used

data

For augment, the data given to the stm function, either as a dfm or as a tidied table with "document" and "term" columns.

Value

tidy returns a tidied version of either the beta matrix or the gamma/theta matrix.

augment requires a data argument: either a dfm or a table with one row per original document-term pair (such as the output of tdm_tidiers) containing the columns document and term. It returns the same data as a table with an additional column, .topic, giving the topic assignment for each document-term combination.
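As an illustration of the tidied-table input that augment accepts, the following base R sketch builds a table with one row per document-term pair. The document names, terms, and the count column are hypothetical; only the document and term columns are named in the description above.

```r
# Hypothetical tidied input: one row per document-term pair.
# "document" and "term" are the required columns; "count" is an
# illustrative extra column and an assumption of this sketch.
tidied <- data.frame(
  document = c("doc1", "doc1", "doc2"),
  term     = c("liberty", "union", "tax"),
  count    = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Check that the required columns are present before calling augment.
required <- c("document", "term")
has_required <- all(required %in% names(tidied))
has_required
```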

glance always returns a one-row table, with columns

k: Number of topics in the model
docs: Number of documents in the model
terms: Number of terms in the model
iter: Number of iterations used
alpha: If an LDA model, the parameter of the Dirichlet distribution for topics over documents

Examples


# NOT RUN {
if (requireNamespace("stm", quietly = TRUE) && requireNamespace("quanteda", quietly = TRUE)) {
  library(dplyr)
  library(ggplot2)
  library(stm)
  library(quanteda)

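  # Tokenize first: newer quanteda versions deprecate building a dfm
  # directly from a corpus and the `remove` argument of dfm()
  inaug <- data_corpus_inaugural %>%
    tokens(remove_punct = TRUE) %>%
    tokens_remove(stopwords("english")) %>%
    dfm()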
  topic_model <- stm(inaug, K = 3, verbose = FALSE, init.type = "Spectral")

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # Examine the three topics
  td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # tidy the document-topic combinations, with optional document names
  td_gamma <- tidy(topic_model, matrix = "gamma",
                   document_names = rownames(inaug))
  td_gamma

  # find the assignments of each word in each document
  assignments <- augment(topic_model, inaug)
  assignments
}
# }
