This is the C++ Gibbs sampler for LDA. "Abandon all hope, ye who enter here."
fit_lda_c(
Docs,
Zd_in,
Cd_in,
Cv_in,
Ck_in,
alpha_in,
eta_in,
iterations,
burnin,
optimize_alpha,
calc_likelihood,
Beta_in,
freeze_topics,
threads = 1L,
verbose = TRUE
)

Returns a list with the following entries.
Cd is a matrix counting the number of times each topic is sampled per
document.
Cv is a matrix counting the number of times each topic is sampled per token.
Cd_mean is the same as Cd but with values averaged across iterations
greater than burnin.
Cv_mean is the same as Cv but with values averaged across iterations
greater than burnin.
Cd_sum is the same as Cd but with values summed across iterations
greater than burnin.
Cv_sum is the same as Cv but with values summed across iterations
greater than burnin.
log_likelihood is a matrix with two rows: one indexing iterations and
one holding the log likelihood at each iteration.
alpha is a vector of the document-topic prior.
eta is a matrix of the topic-token prior.
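To make concrete what "sampled" means for Cd, Cv, and the related counts,
below is a minimal self-contained sketch of a conventional collapsed Gibbs
update for one token. The function resample_token, its signature, the
documents-by-topics / topics-by-tokens indexing, and eta_row_sums are
illustrative assumptions only; this is not the package's actual
implementation.

// Hypothetical sketch of a collapsed Gibbs update for a single token.
// The names, signature, and indexing conventions are assumptions, not
// the package's internal API.
#include <cstddef>
#include <random>
#include <vector>

int resample_token(std::size_t d, std::size_t v, int old_topic,
                   std::vector<std::vector<long>>& Cd,          // documents x topics
                   std::vector<std::vector<long>>& Cv,          // topics x tokens
                   std::vector<long>& Ck,                       // token totals per topic
                   const std::vector<double>& alpha,            // document-topic prior
                   const std::vector<std::vector<double>>& eta, // topic-token prior
                   const std::vector<double>& eta_row_sums,     // sum of eta over tokens, per topic
                   std::mt19937& rng) {
  const std::size_t K = Ck.size();

  // Remove the token's current assignment from the counts.
  --Cd[d][old_topic]; --Cv[old_topic][v]; --Ck[old_topic];

  // Unnormalized full conditional P(topic = k | everything else).
  std::vector<double> p(K);
  for (std::size_t k = 0; k < K; ++k) {
    p[k] = (Cd[d][k] + alpha[k]) *
           (Cv[k][v] + eta[k][v]) / (Ck[k] + eta_row_sums[k]);
  }

  // Draw the new topic and add it back into the counts.
  int new_topic = std::discrete_distribution<int>(p.begin(), p.end())(rng);
  ++Cd[d][new_topic]; ++Cv[new_topic][v]; ++Ck[new_topic];
  return new_topic;
}

Averaging or summing these counts over iterations greater than burnin is
what produces the Cd_mean/Cv_mean and Cd_sum/Cv_sum entries above.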
The arguments are as follows.
Docs: List with one element for each document and one entry for each
token, as formatted by initialize_topic_counts (see the sketch after
this list).
Zd_in: List with one element for each document and one entry for each
token, as formatted by initialize_topic_counts.
Cd_in: IntegerMatrix denoting counts of topics in documents.
Cv_in: IntegerMatrix denoting counts of tokens in topics.
Ck_in: IntegerVector denoting counts of topics across all tokens.
alpha_in: NumericVector prior for topics over documents.
eta_in: NumericMatrix for the prior of tokens over topics.
iterations: int, number of Gibbs iterations to run in total.
burnin: int, number of burn-in iterations.
optimize_alpha: bool, do you want to optimize alpha each iteration?
calc_likelihood: bool, do you want to calculate the log likelihood each
iteration?
Beta_in: NumericMatrix denoting probability of tokens in topics.
freeze_topics: bool, set to TRUE if making predictions.
threads: unsigned integer, how many parallel threads? For now, nothing
is actually parallel.
verbose: bool, do you want to print out a progress bar?
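To make the shape of Docs and Zd_in concrete, here is a hypothetical
illustration of the parallel ragged structure. The assumption that each
Docs entry is a vocabulary index and each Zd_in entry a topic assignment
is inferred from the descriptions above, and the literal values are made
up; initialize_topic_counts produces the real ones.

// Hypothetical illustration of the parallel Docs / Zd_in structure:
// Docs[d][n] is assumed to be the vocabulary index of the n-th token of
// document d, and Zd[d][n] the topic currently assigned to that token.
#include <vector>

std::vector<std::vector<int>> Docs = {
  {0, 4, 4, 7},  // document 0 has four tokens
  {2, 2, 5}      // document 1 has three tokens
};

std::vector<std::vector<int>> Zd = {
  {1, 0, 0, 2},  // one topic assignment per token of document 0
  {2, 2, 1}      // one topic assignment per token of document 1
};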
Arguments ending in _in are copied and their copies modified in
some way by this function. In the case of eta_in and Beta_in,
the only modification is that they are converted from matrices to nested
std::vector for speed, reliability, and thread safety. In the case
of all others, they may be explicitly modified during training.
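As a rough illustration of that conversion, the sketch below copies an
Rcpp matrix element by element into nested std::vector. The helper name
is made up for this example and may differ from the package's code.

// Minimal sketch of copying an R matrix into nested std::vector so the
// sampler can read it without touching R's memory; the function name is
// illustrative, not the one used by this package.
#include <Rcpp.h>
#include <vector>

std::vector<std::vector<double>> copy_matrix(const Rcpp::NumericMatrix& m) {
  std::vector<std::vector<double>> out(
      m.nrow(), std::vector<double>(m.ncol()));
  for (int i = 0; i < m.nrow(); ++i) {
    for (int j = 0; j < m.ncol(); ++j) {
      out[i][j] = m(i, j);  // element-wise copy of the R data
    }
  }
  return out;
}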