This is the C++ Gibbs sampler for LDA. "Abandon all hope, ye who enter here."
fit_lda_c(
Docs,
Zd_in,
Cd_in,
Cv_in,
Ck_in,
alpha_in,
eta_in,
iterations,
burnin,
optimize_alpha,
calc_likelihood,
Beta_in,
freeze_topics,
threads = 1L,
verbose = TRUE
)

Returns a list with the following entries.
Cd is a matrix counting the number of times each topic is sampled per
document.
Cv is a matrix counting the number of times each topic is sampled per token.
Cd_mean is the same as Cd but with values averaged across iterations
greater than burnin.
Cv_mean is the same as Cv but with values averaged across iterations
greater than burnin.
Cd_sum is the same as Cd but with values summed across iterations
greater than burnin.
Cv_sum is the same as Cv but with values summed across iterations
greater than burnin.
log_likelihood is a matrix with two rows: one indexing iterations and
one holding the log likelihood at each iteration.
alpha is a vector of the document-topic prior.
eta is a matrix of the topic-token prior.
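To make concrete what "sampled" means for Cd, Cv, and the related counts,
below is a minimal self-contained sketch of a conventional collapsed Gibbs
update for one token. The function resample_token, its signature, the
documents-by-topics / topics-by-tokens indexing, and eta_row_sums are
illustrative assumptions only; this is not the package's actual
implementation.

// Hypothetical sketch of a collapsed Gibbs update for a single token.
// The names, signature, and indexing conventions are assumptions, not
// the package's internal API.
#include <cstddef>
#include <random>
#include <vector>

int resample_token(std::size_t d, std::size_t v, int old_topic,
                   std::vector<std::vector<long>>& Cd,          // documents x topics
                   std::vector<std::vector<long>>& Cv,          // topics x tokens
                   std::vector<long>& Ck,                       // token totals per topic
                   const std::vector<double>& alpha,            // document-topic prior
                   const std::vector<std::vector<double>>& eta, // topic-token prior
                   const std::vector<double>& eta_row_sums,     // sum of eta over tokens, per topic
                   std::mt19937& rng) {
  const std::size_t K = Ck.size();

  // Remove the token's current assignment from the counts.
  --Cd[d][old_topic]; --Cv[old_topic][v]; --Ck[old_topic];

  // Unnormalized full conditional P(topic = k | everything else).
  std::vector<double> p(K);
  for (std::size_t k = 0; k < K; ++k) {
    p[k] = (Cd[d][k] + alpha[k]) *
           (Cv[k][v] + eta[k][v]) / (Ck[k] + eta_row_sums[k]);
  }

  // Draw the new topic and add it back into the counts.
  int new_topic = std::discrete_distribution<int>(p.begin(), p.end())(rng);
  ++Cd[d][new_topic]; ++Cv[new_topic][v]; ++Ck[new_topic];
  return new_topic;
}

Averaging or summing these counts over iterations greater than burnin is
what produces the Cd_mean/Cv_mean and Cd_sum/Cv_sum entries above.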
The arguments are as follows.
Docs: List with one element for each document and one entry for each
token, as formatted by initialize_topic_counts (see the sketch after
this list).
Zd_in: List with one element for each document and one entry for each
token, as formatted by initialize_topic_counts.
Cd_in: IntegerMatrix denoting counts of topics in documents.
Cv_in: IntegerMatrix denoting counts of tokens in topics.
Ck_in: IntegerVector denoting counts of topics across all tokens.
alpha_in: NumericVector prior for topics over documents.
eta_in: NumericMatrix for the prior of tokens over topics.
iterations: int, number of Gibbs iterations to run in total.
burnin: int, number of burn-in iterations.
optimize_alpha: bool, do you want to optimize alpha each iteration?
calc_likelihood: bool, do you want to calculate the log likelihood each
iteration?
Beta_in: NumericMatrix denoting probability of tokens in topics.
freeze_topics: bool, set to TRUE if making predictions.
threads: unsigned integer, how many parallel threads? For now, nothing
is actually parallel.
verbose: bool, do you want to print out a progress bar?
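To make the shape of Docs and Zd_in concrete, here is a hypothetical
illustration of the parallel ragged structure. The assumption that each
Docs entry is a vocabulary index and each Zd_in entry a topic assignment
is inferred from the descriptions above, and the literal values are made
up; initialize_topic_counts produces the real ones.

// Hypothetical illustration of the parallel Docs / Zd_in structure:
// Docs[d][n] is assumed to be the vocabulary index of the n-th token of
// document d, and Zd[d][n] the topic currently assigned to that token.
#include <vector>

std::vector<std::vector<int>> Docs = {
  {0, 4, 4, 7},  // document 0 has four tokens
  {2, 2, 5}      // document 1 has three tokens
};

std::vector<std::vector<int>> Zd = {
  {1, 0, 0, 2},  // one topic assignment per token of document 0
  {2, 2, 1}      // one topic assignment per token of document 1
};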
Arguments ending in _in are copied and their copies modified in
some way by this function. In the case of eta_in and Beta_in,
the only modification is that they are converted from matrices to nested
std::vector for speed, reliability, and thread safety. In the case
of all others, they may be explicitly modified during training.
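As a rough illustration of that conversion, the sketch below copies an
Rcpp matrix element by element into nested std::vector. The helper name
is made up for this example and may differ from the package's code.

// Minimal sketch of copying an R matrix into nested std::vector so the
// sampler can read it without touching R's memory; the function name is
// illustrative, not the one used by this package.
#include <Rcpp.h>
#include <vector>

std::vector<std::vector<double>> copy_matrix(const Rcpp::NumericMatrix& m) {
  std::vector<std::vector<double>> out(
      m.nrow(), std::vector<double>(m.ncol()));
  for (int i = 0; i < m.nrow(); ++i) {
    for (int j = 0; j < m.ncol(); ++j) {
      out[i][j] = m(i, j);  // element-wise copy of the R data
    }
  }
  return out;
}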