Calculates several metrics to estimate the most suitable number of topics for an LDA model.
FindTopicsNumber(
  dtm,
  topics = seq(10, 40, by = 10),
  metrics = "Griffiths2004",
  method = "Gibbs",
  control = list(),
  mc.cores = NA,
  return_models = FALSE,
  verbose = FALSE,
  libpath = NULL
)
dtm: An object of class "DocumentTermMatrix" with term-frequency weighting, or an object coercible to a "simple_triplet_matrix" with integer entries.
topics: Vector of topic counts; one model is fitted for each value so the models can be compared.
metrics: A string or character vector of metrics to compute. Possible values: "Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014".
method: The method to be used for fitting; see LDA.
control: A named list of control parameters for the fitting method; see LDA.
mc.cores: NA, an integer, or a cluster; the number of CPU cores used to fit models simultaneously. If an integer, a cluster is created on the local machine. If a cluster, it is used but not destroyed (which allows multi-node clusters). Defaults to NA, which triggers auto-detection of the number of cores on the local machine.
return_models: Whether to return the model objects of class "LDA". Defaults to FALSE. Setting it to TRUE requires the tibble package.
verbose: If FALSE (default), suppress all warnings and additional information.
libpath: Path to R packages (use only if your R installation can't find the 'topicmodels' package; see [issue #3](https://github.com/nikita-moor/ldatuning/issues/3)). For example: "C:/Program Files/R/R-2.15.2/library" (Windows), "/home/user/R/x86_64-pc-linux-gnu-library/3.2" (Linux).
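The cluster form of mc.cores described above can be illustrated with a short sketch. It assumes the parallel, topicmodels and ldatuning packages are installed; the two-worker cluster size and the names cl, res_a and res_b are chosen only for illustration.

```r
library(parallel)
library(topicmodels)
library(ldatuning)

data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:10, ]

# Create one local cluster and reuse it for several calls.
# Two workers is an arbitrary choice for this sketch.
cl <- makeCluster(2L)

# Because mc.cores is a cluster here, it is used but not destroyed,
# so the same cluster can serve a second call.
res_a <- FindTopicsNumber(dtm, topics = 2:5, metrics = "Arun2010", mc.cores = cl)
res_b <- FindTopicsNumber(dtm, topics = 6:10, metrics = "Arun2010", mc.cores = cl)

# The caller owns the cluster and must shut it down.
stopCluster(cl)
```

Passing an integer instead (for example mc.cores = 2L) leaves cluster creation and cleanup to the function, which is simpler when only a single call is needed.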
A data frame with the numbers of topics and the corresponding values of each requested metric. It can be passed directly to FindTopicsNumber_plot to draw a plot.
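As a rough sketch of how the returned data frame might be used, the snippet below computes two metrics, reads off a candidate topic count, and draws the plot. It assumes the result has a topics column plus one column per metric, and that lower values are better for CaoJuan2009 (the usual reading of that metric); the seed value and the name best_k are illustrative only.

```r
library(topicmodels)
library(ldatuning)

data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:10, ]

result <- FindTopicsNumber(
  dtm,
  topics = 2:10,
  metrics = c("CaoJuan2009", "Arun2010"),
  method = "Gibbs",
  control = list(seed = 77),  # arbitrary seed, for reproducibility only
  mc.cores = 1L
)

# Assumption: CaoJuan2009 is a "lower is better" metric, so take the
# topic count at its minimum as a candidate.
best_k <- result$topics[which.min(result$CaoJuan2009)]

# Plot all computed metrics against the number of topics.
FindTopicsNumber_plot(result)
```

On a sample this small the chosen value is not meaningful; the point is only the shape of the workflow.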
# NOT RUN {
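library(topicmodels)
library(ldatuning)
# Load the Associated Press corpus shipped with topicmodels
data("AssociatedPress", package = "topicmodels")
# Keep only the first 10 documents so the example runs quickly
dtm <- AssociatedPress[1:10, ]
# Compute the Arun2010 metric for models with 2 to 10 topics on a single core
FindTopicsNumber(dtm, topics = 2:10, metrics = "Arun2010", mc.cores = 1L)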
# }