The functions LDA(), JST(), rJST() and sentopicmodel() are all wrappers around a unified C++ routine, and each attempts to replicate its corresponding model. This function is the lower-level wrapper to the C++ routine.
sentopicmodel(
  x,
  lexicon = NULL,
  L1 = 5,
  L2 = 3,
  L1prior = 1,
  L2prior = 5,
  beta = 0.01,
  L1cycle = 0,
  L2cycle = 0,
  reversed = TRUE
)
An S3 list containing the model parameters and the estimated mixture.
This object corresponds to a Gibbs sampler estimator with zero iterations.
The MCMC can be iterated using the fit() function.
tokens
is the tokens object used to create the model
vocabulary
contains the set of words of the corpus
it
tracks the number of Gibbs sampling iterations
za
is the list of topic assignments, aligned to the tokens object with padding removed
logLikelihood
returns the measured log-likelihood at each iteration, with a breakdown of the likelihood into hierarchical components as an attribute
The topWords() function extracts the most probable words of each topic/sentiment.
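As a minimal sketch of the workflow described above, the following creates a model with zero iterations, runs a few Gibbs sampling iterations with fit(), and inspects some of the listed components. The `iterations` argument name and the value 10 are assumptions chosen for illustration; the datasets are the ones used in the examples of this page. The code is skipped when the sentopics package is not installed.

```r
# Sketch only: runs when the sentopics package is available.
if (requireNamespace("sentopics", quietly = TRUE)) {
  library(sentopics)

  # Create the model object; no sampling has been done at this point.
  model <- sentopicmodel(ECB_press_conferences_tokens,
                         lexicon = LoughranMcDonald)
  model$it                 # iteration counter of the fresh model

  # Iterate the MCMC (assumed argument name: iterations).
  model <- fit(model, iterations = 10)

  model$it                 # iteration counter after fitting
  head(model$vocabulary)   # words of the corpus
  model$logLikelihood      # log-likelihood measured at each iteration

  # Most probable words of each topic/sentiment.
  topWords(model)
}
```

A small iteration count keeps the sketch fast; a real estimation would typically use many more iterations.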
x
tokens object containing the texts. A coercion will be attempted if x is not a tokens object.
lexicon
a quanteda dictionary with positive and negative categories
L1
the number of labels in the first document mixture layer
L2
the number of labels in the second document mixture layer
L1prior
the first layer hyperparameter of document mixtures
L2prior
the second layer hyperparameter of document mixtures
beta
the hyperparameter of vocabulary distribution
L1cycle
integer specifying the cycle size between two updates of the hyperparameter L1prior
L2cycle
integer specifying the cycle size between two updates of the hyperparameter L2prior
reversed
indicates the dimension on which lexicon should apply. When reversed=FALSE, the lexicon is applied to the first layer of the document mixture (as in a JST model). When reversed=TRUE, the lexicon is applied to the second layer of the document mixture (as in a reversed-JST model).
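The effect of reversed can be sketched as follows. The layer sizes below are assumptions chosen so that the layer receiving the lexicon has three labels; they are illustrative and not prescribed by this page. The code is skipped when the sentopics package is not installed.

```r
# Sketch only: runs when the sentopics package is available.
if (requireNamespace("sentopics", quietly = TRUE)) {
  library(sentopics)

  # reversed = FALSE: the lexicon is applied to the first layer,
  # as in a JST model (assumed sizes: 3 first-layer, 5 second-layer labels).
  jst_like <- sentopicmodel(ECB_press_conferences_tokens,
                            lexicon = LoughranMcDonald,
                            L1 = 3, L2 = 5, reversed = FALSE)

  # reversed = TRUE (the default): the lexicon is applied to the second
  # layer, as in a reversed-JST model.
  rjst_like <- sentopicmodel(ECB_press_conferences_tokens,
                             lexicon = LoughranMcDonald,
                             L1 = 5, L2 = 3, reversed = TRUE)
}
```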
Olivier Delmarcelle
Fitting a model: fit(), extracting top words: topWords()
Other topic models: JST(), LDA(), rJST()
LDA(ECB_press_conferences_tokens)
rJST(ECB_press_conferences_tokens, lexicon = LoughranMcDonald)