Estimates a joint sentiment topic model using a Gibbs sampler. See Details for a description of the model.
jst(
  dfm,
  sentiLexInput = NULL,
  numSentiLabs = 3,
  numTopics = 10,
  numIters = 3,
  updateParaStep = -1,
  alpha = -1,
  beta = -1,
  gamma = -1,
  excludeNeutral = FALSE
)
dfm: A quanteda dfm object.
sentiLexInput: Optional. A quanteda dictionary object for semi-supervised learning. If a dictionary is used, numSentiLabs is overridden by the number of categories in the dictionary object. By default, an extra category is added for neutral words; this can be turned off by setting excludeNeutral = TRUE.
numSentiLabs: Integer. The number of sentiment labels (defaults to 3).
numTopics: Integer. The number of topics (defaults to 10).
numIters: Integer. The number of iterations (defaults to 3, which is only suitable for test runs; choose a larger value by hand).
updateParaStep: Integer. The number of iterations between optimizations of hyperparameter alpha.
alpha: Double. Hyperparameter (defaults to .05 * (average docsize/number of sentitopics)).
beta: Double. Hyperparameter (defaults to .01, with multiplier .9/.1 for sentiment dictionary presence).
gamma: Double. Hyperparameter (defaults to .05 * (average docsize/number of sentiment categories)).
excludeNeutral: Boolean. If a dictionary is used, an extra category is added for neutral words. Words in the dictionary receive a low probability of being allocated there. If this is set to TRUE, the neutral sentiment category is omitted. The argument is irrelevant if no dictionary is used. Defaults to FALSE.
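The label-count rule and the default formulas for alpha and gamma described above can be illustrated with a short base R sketch. This is not the package's own code: the helpers effective_senti_labs() and default_hyperparameters() are hypothetical names introduced here, and the sketch assumes that "average docsize" means the mean number of tokens per document and that "number of sentitopics" means the number of sentiment labels times the number of topics. The beta default is left out because its dictionary multiplier is not fully specified here.

```r
# Hypothetical helper (not part of the package): number of sentiment labels
# implied by a dictionary with `n_categories` categories. One extra neutral
# category is added unless exclude_neutral is TRUE.
effective_senti_labs <- function(n_categories, exclude_neutral = FALSE) {
  if (exclude_neutral) n_categories else n_categories + 1L
}

# Hypothetical helper: alpha and gamma computed from the documented formulas.
# Assumption: number of sentitopics = num_senti_labs * num_topics.
# Assumption: average docsize = mean row sum of the document-term counts.
default_hyperparameters <- function(counts, num_senti_labs, num_topics) {
  avg_doc_size <- mean(rowSums(counts))
  list(
    alpha = 0.05 * avg_doc_size / (num_senti_labs * num_topics),
    gamma = 0.05 * avg_doc_size / num_senti_labs
  )
}

# Toy document-term counts: 3 documents, 4 terms (6, 4 and 8 tokens).
counts <- matrix(c(2, 1, 0, 3,
                   1, 1, 1, 1,
                   0, 4, 2, 2), nrow = 3, byrow = TRUE)

# A two-category dictionary plus the default neutral category.
labs <- effective_senti_labs(2, exclude_neutral = FALSE)
defaults <- default_hyperparameters(counts, num_senti_labs = labs, num_topics = 10)
print(defaults)
```

With an average document size of 6 tokens, 3 sentiment labels and 10 topics, this gives alpha of about 0.01 and gamma of about 0.1 under the stated assumptions.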
A JST.result object containing a data.frame for each estimated parameter
Basic model description:
Lin, C. and He, Y., 2009, November. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM conference on Information and knowledge management (pp. 375-384). ACM.
Weak supervision adopted from:
Lin, C., He, Y., Everson, R. and Rüger, S., 2012. Weakly supervised joint sentiment-topic detection from text. IEEE Transactions on Knowledge and Data Engineering, 24(6), pp.1134-1145.
# NOT RUN {
model <- jst(quanteda::dfm(quanteda::data_corpus_irishbudget2010),
             paradigm(),
             numTopics = 5,
             numIters = 15) # Use more in practice!
# }