
textmineR (version 2.1.3)

FormatRawLdaOutput: Format Raw Output from lda.collapsed.gibbs.sampler

Description

Extracts and formats the outputs of an LDA model estimated with the lda package by Jonathan Chang.

Usage

FormatRawLdaOutput(lda_result, docnames, smooth = TRUE,
  softmax = FALSE)

Arguments

lda_result

The list returned by lda::lda.collapsed.gibbs.sampler.

docnames

A character vector giving the names of documents. This is generally rownames(dtm).

smooth

Logical. Should the topic proportions be smoothed so that every term has a positive value in every topic? Defaults to TRUE.

softmax

Logical. Should the softmax function be used to normalize the raw output? If FALSE (the default), the output is normalized by dividing by the sum.
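
The effect of smooth and softmax can be illustrated with a small base R sketch. This is not the package's implementation: the count matrix raw, the smoothing constant eps, and the helpers normalize_sum and normalize_softmax are hypothetical names and values chosen only for illustration.

```r
# Hypothetical raw topic-by-term counts (2 topics, 3 terms)
raw <- matrix(c(4, 0, 1,
                0, 3, 2),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("t_1", "t_2"), c("gene", "cell", "trial")))

# Smoothing (assumed form): add a small constant so no entry is exactly zero
eps <- 0.01
smoothed <- raw + eps

# Sum normalization: divide each row by its total
normalize_sum <- function(m) m / rowSums(m)

# Softmax normalization: exponentiate, then divide each row by its total
normalize_softmax <- function(m) {
  e <- exp(m - apply(m, 1, max))  # subtract the row maximum for numerical stability
  e / rowSums(e)
}

phi_sum <- normalize_sum(smoothed)
phi_softmax <- normalize_softmax(smoothed)

# Both results have strictly positive entries and rows that sum to 1
rowSums(phi_sum)
rowSums(phi_softmax)
```

In this sketch, sum normalization keeps the rows proportional to the smoothed counts, while softmax exaggerates the gap between large and small counts.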

Value

Returns a list with two elements: phi, whose rows represent the distribution of words within each topic, and theta, whose rows represent the distribution of topics within each document.
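
As a sketch of how these two matrices are typically read, the toy example below builds small phi and theta matrices by hand and extracts the most probable terms per topic and the dominant topic per document. All values, term names, and document names are made up for illustration; they are not output from FormatRawLdaOutput.

```r
# Toy phi: 2 topics (rows) over 4 terms (columns); each row sums to 1
phi <- matrix(c(0.5, 0.3, 0.1, 0.1,
                0.1, 0.1, 0.2, 0.6),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("t_1", "t_2"),
                              c("gene", "cell", "trial", "patient")))

# Toy theta: 3 documents (rows) over 2 topics (columns); each row sums to 1
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.5, 0.5),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("doc_1", "doc_2", "doc_3"), c("t_1", "t_2")))

# Two most probable terms for each topic (one column per topic)
top_terms <- apply(phi, 1, function(p) names(sort(p, decreasing = TRUE))[1:2])

# Most probable topic for each document (ties go to the first topic)
dominant_topic <- colnames(theta)[max.col(theta, ties.method = "first")]
```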

Examples

# NOT RUN {
# Load textmineR and a pre-formatted dtm
library(textmineR)
data(nih_sample_dtm)

# Get a sample of documents
dtm <- nih_sample_dtm[ sample(1:nrow(nih_sample_dtm), 20) , ]

# re-create a character vector of documents from the DTM
lex <- Dtm2Docs(dtm)

# Format for input to lda::lda.collapsed.gibbs.sampler
lex <- lda::lexicalize(lex, vocab=colnames(dtm))

# Fit the model from lda::lda.collapsed.gibbs.sampler
lda <- lda::lda.collapsed.gibbs.sampler(documents = lex, K = 5, 
                                         vocab = colnames(dtm), 
                                         num.iterations=200, 
                                         alpha=0.1, eta=0.05)
                                         
# Format the result to get phi and theta matrices
lda <- FormatRawLdaOutput(lda_result=lda, docnames=rownames(dtm), smooth=TRUE)

# }
