RollingLDA: RollingLDA

Description

Performs a rolling version of Latent Dirichlet Allocation.

Usage

RollingLDA(...)
# S3 method for default
RollingLDA(
  texts,
  dates,
  chunks,
  memory,
  vocab.abs = 5L,
  vocab.rel = 0,
  vocab.fallback = 100L,
  doc.abs = 0L,
  memory.fallback = 0L,
  init,
  type = c("ldaprototype", "lda"),
  id,
  ...
)

Value

[named list] with entries

id: [character(1)] Name for the computation/model, as passed in the id argument.
lda: LDA object of the fitted RollingLDA.
docs: [named list] with modeled texts in a preprocessed format. See LDAprep.
dates: [named Date] with dates of the modeled texts.
vocab: [character] with the vocabularies considered for modeling.
chunks: [data.table] with specifications for each model chunk.
param: [named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See Arguments for explanation.

Arguments

...: additional arguments passed to LDARep or LDAPrototype, respectively. Default parameters are alpha = eta = 1/K and num.iterations = 200. There is no default for K.
texts: [named list]
Tokenized texts.
dates: [(un)named Date]
Dates of the tokenized texts. If unnamed, it must match the order of texts.
chunks: [Date or character(1)]
Sorted dates of the beginnings of each chunk to be modeled after the initial model. If passed as character, the dates are determined by calling seq.Date with init plus one day as the from argument, max(dates) as the to argument and chunks as the by argument.
memory: [Date, character(1) or integer(1)]
Sorted dates of the beginnings of each chunk's memory. If passed as character, the dates are determined by taking the beginning of each chunk and subtracting the time interval given in memory, which is passed as the by argument to seq.Date. If passed as integer/numeric, the dates are determined by going backwards chronologically through the modeled texts and taking the date of the text at position memory.
vocab.abs: [integer(1)]
An absolute lower bound for the words taken into account. The vocabulary includes every word whose count over all texts is higher than vocab.abs and whose relative frequency is, at the same time, higher than vocab.rel. Default is 5.
vocab.rel: [0,1]
A relative lower bound for the words taken into account. See also vocab.abs. Default is 0.
vocab.fallback: [integer(1)]
A fallback absolute lower bound for the words taken into account. The vocabulary includes every word whose count over all texts is higher than vocab.fallback, even if its relative frequency is not higher than vocab.rel. Default is 100.
doc.abs: [integer(1)]
An absolute lower bound for the texts taken into account. A text is considered for modeling if it contains more than doc.abs words, counting only words that occur in the vocabulary. Default is 0.
memory.fallback: [integer(1)]
If a chunk has no texts in its memory, the memory is determined by going backwards chronologically through the modeled texts and taking the date of the text at position memory.fallback. Default is 0, which means "end the fitting".
init: [Date(1) or integer(1)]
Date up to which the initial model should be computed. This parameter is used only if chunks is passed as character. Otherwise the initial model is computed up to the first date in chunks minus one day. If init is passed as integer/numeric, the init-th lowest date in dates is selected.
type: [character(1)]
One of "ldaprototype" or "lda", specifying whether an LDAPrototype or a standard LDA should be modeled as the initial model. Default is "ldaprototype".
id: [character(1)]
Name for the computation/model.
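
The interplay of vocab.abs, vocab.rel and vocab.fallback can be sketched in base R. This is a minimal illustration that reads the argument descriptions literally; it is not the package's implementation, and filter_vocab as well as the toy texts are hypothetical names introduced here.

```r
# Hypothetical helper: keep a word if it passes both the absolute and the
# relative threshold, or if it passes the fallback threshold on its own.
filter_vocab <- function(texts, vocab.abs = 5L, vocab.rel = 0,
                         vocab.fallback = 100L) {
  tab <- table(unlist(texts, use.names = FALSE))
  counts <- as.integer(tab)     # absolute count of each word
  words <- names(tab)           # words in alphabetical order
  rel <- counts / sum(counts)   # relative frequency of each word
  keep <- (counts > vocab.abs & rel > vocab.rel) | counts > vocab.fallback
  words[keep]
}

# Toy tokenized texts, made up for this sketch
toy_texts <- list(a = c("tax", "bank", "tax"),
                  b = c("bank", "rate", "tax"),
                  c = c("tax", "loan"))

strict <- filter_vocab(toy_texts, vocab.abs = 1L, vocab.rel = 0.3)
rescued <- filter_vocab(toy_texts, vocab.abs = 1L, vocab.rel = 0.3,
                        vocab.fallback = 1L)
```

In this sketch, lowering vocab.fallback lets a word with a sufficient absolute count enter the vocabulary even though its relative frequency stays below vocab.rel.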

Details

The function first computes an initial LDA model (using LDARep or LDAPrototype). Afterwards it models temporal chunks of texts with a specified memory for initialization of each model chunk.
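
When chunks and memory are passed as character, the chunk and memory start dates follow from seq.Date as described under Arguments. The following base R sketch uses made-up dates and a made-up init. It assumes that each memory start is obtained by a single backward step from the corresponding chunk start; the package may differ in detail.

```r
# Made-up document dates and init date, for illustration only
dates <- as.Date(c("2008-01-15", "2008-06-30", "2008-09-01",
                   "2009-02-10", "2009-08-20"))
init <- as.Date("2008-07-03")

# chunks = "quarter": chunk starts run from init + 1 day to max(dates)
chunk_starts <- seq.Date(from = init + 1, to = max(dates), by = "quarter")

# memory = "3 quarter": step back three quarters from each chunk start
# (assumption: one backward step per chunk via a negative by argument)
memory_starts <- do.call(c, lapply(chunk_starts, function(start) {
  seq.Date(from = start, by = "-3 quarter", length.out = 2)[2]
}))

# One row per chunk: where the chunk begins and where its memory begins
data.frame(chunk_start = chunk_starts, memory_start = memory_starts)
```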

The function returns a RollingLDA object. You can retrieve the results and all other elements of this object with getter functions (see getChunks).

Examples

roll_lda = RollingLDA(texts = economy_texts,
                      dates = economy_dates,
                      chunks = "quarter",
                      memory = "3 quarter",
                      init = "2008-07-03",
                      K = 10,
                      type = "lda")

roll_lda
getChunks(roll_lda)
getLDA(roll_lda)

# \donttest{
roll_proto = RollingLDA(texts = economy_texts,
                        dates = economy_dates,
                        chunks = "quarter",
                        memory = "3 quarter",
                        init = "2007-07-03",
                        K = 10,
                        n = 12,
                        pm.backend = "socket",
                        ncpus = 2)

roll_proto
getChunks(roll_proto)
getLDA(roll_proto)
# }