This dialog allows computing and plotting the absolute number or row/column percent
of occurrences of terms over a time variable, or of one term by levels of a variable.
The format used by the chosen time variable has to be specified so that it is handled
correctly. The format codes allowed are those recognized by strptime (see ?strptime).
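As a rough illustration of how such format codes behave, the sketch below parses a few made-up date strings directly with strptime. The sample strings and variable names are hypothetical; only a handful of common codes (%Y, %m, %d, %H, %M, %S) are shown, and ?strptime remains the reference for the full list.

```r
# Hypothetical date strings; the format string must mirror their layout.
dates <- c("2011-03-15", "2011-04-02")
parsed <- strptime(dates, format = "%Y-%m-%d", tz = "UTC")

# Day/month/year followed by a time of day.
stamps <- c("15/03/2011 14:30:00", "02/04/2011 09:05:10")
parsed_stamps <- strptime(stamps, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# A string that does not match the format is returned as NA.
bad <- strptime("not a date", format = "%Y-%m-%d", tz = "UTC")

print(parsed)
print(parsed_stamps)
print(is.na(bad))
```

If the parsed values come back as NA, the format string most likely does not match the layout of the time variable.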
The time unit is chosen automatically according to the values of the time variable:
it is set to the coarsest unit in which all time values can still be expressed exactly.
For example, if only dates are entered, the unit will be days; if times are entered but minutes
and seconds are always 0, hours will be used; finally, if times are fully specified, seconds will be used as
the time unit. The chosen unit appears in the vertical axis label of the plot.
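The rule described above can be sketched as follows. This is not the code used by the dialog: guess_time_unit is a hypothetical helper, and the intermediate "mins" level is an assumption, since the text only mentions days, hours and seconds.

```r
# Hypothetical helper: pick the coarsest unit that loses no information.
guess_time_unit <- function(x) {
  lt <- as.POSIXlt(x, tz = "UTC")
  if (any(lt$sec != 0)) return("secs")    # some value needs seconds
  if (any(lt$min != 0)) return("mins")    # assumed intermediate level
  if (any(lt$hour != 0)) return("hours")  # whole hours only
  "days"                                  # dates without a time of day
}

only_dates <- as.POSIXct(c("2011-03-15", "2011-04-02"), tz = "UTC")
whole_hours <- as.POSIXct(c("2011-03-15 14:00:00", "2011-03-16 09:00:00"), tz = "UTC")
full_times <- as.POSIXct(c("2011-03-15 14:30:12", "2011-03-16 09:05:00"), tz = "UTC")

print(guess_time_unit(only_dates))
print(guess_time_unit(whole_hours))
print(guess_time_unit(full_times))
```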
Three measures of term occurrences are provided (when no variable is selected, "category"
below refers to the whole corpus):
- Row percent is the share of the chosen term's occurrences among all terms
found in a given category (i.e., the sum of word counts of all documents from the category
after processing) at each time point. This conceptually corresponds to row percents,
except that only the columns of the document-term matrix that match the given terms are shown.
- Column percent is the share of the chosen term's total occurrences that
appears in the documents from a given category at each time point. This measure
corresponds to the strict definition of column percents.
- Absolute counts return the relevant part of the document-term matrix, summed
after grouping documents according to their category.
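The three measures can be illustrated on a toy document-term matrix. This is a minimal sketch with made-up counts and hypothetical object names (dtm, category, terms); it only mirrors the definitions given above and is not the code run by the dialog.

```r
# Toy document-term matrix: rows are documents, columns are terms.
dtm <- matrix(c(2, 0, 5,
                1, 3, 4,
                0, 1, 9,
                4, 2, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4), c("war", "peace", "other")))

# One time point (or level of a variable) per document.
category <- c("2010", "2010", "2011", "2011")

# Sum word counts over the documents of each category.
counts <- rowsum(dtm, group = category)

terms <- c("war", "peace")

# Absolute counts: the chosen columns of the grouped matrix.
absolute <- counts[, terms, drop = FALSE]

# Row percent: share of each chosen term among all terms of the category.
row_pct <- 100 * absolute / rowSums(counts)

# Column percent: share of each term's total occurrences found in the category.
col_pct <- 100 * sweep(absolute, 2, colSums(absolute), "/")

print(absolute)
print(round(row_pct, 1))
print(round(col_pct, 1))
```

Row percents do not sum to 100 within a category here, because the terms that were not selected are left out of the display, whereas column percents sum to 100 for each term.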