Compare two document-term matrices
dtm_compare(
  dtm.x,
  dtm.y = NULL,
  smooth = 0.1,
  min_ratio = NULL,
  min_chi2 = NULL,
  select_rows = NULL,
  yates_cor = c("auto", "yes", "no"),
  x_is_subset = FALSE,
  what = c("freq", "docfreq", "cooccurrence")
)
A data frame with one row per term and the comparison statistics in the columns.
dtm.x: the main document-term matrix.
dtm.y: the 'reference' document-term matrix.
smooth: the pseudocount added to the frequencies for Laplace smoothing when calculating the probabilities.
min_ratio: threshold for the ratio value, i.e. the relative frequency of a term in dtm.x divided by its relative frequency in dtm.y.
min_chi2: threshold for the chi^2 value.
select_rows: alternative to using dtm.y. Has to be a vector with rownames, by which
yates_cor: mode for applying Yates' continuity correction in the chi^2 calculation. Can be turned on ("yes") or off ("no"), or set to "auto", in which case Cochran's rule determines whether the correction is applied.
x_is_subset: whether dtm.x is a subset of dtm.y. If so, the term frequencies of dtm.x are subtracted from the term frequencies in dtm.y.
what: whether to compare the frequency ("freq") or the document frequency ("docfreq") of terms. This also affects how chi^2 is calculated: freq is compared relative to the vocabulary size, docfreq relative to the corpus size (N).
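The smoothing, ratio and chi^2 settings can be illustrated with a small base-R sketch. This is not the dtm_compare implementation: it assumes plain named vectors of term counts (the "freq" case) instead of document-term matrices, normalises the ratio by the smoothed total count, and builds one 2x2 table per term (this term vs. all other tokens, in x vs. y). The helper name compare_freqs is hypothetical. For a 2x2 table, Cochran's rule (no expected count below 1, at most 20% below 5) reduces to "apply the correction if any expected count is below 5", which is how yates_cor = "auto" behaves in this sketch.

```r
# Hypothetical sketch, not the dtm_compare implementation.
# freq_x, freq_y: named term counts over the same vocabulary (the "freq" case).
compare_freqs <- function(freq_x, freq_y, smooth = 0.1,
                          yates_cor = c("auto", "yes", "no")) {
  yates_cor <- match.arg(yates_cor)
  # Laplace smoothing: add the pseudocount to every term before normalising,
  # so a term missing from one corpus still gets a finite, non-zero ratio
  sx <- freq_x + smooth
  sy <- freq_y + smooth
  ratio <- (sx / sum(sx)) / (sy / sum(sy))
  total_x <- sum(freq_x)
  total_y <- sum(freq_y)
  chi2 <- mapply(function(a, b) {
    # 2x2 table: rows = this term vs. all other tokens, columns = x vs. y
    observed <- matrix(c(a, total_x - a, b, total_y - b), nrow = 2)
    expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
    # Cochran's rule for a 2x2 table: correct if any expected count is below 5
    use_yates <- switch(yates_cor, yes = TRUE, no = FALSE,
                        auto = any(expected < 5))
    deviation <- abs(observed - expected)
    if (use_yates) deviation <- pmax(deviation - 0.5, 0)
    sum(deviation^2 / expected)
  }, freq_x, freq_y)
  data.frame(term = names(freq_x), freq.x = unname(freq_x),
             freq.y = unname(freq_y), ratio = unname(ratio),
             chi2 = unname(chi2))
}

# Usage with made-up counts: "bird" never occurs in y
fx <- c(cat = 30, dog = 5, bird = 3, fish = 62)
fy <- c(cat = 10, dog = 20, bird = 0, fish = 170)
res <- compare_freqs(fx, fy)
print(res)
```

For any single term, the chi^2 value matches base R's chisq.test on the same 2x2 table with the corresponding correct argument, which is a convenient way to check the sketch. Filtering by thresholds in the spirit of min_ratio and min_chi2 is then a plain subset, e.g. res[res$chi2 >= 3.84, ], where 3.84 is the 5% critical value for 1 degree of freedom.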