factorCheck: Check for topics loading exclusively on a single covariate level
Description
Checks for aberrations in an STM model with a discrete covariate. Computes topical proportions by covariate level and issues a warning if any value exceeds a user-specified threshold, which signals an abnormal concentration of a topic's mass on a single covariate level.
A vector containing the relevant covariate levels.
tolerance
The minimum topical proportion (by document) to consider a document as containing a particular topic.
reporting
The minimum topical proportion (by covariate level) at which an aberration is reported.
Value
The function returns a list of two items:
topical.proportions
A matrix containing topical proportions by covariate level. Rows correspond to levels, columns to topics.
total.errors
The number of matrix entries above the given reporting threshold.
Details
This function computes the topical proportion by covariate level using the following algorithm. First, the theta matrix in the STM object (topical proportions by document) is subset by covariate level. Second, indicators are computed specifying whether each entry in the subset matrix is greater than the tolerance value. Third, the indicators are aggregated by topic, summing over the columns of the subset matrix. Fourth, each aggregate value is divided by the total number of observations above the tolerance value in the corresponding column of the full theta matrix, to obtain the topical proportion by covariate level. Each entry is therefore the share of all documents containing a given topic that belong to the given covariate level.
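The four steps above can be sketched in base R as follows. This is an illustrative reimplementation, not the package's own code: the helper name proportions_by_level, its argument names, and the default values for tolerance and reporting are assumptions made for the example, and theta is a small hand-made matrix rather than the output of a fitted model.

```r
# Hypothetical helper illustrating the algorithm described in Details.
# Names and default values are assumptions, not the package's API.
proportions_by_level <- function(theta, covariate,
                                 tolerance = 0.01, reporting = 0.99) {
  covariate <- as.factor(covariate)
  stopifnot(nrow(theta) == length(covariate))

  # Indicator: does a document contain a topic (proportion above tolerance)?
  above <- theta > tolerance

  # Per-topic count of such documents over the full theta matrix.
  totals <- colSums(above)

  # Per-level, per-topic counts: sum indicator rows within each level.
  counts <- rowsum(above * 1, group = covariate)

  # Divide each topic column by its overall count (levels x topics).
  props <- sweep(counts, 2, totals, "/")

  # Count entries above the reporting threshold and warn if any exist.
  n_err <- sum(props > reporting, na.rm = TRUE)
  if (n_err > 0) {
    warning(n_err, " topic/level entries exceed the reporting threshold")
  }
  list(topical.proportions = props, total.errors = n_err)
}

# Toy data: 4 documents, 3 topics, 2 covariate levels.
theta <- rbind(c(0.9, 0.1, 0.0),
               c(0.8, 0.2, 0.0),
               c(0.0, 0.3, 0.7),
               c(0.0, 0.4, 0.6))
covariate <- c("a", "a", "b", "b")

res <- proportions_by_level(theta, covariate)
print(res$topical.proportions)
print(res$total.errors)
```

In this toy data the first topic appears only in level "a" documents and the third only in level "b" documents, which is the kind of concentration the reporting threshold is meant to flag. Topics that no document contains above the tolerance produce NaN entries in this sketch and are ignored when counting errors.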