This function calculates word counts for each document in the annotated dataset, excluding stopwords. It unites the identifier columns of the annotated dataset into a single ID, filters out stopwords, and counts the occurrences of each remaining word.
obtain_word_counts(data_annotated, data_stopwords, stopwords_to_append)

A tibble containing the word counts for each unique combination of mID (built from document, submissionid, and sentenceid) and word. The tibble has columns:
mID: A unique identifier combining document, submissionid, and sentenceid.
word: The word itself.
n: The frequency of the word.
data_annotated: A data frame containing the annotated text data. The data must include the columns document, submissionid, sentenceid, and lemma.
data_stopwords: A vector of stopwords to exclude from the word count.
stopwords_to_append: A vector of additional stopwords, appended to data_stopwords before the words are counted.
The function first unites the document, submissionid, and sentenceid columns into a new identifier, mID. It then removes every word found in data_stopwords or stopwords_to_append, counts how often each remaining word occurs within each mID, and returns the result.
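
The steps above can be sketched roughly as follows. This is a minimal illustration, not the function's actual source: the name sketch_word_counts is hypothetical, and the "_" separator, the use of the lemma column as the counted word, and the toy input data are all assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical re-implementation for illustration only.
sketch_word_counts <- function(data_annotated, data_stopwords, stopwords_to_append) {
  # Combine the base stopwords with the additional ones.
  all_stopwords <- unique(c(data_stopwords, stopwords_to_append))

  data_annotated %>%
    # Merge the three identifier columns into mID (separator "_" is an assumption).
    unite("mID", document, submissionid, sentenceid, sep = "_") %>%
    # Assumption: the lemma column supplies the words that are counted.
    rename(word = lemma) %>%
    # Drop every word that appears in the combined stopword vector.
    filter(!word %in% all_stopwords) %>%
    # Count occurrences per mID and word; the count column is named n.
    count(mID, word, name = "n")
}

# Toy input: two sentences from one submission of one document.
annotated <- tibble(
  document     = c("d1", "d1", "d1", "d1"),
  submissionid = c(1, 1, 1, 1),
  sentenceid   = c(1, 1, 1, 2),
  lemma        = c("the", "cat", "cat", "dog")
)

counts <- sketch_word_counts(annotated, c("the"), c("a"))
print(counts)
```

With this toy input, "the" is removed as a stopword, and the result has one row per remaining mID and word pair, with the frequency in n.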