sumup (version 1.0.2)

obtain_word_counts: Obtain Word Counts

Description

This function calculates word counts in the annotated dataset, excluding stopwords. It unites the document, submissionid, and sentenceid columns into a single identifier (mID), filters out stopwords, and counts the occurrences of each remaining word per mID.

Usage

obtain_word_counts(data_annotated, data_stopwords, stopwords_to_append)

Value

A tibble containing the word counts for each unique combination of mID (document, submissionid, sentenceid) and word. The tibble has columns:

  • mID: A unique identifier combining document, submissionid, and sentenceid.

  • word: The word itself.

  • n: The frequency of the word.

Arguments

data_annotated

A data frame containing the annotated text data. It must include at least the columns document, submissionid, sentenceid, and lemma.

data_stopwords

A vector of stopwords that will be excluded from the word count.

stopwords_to_append

A vector of additional stopwords to be appended to data_stopwords before counting words.

Details

This function first unites the document, submissionid, and sentenceid columns into a new identifier, mID. It then removes stopwords (data_stopwords together with stopwords_to_append), counts how often each remaining word occurs per mID, and returns the result.
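
The steps described above can be sketched with dplyr and tidyr. This is only an illustration of the unite, filter, and count sequence, not the package's actual source. The helper name sketch_word_counts, the toy data, the "_" separator in mID, and the choice to report the lemma column as word are all assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input; column names follow the Arguments section.
data_annotated <- tibble::tibble(
  document     = c("d1", "d1", "d1", "d1", "d1"),
  submissionid = c(1, 1, 1, 1, 1),
  sentenceid   = c(1, 1, 1, 2, 2),
  lemma        = c("the", "feedback", "feedback", "be", "helpful")
)
data_stopwords      <- c("the", "a")   # hypothetical base stopword list
stopwords_to_append <- c("be")         # hypothetical extra stopwords

# Illustrative stand-in, NOT the sumup implementation.
sketch_word_counts <- function(data_annotated, data_stopwords, stopwords_to_append) {
  # Combine the base stopwords with the additional ones.
  all_stopwords <- c(data_stopwords, stopwords_to_append)

  data_annotated %>%
    # Assumption: identifiers are joined with "_" (the tidyr::unite default).
    unite("mID", document, submissionid, sentenceid, sep = "_") %>%
    # Drop every row whose lemma is a stopword.
    filter(!lemma %in% all_stopwords) %>%
    # Assumption: the lemma is what gets reported in the word column.
    count(mID, word = lemma)
}

result <- sketch_word_counts(data_annotated, data_stopwords, stopwords_to_append)
print(result)
```

The result has one row per mID and word pair that survives the stopword filter, with the frequency in column n, which matches the shape described under Value.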