tm_clean: Clean subject line text prior to analysis
Description
This function processes the Subject column in a Meeting Query by applying
tokenisation using tidytext::unnest_tokens() and removing any stopwords
supplied in a data frame (via the stopwords argument). It is a
sub-function that feeds into tm_freq(), tm_cooc(), and tm_wordcloud().
By default, it returns a data frame with tokenised counts of words or
ngrams.
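The two steps described above (tokenise, then drop stopwords) can be sketched directly with tidytext and dplyr. This is a minimal illustration under stated assumptions, not the function's actual source: the `meetings` data, the `custom_stopwords` contents, and the `line` helper column are all hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical Meeting Query extract; only the Subject column is used here
meetings <- data.frame(
  Subject = c("Weekly team sync",
              "Budget review for the quarter",
              "Team offsite planning"),
  stringsAsFactors = FALSE
)

# Hypothetical custom stopwords: a single column labelled 'word'
custom_stopwords <- data.frame(word = c("for", "the"),
                               stringsAsFactors = FALSE)

tokens <- meetings %>%
  mutate(line = row_number()) %>%                   # keep track of the source subject line
  unnest_tokens(word, Subject, token = "words") %>% # one lowercase token per row
  anti_join(custom_stopwords, by = "word")          # drop rows matching a stopword

# Tally the remaining tokens, most frequent first
token_counts <- count(tokens, word, sort = TRUE)
```

Note that unnest_tokens() lowercases tokens by default, so stopwords supplied in lowercase will match subject lines regardless of their original casing.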
Usage
tm_clean(data, token = "words", stopwords = NULL)
Arguments
data
A Meeting Query dataset in the form of a data frame.
token
A character string, either "words" or "ngrams", determining the
type of tokenisation to return. Defaults to "words".
stopwords
A data frame with a single column labelled 'word', containing
custom stopwords to remove. Defaults to NULL.
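As a rough illustration of the arguments, the sketch below builds a stopwords data frame in the expected shape and shows what switching from words to ngrams looks like at the tidytext level. The `meetings` data and `my_stopwords` are hypothetical, the choice of `n = 2` is an assumption made for the example, and the final call simply follows the signature under Usage; it only runs if tm_clean() is available in the session.

```r
library(tidytext)

# Hypothetical Meeting Query extract with a Subject column
meetings <- data.frame(
  Subject = c("Weekly team sync", "Budget review"),
  stringsAsFactors = FALSE
)

# Custom stopwords: one column, and it must be named 'word'
my_stopwords <- data.frame(word = c("weekly", "review"),
                           stringsAsFactors = FALSE)
stopifnot(identical(names(my_stopwords), "word"))

# What ngram tokenisation means in tidytext terms (n = 2 is an assumption)
bigrams <- unnest_tokens(meetings, bigram, Subject, token = "ngrams", n = 2)

# Call following the documented signature, skipped if tm_clean() is not loaded
if (exists("tm_clean")) {
  cleaned <- tm_clean(meetings, token = "words", stopwords = my_stopwords)
}
```

Building the stopwords with a column named anything other than 'word' is an easy mistake to make, which is why the sketch checks the column name before use.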