bind_tf_idf
From tidytext v0.1.3
by Julia Silge
Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset
Calculate the term frequency and inverse document frequency of a tidy text dataset, along with their product, tf-idf, and bind them to the dataset. Each of these values is added as a column.
Usage
bind_tf_idf(tbl, term_col, document_col, n_col)
bind_tf_idf_(tbl, term_col, document_col, n_col)
Arguments
- tbl
A tidy text dataset with one-row-per-term-per-document
- term_col
Column containing terms
- document_col
Column containing document IDs
- n_col
Column containing document-term counts
Details
bind_tf_idf is given bare names, while bind_tf_idf_ is given strings and is therefore suitable for programming with.
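As a hedged illustration of the string-based form listed under Usage, the sketch below wraps bind_tf_idf_ in a small helper that receives its column names as character strings. The helper name add_tf_idf and the toy counts are assumptions made up for this example, and it presumes a tidytext version (such as 0.1.3) that exports bind_tf_idf_.

```r
library(tidytext)

# Toy counts, invented for illustration: one row per document-term pair
counts <- data.frame(
  doc  = c("a", "a", "b", "b"),
  term = c("cat", "dog", "cat", "fish"),
  n    = c(2, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the column names arrive as strings, so they can be
# stored in variables and passed along programmatically
add_tf_idf <- function(tbl, term, document, count) {
  bind_tf_idf_(tbl, term, document, count)
}

result <- add_tf_idf(counts, "term", "doc", "n")
result
```

The interactive equivalent with bare names would be bind_tf_idf(counts, term, doc, n).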
If the dataset is grouped, the groups are ignored during the calculation but are retained in the result.
The dataset must have exactly one row per document-term combination for this to work.
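To make the three quantities concrete, here is a minimal base R sketch on invented toy counts. It assumes the common definitions (tf as a term's count divided by the total count in its document, idf as the natural log of the number of documents over the number of documents containing the term); this page does not spell out the formula, so treat it as an assumption rather than the package's exact implementation. It also hints at why duplicate document-term rows would be a problem: counting rows per term only equals counting documents when each pair appears once.

```r
# Toy counts, invented for illustration: one row per document-term pair
counts <- data.frame(
  doc  = c("a", "a", "b", "b"),
  term = c("cat", "dog", "cat", "fish"),
  n    = c(2, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Term frequency: count divided by the total count in the same document
doc_totals <- tapply(counts$n, counts$doc, sum)
counts$tf <- counts$n / as.numeric(doc_totals[counts$doc])

# Inverse document frequency (assumed definition): natural log of
# (number of documents / number of documents containing the term).
# Rows per term equals documents per term only because each
# document-term pair appears exactly once.
docs_per_term <- table(counts$term)
n_docs_with_term <- as.numeric(docs_per_term[counts$term])
counts$idf <- log(length(doc_totals) / n_docs_with_term)

# tf-idf is the product of the two
counts$tf_idf <- counts$tf * counts$idf
counts
```

In this toy data "cat" occurs in both documents, so its idf and tf-idf are zero, while "dog" and "fish" each occur in only one document and receive positive scores.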
Examples
# NOT RUN {
library(dplyr)
library(janeaustenr)
library(tidytext)

# count how often each word appears in each book
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE) %>%
  ungroup()

book_words

# find the words most distinctive to each document
book_words %>%
  bind_tf_idf(word, book, n) %>%
  arrange(desc(tf_idf))
# }