bind_tf_idf

Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset

Calculate the term frequency (tf) and inverse document frequency (idf) of a tidy text dataset, along with their product, tf-idf, and bind them to the dataset. Each of these values is added as a column.
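The calculation can be sketched in base R. This is a minimal illustration, not tidytext's own code: the toy data frame and its column names `document`, `term`, and `n` are hypothetical, tf is taken as a term's count divided by the total count in its document, and idf is assumed to be the natural log of the number of documents divided by the number of documents containing the term.

```r
# Hypothetical toy data: one row per document-term pair, with counts in `n`.
words <- data.frame(
  document = c("a", "a", "b", "b", "b"),
  term     = c("cat", "dog", "cat", "fish", "bird"),
  n        = c(3, 1, 2, 2, 1),
  stringsAsFactors = FALSE  # character keys so name-based indexing works
)

# tf: the term's count divided by the total number of terms in its document
doc_totals <- tapply(words$n, words$document, sum)
words$tf <- words$n / as.numeric(doc_totals[words$document])

# idf (assumed natural log): log(number of documents / documents containing
# the term). Counting rows per term is only valid because each document-term
# pair appears exactly once.
n_docs <- length(doc_totals)
docs_with_term <- table(words$term)
words$idf <- log(n_docs / as.numeric(docs_with_term[words$term]))

# tf-idf is the product of the two
words$tf_idf <- words$tf * words$idf

print(words)
```

A term that occurs in every document (here `cat`) gets an idf of zero, so its tf-idf is zero no matter how frequent it is.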

Usage
bind_tf_idf(tbl, term_col, document_col, n_col)

bind_tf_idf_(tbl, term_col, document_col, n_col)

Arguments
tbl

A tidy text dataset with one-row-per-term-per-document

term_col

Column containing terms

document_col

Column containing document IDs

n_col

Column containing document-term counts

Details

bind_tf_idf is given bare column names, while bind_tf_idf_ is given strings and is therefore suitable for programming with.

If the dataset is grouped, the groups are ignored in the calculation but retained in the returned dataset.

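The dataset must have exactly one row per document-term combination. The number of documents containing a term is taken from the number of rows in which the term appears, so repeated document-term rows produce incorrect idf values.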
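One way to meet this requirement is sketched below in base R. The `tokens` data frame and its columns are hypothetical; the sketch detects repeated document-term pairs with `anyDuplicated()` and collapses them into counts with `aggregate()`. In the Examples, dplyr's `count()` plays the same role.

```r
# Hypothetical raw tokens: one row per word occurrence, so pairs can repeat.
tokens <- data.frame(
  document = c("a", "a", "a", "b", "b"),
  term     = c("cat", "cat", "dog", "cat", "fish"),
  stringsAsFactors = FALSE
)

# Repeated document-term pairs would inflate the per-term row count used
# for idf, so check for them first.
has_duplicates <- anyDuplicated(tokens[, c("document", "term")]) > 0

# Collapse to one row per document-term pair, summing occurrences into `n`.
tokens$n <- 1
counts <- aggregate(n ~ document + term, data = tokens, FUN = sum)

# After aggregation every document-term combination appears exactly once.
stopifnot(anyDuplicated(counts[, c("document", "term")]) == 0)
print(counts)
```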

Aliases
  • bind_tf_idf
  • bind_tf_idf_
Examples
library(dplyr)
library(janeaustenr)
library(tidytext)

book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE) %>%
  ungroup()

book_words

# find the words most distinctive to each document
book_words %>%
  bind_tf_idf(word, book, n) %>%
  arrange(desc(tf_idf))

Documentation reproduced from package tidytext, version 0.1.3, License: MIT + file LICENSE
