quanteda.tidy

About

quanteda.tidy extends the quanteda package with functionality from the “tidyverse”, especially dplyr.
Note that this is not the same as tidytext, which reshapes tokens into data.frames. Instead, the tidy functions here operate only on document variables (docvars), extending the dplyr verbs so that they work on quanteda objects as if those objects were tibbles or data.frames.
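As a minimal sketch of this point, assuming the data_corpus_inaugural corpus bundled with quanteda and its Party docvar, the following filters documents by a docvar condition and checks that the result is still a corpus, not a data frame of tokens. The object name democrats is illustrative only.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# Keep only the documents whose Party docvar equals "Democratic"
democrats <- data_corpus_inaugural %>%
  filter(Party == "Democratic")

# The result is still a quanteda corpus; the texts themselves are untouched
is.corpus(democrats)
ndoc(democrats)
```

The condition inside filter() refers to docvars by name, in the same way a dplyr filter refers to data frame columns.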

Installation

You can install the stable version of quanteda.tidy from CRAN:

install.packages("quanteda.tidy")

Or install the development version from GitHub:

pak::pkg_install("quanteda/quanteda.tidy")

Overview of Functions

The functions in quanteda.tidy are organized into four categories, following the dplyr documentation:

Category              Function                             Description
Rows                  filter()                             Subset documents based on docvar conditions
Rows                  slice(), slice_head(), slice_tail()  Subset documents by position
Rows                  slice_sample()                       Randomly sample documents
Rows                  slice_min(), slice_max()             Select documents with min/max docvar values
Rows                  arrange(), distinct()                Reorder documents; keep unique documents
Columns               select()                             Keep or drop docvars by name
Columns               rename(), rename_with()              Rename docvars
Columns               relocate()                           Change docvar column order
Columns               mutate(), transmute()                Create or modify docvars
Columns               pull()                               Extract a single docvar as a vector
Columns               glimpse()                            Get a quick overview of the corpus
Groups of rows        add_count()                          Add count by group as a docvar
Groups of rows        add_tally()                          Add total count as a docvar
Pairs of data frames  left_join()                          Join corpus with external data frame
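The column verbs listed above can be sketched as follows, again assuming the data_corpus_inaugural corpus and its Year and President docvars. The new docvar name LastName and the object name slim are illustrative choices, not names used by the package.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

slim <- data_corpus_inaugural %>%
  select(Year, President) %>%     # keep only these two docvars
  rename(LastName = President)    # rename a docvar (new name = old name)

# Inspect the remaining docvars; the documents themselves are unchanged
names(docvars(slim))
ndoc(slim)
```

Because each verb returns a corpus, the calls can be chained with the pipe in the same way as dplyr verbs on a data frame.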

Example

Adding a document variable containing each president's first and last name:

library("quanteda.tidy", warn.conflicts = FALSE)
## Loading required package: quanteda
## Package version: 4.3.1
## Unicode version: 14.0
## ICU version: 71.1
## Parallel computing: disabled
## See https://quanteda.io for tutorials and examples.

data_corpus_inaugural %>%
  mutate(fullname = paste(FirstName, President, sep = ", ")) %>%
  summary(n = 5)
## Corpus consisting of 60 documents, showing 5 documents:
## 
##             Text Types Tokens Sentences Year  President FirstName
##  1789-Washington   625   1537        23 1789 Washington    George
##  1793-Washington    96    147         4 1793 Washington    George
##       1797-Adams   826   2577        37 1797      Adams      John
##   1801-Jefferson   717   1923        41 1801  Jefferson    Thomas
##   1805-Jefferson   804   2380        45 1805  Jefferson    Thomas
##                  Party           fullname
##                   none George, Washington
##                   none George, Washington
##             Federalist        John, Adams
##  Democratic-Republican  Thomas, Jefferson
##  Democratic-Republican  Thomas, Jefferson
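As a follow-on sketch, pull() can extract a newly created docvar as a plain vector. This assumes the same corpus as above; the name fullnames is illustrative, and the separator here is a single space rather than the comma used in the example above.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

fullnames <- data_corpus_inaugural %>%
  mutate(fullname = paste(FirstName, President)) %>%  # e.g. first name, space, last name
  pull(fullname)                                      # extract the docvar as a vector

# One element per document in the corpus
length(fullnames)
head(fullnames, 3)
```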

Version: 0.4
License: GPL-3
Maintainer: Kenneth Benoit
Last Published: December 17th, 2025

Functions in quanteda.tidy (0.4)

dplyr_vector            Vector functions from dplyr
filter.corpus           Return documents with matching conditions
pull.corpus             Pull out a single document variable
rename.corpus           Rename document variables
slice.corpus            Subset documents using their positions
select.corpus           Subset docvars using their names and types
glimpse                 Get a glimpse of a quanteda object
dplyr_context           Context functions from dplyr
arrange.corpus          Arrange the document order of a corpus by variables
add_count.corpus        Add count of observations to corpus
add_tally               Add count of observations to corpus
distinct.corpus         Subset documents distinct/unique by document variables
mutate.corpus           Create or transform document variables
relocate.corpus         Change column order of document variables
left_join.corpus        Join corpus with a data frame
quanteda.tidy-package   quanteda.tidy: Tidyverse Extensions for quanteda
reexports               Objects exported from other packages
dplyr_pairs             Verbs that operate on pairs of data frames
dplyr_cols              Verbs that operate on columns
dplyr-wrappers          Wrappers to dplyr functions
add_count               Verbs that operate on groups of rows
arrange                 Verbs that operate on rows