quanteda v2.1.2





Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and ngrams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.


quanteda: quantitative analysis of textual data



An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

For more details, see https://quanteda.io.

How to Install

The normal way is from CRAN, using your R GUI or

install.packages("quanteda")

Or for the latest development version:

# devtools package required to install quanteda from GitHub
devtools::install_github("quanteda/quanteda")

Because this compiles some C++ and Fortran source code, you will need to have installed the appropriate compilers.

If you are using a Windows platform, you will also need to install the Rtools software available from CRAN.

If you are using macOS, you should install the macOS tools, namely the Clang 6.x compiler and the GNU Fortran compiler (as quanteda requires gfortran to build). If you are still getting errors related to gfortran, follow the fixes here.
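Before building from source, it can help to check whether the build tools are visible to R. The following is a minimal sketch using only base R. The tool names are assumptions about a typical setup, and finding a tool on the PATH does not guarantee that R is configured to use it.

```r
# Look up common build tools on the PATH; an empty string means "not found".
# The tool names below are illustrative and vary by platform.
tools <- Sys.which(c("make", "gfortran", "g++", "clang++"))
print(tools)

# Names of the tools that could not be located.
missing_tools <- names(tools)[tools == ""]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```

On macOS it is normal for g++ to be missing when Clang is installed, and vice versa on other platforms.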

How to Use

See the quick start guide to learn how to use quanteda.
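As a rough illustration of a typical workflow, the sketch below builds a corpus, tokenizes it, creates a document-feature matrix, and looks up a keyword in context. It uses functions from the index on this page. The two sample texts and the object names are made up for this example and are not taken from the quick start guide.

```r
library(quanteda)

# Two short, made-up documents for illustration only.
txt <- c(d1 = "The quick brown fox jumps over the lazy dog.",
         d2 = "The dog barks at the quick fox.")

# Build a corpus from the named character vector.
corp <- corpus(txt)

# Tokenize, dropping punctuation, then lowercase the tokens.
toks <- tokens(corp, remove_punct = TRUE)
toks <- tokens_tolower(toks)

# Create a document-feature matrix from the tokens.
dfmat <- dfm(toks)

# Inspect the result: document count, most frequent features,
# and occurrences of "fox" with two words of context on each side.
ndoc(dfmat)
topfeatures(dfmat, 3)
kwic(toks, pattern = "fox", window = 2)
```

Tokenizing before calling dfm() keeps each preprocessing step explicit, so the same tokens object can be reused for kwic() and other token-level functions.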

How to cite

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").

Leaving Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.


Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

Functions in quanteda

Name Description
as.fcm Coercion and checking functions for fcm objects
as.dictionary Coercion and checking functions for dictionary objects
View View methods for quanteda
char_tolower Convert the case of character objects
as.corpus coerce a compressed corpus to a standard corpus
as.matrix.dfm Coerce a dfm to a matrix or data.frame
as.igraph Convert an fcm to an igraph object
check_font Check if font is available on the system
as.list.tokens Coercion, checking, and combining functions for tokens objects
compute_mattr Compute the Moving-Average Type-Token Ratio (MATTR)
as.yaml Convert quanteda dictionary objects to the YAML format
corpus_sample Randomly sample documents from a corpus
compute_lexdiv_stats Compute lexical diversity from a dfm or tokens
as.data.frame.dfm Convert a dfm to a data.frame
dfm_replace Replace features in dfm
data_char_sampletext A paragraph of text for testing various text-based functions
convert-wrappers Convenience wrappers for dfm convert
dfm_match Match the feature set of a dfm to given feature names
data_char_ukimmig2010 Immigration-related sections of 2010 UK party manifestos
corpus_segment Segment texts on a pattern match
as.dfm Coercion and checking functions for dfm objects
compute_msttr Compute the Mean Segmental Type-Token Ratio (MSTTR)
data_dictionary_LSD2015 Lexicoder Sentiment Dictionary (2015)
dfm-class Virtual class "dfm" for a document-feature matrix
as.matrix,textstat_simil_sparse-method as.matrix method for textstat_simil_sparse
fcm Create a feature co-occurrence matrix
dfm2lsa Convert a dfm to an lsa "textmatrix"
dfm_compress Recombine a dfm or fcm by combining identical dimension elements
dfm_sort Sort a dfm by frequency of one or more margins
char_select Select or remove elements from a character vector
cbind.dfm Combine dfm objects by Rows or Columns
fcm-class Virtual class "fcm" for a feature co-occurrence matrix
flatten_dictionary Flatten a hierarchical dictionary into a list of character vectors
dfm_split_hyphenated_features Split a dfm's hyphenated features into constituent parts
dictionary2-class dictionary class objects and functions
as.network redefinition of network::as.network()
convert Convert quanteda objects to non-quanteda formats
dfm_group Combine documents in a dfm by a grouping variable
attributes<- Function extending base::attributes()
corpus_subset Extract a subset of a corpus
dfm_lookup Apply a dictionary to a dfm
data_corpus_inaugural US presidential inaugural address texts
data_dfm_lbgexample dfm from data in Table 1 of Laver, Benoit, and Garry (2003)
corpus-class Base method extensions for corpus objects
corpus Construct a corpus object
is_glob Check if patterns contains glob wildcard
format_sparsity format a sparsity value for printing
bootstrap_dfm Bootstrap a dfm
corpus_trim Remove sentences based on their token lengths or a pattern match
dictionary_edit Conveniently edit dictionaries
get_object_version Get the package version that created an object
dictionary Create a dictionary
meta_system Internal function to get, set or initialize system metadata
get_docvars Internal function to extract docvars
meta Get or set object metadata
is_indexed Check if a glob pattern is indexed by index_types
kwic Locate keywords-in-context
dfm_tolower Convert the case of the features of a dfm and combine
corpus_trimsentences Remove sentences based on their token lengths or a pattern match
docfreq Compute the (weighted) document frequency of a feature
list2dictionary Internal function to convert a list to a dictionary
phrase Declare a compound character to be a sequence of separate pattern matches
create Function to assign multiple slots to a S4 object
dfm_trim Trim a dfm using frequency threshold-based feature selection
set_fcm_slots<- Set values to a fcm's S4 slots
set_dfm_slots<- Set values to a dfm's S4 slots
sample_bygroup Sample a vector by a group
%>% Pipe operator
reshape_docvars Internal function to subset or duplicate docvar rows
corpus_reshape Recast the document units of a corpus
metadoc Get or set document-level meta-data
docnames Get or set document names
dfm-internal Internal functions for dfm objects
data-relocated Formerly included data objects
dfm_sample Randomly sample documents or features from a dfm
textplot_network Plot a network of feature co-occurrences
textplot_keyness Plot word keyness
dfm_select Select features from a dfm or fcm
featnames Get the feature labels from a dfm
docvars Get or set document-level variables
dfm_subset Extract a subset of a dfm
data-internal Internal data sets
fcm_sort Sort an fcm in alphabetical order of the features
field_system Shortcut functions to access or assign metadata
dfm Create a document-feature matrix
featfreq Compute the frequencies of features
dfm_tfidf Weight a dfm by tf-idf
groups Grouping variable(s) for various functions
dfm_weight Weight the feature frequencies in a dfm
nsyllable Count syllables in a text
ntoken Count the number of tokens or types
remove_empty_keys Utility function to remove empty keys
textstat_lexdiv Calculate lexical diversity
replace_dictionary_values Internal function to replace dictionary values
textstat_keyness Calculate keyness statistics
tokens_group Recombine documents tokens by groups
names-quanteda Special handling for names of quanteda objects
head.corpus Return the first or last part of a corpus
tokens_lookup Apply a dictionary to a tokens object
diag2na convert same-value pairs to NA in a textstat_proxy object
split_values Internal function for special handling of multi-word dictionary values
is_regex Internal function for select_types() to check if a string is a regular expression
summary.corpus Summarize a corpus
escape_regex Internal function for select_types() to escape regular expressions
texts Get or assign corpus texts
expand Simpler and faster version of expand.grid() in base package
unlist_integer Unlist a list of integer vectors safely
textstat_collocations Identify and score multi-word expressions
textstat_proxy-class textstat_simil/dist classes
keyness Compute keyness (internal functions)
textstat_proxy [Experimental] Compute document/feature proximity
friendly_class_undefined_message Print friendly object class not defined message
matrix2dfm Converts a Matrix to a dfm
matrix2fcm Converts a Matrix to a fcm
pattern2id Convert regex and glob patterns to type IDs or fixed patterns
unused_dots Raise warning of unused dots
generate_groups Generate a grouping vector from docvars
head.dfm Return the first or last part of a dfm
pattern2list Convert various input as pattern to a vector used in tokens_select, tokens_compound and kwic.
head.textstat_proxy Return the first or last part of a textstat_proxy object
tokens_compound Convert token sequences into compound tokens
print-quanteda Print methods for quanteda core objects
tokens_chunk Segment tokens object by chunks of a given size
make_meta Internal functions to create a list for the meta attribute
lowercase_dictionary_values Internal function to lowercase dictionary values
merge_dictionary_values Internal function to merge values of duplicated keys
ndoc Count the number of documents or features
nscrabble Count the Scrabble letter values of text
nsentence Count the number of sentences
types Get word types from a tokens object
nest_dictionary Utility function to generate a nested list
print.phrases Print a phrase object
serialize_tokens Function to serialize list-of-character tokens
set_dfm_dimnames<- Internal functions to set dimnames
message_error Return an error message
quanteda-package An R package for the quantitative analysis of textual data
reexports Objects exported from other packages
pattern Pattern for feature, token and keyword matching
spacyr-methods Extensions for and from spacy_parse objects
read_dict_functions Internal functions to import dictionary files
object-builders Object compilers
quanteda_options Get or set package options for quanteda
wordcloud_comparison Internal function for textplot_wordcloud
unlist_character Unlist a list of character vectors safely
sparsity Compute the sparsity of a document-feature matrix
textplot_wordcloud Plot features as a wordcloud
textplot_xray Plot the dispersion of key word(s)
tokenize_internal quanteda tokenizers
tokens Construct a tokens object
textstat_entropy Compute entropies of documents or features
search_index Internal function for select_types to search the index using fastmatch.
textstat_summary Summarize documents
search_glob Select types without performing slow regex search
textstat_simil Similarity and distance computation between documents or features
textstat_frequency Tabulate feature frequencies
tokens_segment Segment tokens object by patterns
tokens_replace Replace tokens in a tokens object
tokens_select Select or remove tokens from a tokens object
tokens_wordstem Stem the terms in an object
tokens_split Split tokens by a separator pattern
tokens_subset Extract a subset of a tokens
tokens_sample Randomly sample documents from a tokens object
topfeatures Identify the most frequent features in a dfm
summary_metadata Functions to add or retrieve corpus summary metadata
textmodels Models for scaling and classification of textual data
textstat_readability Calculate readability
textstat_select Select rows of textstat objects by glob, regex or fixed patterns
tokens_ngrams Create ngrams and skipgrams from tokens
tokens_tolower Convert the case of tokens
tokens_tortl [Experimental] Change direction of words in tokens
valuetype Pattern matching using valuetype
wordcloud Internal function for textplot_wordcloud
tokens_recompile recompile a serialized tokens object

License GPL-3
LinkingTo Rcpp, RcppParallel, RcppArmadillo (>= 0.7.600.1.0)
URL https://quanteda.io
Encoding UTF-8
BugReports https://github.com/quanteda/quanteda/issues
LazyData TRUE
VignetteBuilder knitr
Language en-GB
Collate 'RcppExports.R' 'View.R' 'meta.R' 'quanteda-documentation.R' 'aaa.R' 'bootstrap_dfm.R' 'casechange-functions.R' 'char_select.R' 'convert.R' 'corpus-addsummary-metadata.R' 'corpus-methods-base.R' 'corpus-methods-quanteda.R' 'corpus.R' 'corpus_reshape.R' 'corpus_sample.R' 'corpus_segment.R' 'corpus_subset.R' 'corpus_trim.R' 'data-documentation.R' 'dfm-classes.R' 'dfm-methods.R' 'dfm-print.R' 'dfm-subsetting.R' 'dfm.R' 'dfm_compress.R' 'dfm_group.R' 'dfm_lookup.R' 'dfm_match.R' 'dfm_replace.R' 'dfm_sample.R' 'dfm_select.R' 'dfm_sort.R' 'dfm_subset.R' 'dfm_trim.R' 'dfm_weight.R' 'dictionaries.R' 'dictionary_edit.R' 'dimnames.R' 'directionchange-functions.R' 'fcm-classes.R' 'docnames.R' 'docvars.R' 'fcm-methods.R' 'fcm-print.R' 'fcm-subsetting.R' 'fcm.R' 'fcm_select.R' 'kwic.R' 'metadoc.R' 'nfunctions.R' 'nscrabble.R' 'nsyllable.R' 'object-builder.R' 'pattern2fixed.R' 'phrases.R' 'quanteda_options.R' 'readtext-methods.R' 'spacyr-methods.R' 'stopwords.R' 'summary.R' 'textmodel.R' 'textplot_keyness.R' 'textplot_network.R' 'textplot_wordcloud.R' 'textplot_xray.R' 'textstat-methods.R' 'textstat_collocations.R' 'textstat_entropy.R' 'textstat_frequency.R' 'textstat_keyness.R' 'textstat_lexdiv.R' 'textstat_readability.R' 'textstat_simil.R' 'textstat_summary.R' 'tokenizers.R' 'tokens-methods-base.R' 'tokens.R' 'tokens_chunk.R' 'tokens_compound.R' 'tokens_group.R' 'tokens_lookup.R' 'tokens_ngrams.R' 'tokens_replace.R' 'tokens_sample.R' 'tokens_segment.R' 'tokens_select.R' 'tokens_split.R' 'tokens_subset.R' 'utils.R' 'wordstem.R' 'zzz.R'
RoxygenNote 7.1.1
SystemRequirements C++11
NeedsCompilation yes
Packaged 2020-09-19 17:44:00 UTC; kbenoit
Repository CRAN
Date/Publication 2020-09-23 04:10:03 UTC
