quanteda v1.5.2



Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and ngrams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.


quanteda: quantitative analysis of textual data


An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

For more details, see https://quanteda.io.

How to Install

The normal way from CRAN, using your R GUI or install.packages("quanteda").


Or for the latest development version:

# devtools package required to install quanteda from GitHub
devtools::install_github("quanteda/quanteda")

Because this compiles some C++ and Fortran source code, you will need to have installed the appropriate compilers.

If you are using a Windows platform, you will also need to install the Rtools software available from CRAN.

If you are using macOS, you should install the macOS tools, namely the Clang 6.x compiler and the GNU Fortran compiler (as quanteda requires gfortran to build). If you are still getting errors related to gfortran, follow the fixes here.

How to Use

See the quick start guide to learn how to use quanteda.
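As a rough illustration of the workflow the package description outlines (corpus, then tokens, then a document-feature matrix), here is a minimal sketch. It assumes quanteda is already installed; the two example texts and the object names (txts, corp, toks, dfmat, top, kw) are invented for illustration and are not taken from the quick start guide.

```r
# Minimal sketch, assuming quanteda is installed.
# The texts below are made up purely for illustration.
library(quanteda)

txts <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
          doc2 = "The dog barks, and the fox runs away.")

corp  <- corpus(txts)                        # corpus from a named character vector
toks  <- tokens(corp, remove_punct = TRUE)   # tokenize and drop punctuation
dfmat <- dfm(toks)                           # document-feature matrix (lowercased by default)

top <- topfeatures(dfmat, 3)                 # most frequent features across documents
kw  <- kwic(toks, pattern = "fox", window = 2)  # keywords-in-context for "fox"

print(top)
print(kw)
```

The same tokens object feeds both dfm() and kwic(), so the tokenization choices (here, removing punctuation) only need to be made once.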

How to Cite

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").

Leaving Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.


Contributions in the form of feedback, comments, code, and bug reports are most welcome. Bug reports can be filed at https://github.com/quanteda/quanteda/issues.

Functions in quanteda

Name Description
as.corpus coerce a compressed corpus to a standard corpus
as.coefficients_textmodel Coerce various objects to coefficients_textmodel. This is a helper function used in summary.textmodel_*.
as.igraph Convert an fcm to an igraph object
as.data.frame.dfm Convert a dfm to a data.frame
as.corpus.corpuszip Coerce a compressed corpus to a standard corpus
as.matrix.dfm Coerce a dfm to a matrix or data.frame
as.list.dist_selection Coerce a dist_selection object into a list
as.tokens Coercion, checking, and combining functions for tokens objects
as.dfm Coercion and checking functions for dfm objects
as.dictionary Coercion and checking functions for dictionary objects
as.list.dist Coerce a dist object into a list
as.yaml Convert quanteda dictionary objects to the YAML format
attributes<- Function extending base::attributes()
cbind.dfm Combine dfm objects by Rows or Columns
bootstrap_dfm Bootstrap a dfm
char_tolower Convert the case of character objects
corpus Construct a corpus object
as.summary.textmodel Assign the summary.textmodel class to a list
compute_mattr Compute the Moving-Average Type-Token Ratio (MATTR)
as.statistics_textmodel Coerce various objects to statistics_textmodel
compute_lexdiv_stats Compute lexical diversity from a dfm or tokens
corpus_reshape Recast the document units of a corpus
corpus_subset Extract a subset of a corpus
affinity Internal function to fit the likelihood scaling mixture model.
View View methods for quanteda
as.matrix,textstat_simil_sparse-method as.matrix method for textstat_simil_sparse
data_corpus_dailnoconf1991 Confidence debate from 1991 Irish Parliament
as.network redefinition of network::as.network()
compute_msttr Compute the Mean Segmental Type-Token Ratio (MSTTR)
convert Convert a dfm to a non-quanteda format
data_char_sampletext A paragraph of text for testing various text-based functions
corpus-class Base method extensions for corpus objects
data_corpus_inaugural US presidential inaugural address texts
corpus_trim Remove sentences based on their token lengths or a pattern match
data_char_ukimmig2010 Immigration-related sections of 2010 UK party manifestos
dfm-internal Internal functions for dfm objects
dfm Create a document-feature matrix
dfm_weight Weight the feature frequencies in a dfm
check_font Check if font is available on the system
diag2na convert same-value pairs to NA in a textstat_proxy object
coef.textmodel_ca Extract model coefficients from a fitted textmodel_ca object
convert-wrappers Convenience wrappers for dfm convert
data_dictionary_LSD2015 Lexicoder Sentiment Dictionary (2015)
dfm-class Virtual class "dfm" for a document-feature matrix
data_dfm_lbgexample dfm from data in Table 1 of Laver, Benoit, and Garry (2003)
data_corpus_irishbudget2010 Irish budget speeches from 2010
corpus_trimsentences Remove sentences based on their token lengths or a pattern match
fcm Create a feature co-occurrence matrix
fcm_sort Sort an fcm in alphabetical order of the features
head.dfm Return the first or last part of a dfm
create Utility function to create a object with new set of attributes
dfm_tfidf Weight a dfm by tf-idf
corpus_segment Segment texts on a pattern match
dfm_subset Extract a subset of a dfm
corpus_sample Randomly sample documents from a corpus
dfm_group Combine documents in a dfm by a grouping variable
dfm_lookup Apply a dictionary to a dfm
friendly_class_undefined_message Print friendly object class not defined message
docvars Get or set document-level variables
escape_regex Internal function for select_types() to escape regular expressions
data-internal Internal data sets
data-deprecated Datasets with deprecated or defunct names
dfm2lsa Convert a dfm to an lsa "textmatrix"
generate_groups Generate a grouping vector from docvars
matrix2dfm Converts a Matrix to a dfm
head.textstat_proxy Return the first or last part of a textstat_proxy object
dfm_match Match the feature set of a dfm to given feature names
dfm_compress Recombine a dfm or fcm by combining identical dimension elements
dfm_sort Sort a dfm by frequency of one or more margins
dfm_split_hyphenated_features Split a dfm's hyphenated features into constituent parts
keyness Compute keyness (internal functions)
matrix2fcm Converts a Matrix to a fcm
metacorpus Get or set corpus metadata
dfm_replace Replace features in dfm
kwic Locate keywords-in-context
dictionary2-class Coerce a dictionary object into a list
dictionary Create a dictionary
dfm_select Select features from a dfm or fcm
dfm_sample Randomly sample documents or features from a dfm
featfreq Compute the frequencies of features
dfm_tolower Convert the case of the features of a dfm and combine
dfm_trim Trim a dfm using frequency threshold-based feature selection
docfreq Compute the (weighted) document frequency of a feature
docnames Get or set document names
predict.textmodel_nb Prediction from a fitted textmodel_nb object
metadoc Get or set document-level meta-data
groups Grouping variable(s) for various functions
head.corpus Return the first or last part of a corpus
nsentence Count the number of sentences
ntoken Count the number of tokens or types
nsyllable Count syllables in a text
pattern Pattern for feature, token and keyword matching
print.dfm Print a dfm object
featnames Get the feature labels from a dfm
expand Simpler and faster version of expand.grid() in base package
predict.textmodel_wordfish Prediction from a textmodel_wordfish method
flatten_dictionary Flatten a hierarchical dictionary into a list of character vectors
format_sparsity format a sparsity value for printing
fcm-class Virtual class "fcm" for a feature co-occurrence matrix. The fcm class of object is a special type of fcm object with additional slots, described below.
print.phrases Print a phrase object
is_regex Internal function for select_types() to check if a string is a regular expression
print.statistics_textmodel Implements print methods for textmodel_statistics
is_indexed Check if a glob pattern is indexed by index_types
list2dictionary Internal function to convert a list to a dictionary
search_glob Select types without performing slow regex search
search_index Internal function for select_types to search the index using fastmatch.
lowercase_dictionary_values Internal function to lowercase dictionary values
influence.predict.textmodel_affinity Compute feature influence from a predicted textmodel_affinity object
is_glob Check if patterns contains glob wildcard
summary.textmodel_nb summary method for textmodel_nb objects
set_fcm_slots Set values to a fcm's S4 slots
spacyr-methods Extensions for and from spacy_parse objects
slots<- Function to assign multiple slots to a S4 object
textmodel_wordfish Wordfish text model
textmodel_nb Naive Bayes classifier for texts
settings Get or set the corpus settings
print.dist_selection Print a dist_selection object
nfeature Defunct form of nfeat
summary.textmodel_wordfish summary method for textmodel_wordfish
merge_dictionary_values Internal function to merge values of duplicated keys
textstat_frequency Tabulate feature frequencies
textstat_simil Similarity and distance computation between documents or features
textstat_entropy Compute entropy of documents or features
phrase Declare a compound character to be a sequence of separate pattern matches
predict.textmodel_affinity Prediction for a fitted affinity textmodel
message_error Return an error message
textstat_dist_old Similarity and distance computation between documents or features
textmodel_affinity Class affinity maximum likelihood text scaling model
tokens_sample Randomly sample documents from a tokens object
tokens_segment Segment tokens object by patterns
tokens_tolower Convert the case of tokens
tokens_tortl [Experimental] Change direction of words in tokens
textmodel_ca Correspondence analysis of a document-feature matrix
texts Get or assign corpus texts
ndoc Count the number of documents or features
nscrabble Count the Scrabble letter values of text
nest_dictionary Utility function to generate a nested list
predict.textmodel_wordscores Predict textmodel_wordscores
pattern2id Convert regex and glob patterns to type IDs or fixed patterns
read_dict_functions Internal functions to import dictionary files
textstat_collocations Identify and score multi-word expressions
pattern2list Convert various input as pattern to a vector used in tokens_select, tokens_compound and kwic.
print.coefficients_textmodel Print methods for textmodel features estimates. This is a helper function used in print.summary.textmodel.
tokens_select Select or remove tokens from a tokens object
tokens_serialize Function to serialized list-of-character tokens
textstat_select Select rows of textstat objects by glob, regex or fixed patterns
tokens_split Split tokens by a separator pattern
textstat_readability Calculate readability
valuetype Pattern matching using valuetype
tokens_subset Extract a subset of a tokens
print.summary.textmodel print method for summary.textmodel
set_dfm_dimnames<- Internal functions to set dimnames
print.textmodel_wordfish print method for a wordfish model
reexports Objects exported from other packages
set_dfm_slots Set values to a dfm's S4 slots
quanteda-package An R package for the quantitative analysis of textual data
sample_bygroup Sample a vector by a group
scrabble Deprecated name for nscrabble
wordcloud Internal function for textplot_wordcloud
quanteda_options Get or set package options for quanteda
replace_dictionary_values Internal function to replace dictionary values
remove_empty_keys Utility function to remove empty keys
summary_character Summary statistics on a character vector
textmodel_affinity-internal Internal methods for textmodel_affinity
sparsity Compute the sparsity of a document-feature matrix
textplot_influence Influence plot for text scaling models
split_values Internal function for special handling of multi-word dictionary values
textplot_keyness Plot word keyness
summary.character summary.character method to override the network::summary.character()
summary.corpus Summarize a corpus
textmodel_wordscores Wordscores text model
textstat_lexdiv Calculate lexical diversity
textstat_keyness Calculate keyness statistics
textplot_wordcloud Plot features as a wordcloud
tf deprecated name for dfm_weight
textplot_xray Plot the dispersion of key word(s)
textmodel_lsa-postestimation Post-estimations methods for textmodel_lsa
textmodel_wordshoal Wordshoal text model (redirect)
tfidf Deprecated form of dfm_tfidf
textmodel_lsa Latent Semantic Analysis
tokens_compound Convert token sequences into compound tokens
tokens Tokenize a set of texts
tokens_chunk Segment tokens object by chunks of a given size
tokens_lookup Apply a dictionary to a tokens object
tokens_ngrams Create ngrams and skipgrams from tokens
unused_dots Raise warning of unused dots
unlist_integer Unlist a list of integer vectors safely
unlist_character Unlist a list of character vectors safely
tokens_group Recombine documents tokens by groups
types Get word types from a tokens object
textplot_network Plot a network of feature co-occurrences
textplot_scale1d Plot a fitted scaling model
textstat_proxy [Experimental] Compute document/feature proximity
textstat_proxy-class textstat_simil/dist classes
tokens_recompile recompile a serialized tokens object
topfeatures Identify the most frequent features in a dfm
tokens_replace Replace tokens in a tokens object
tokens_wordstem Stem the terms in an object
wordcloud_comparison Internal function for textplot_wordcloud
as.dist.dist Coerce a dist into a dist
as.fcm Coercion functions for fcm objects
as.matrix.dist_selection Coerce a dist_selection object to a matrix
as.matrix.simil Coerce a simil object into a matrix

License GPL-3
LinkingTo Rcpp, RcppParallel, RcppArmadillo (>= 0.7.600.1.0)
URL https://quanteda.io
Encoding UTF-8
BugReports https://github.com/quanteda/quanteda/issues
LazyData TRUE
VignetteBuilder knitr
Language en-GB
Collate 'RcppExports.R' 'View.R' 'bootstrap_dfm.R' 'casechange-functions.R' 'directionchange-functions.R' 'character-methods.R' 'convert.R' 'corpus-methods-base.R' 'corpus-methods-quanteda.R' 'corpus-methods-tm.R' 'corpus.R' 'corpus_reshape.R' 'corpus_sample.R' 'corpus_segment.R' 'corpus_subset.R' 'corpus_trim.R' 'corpuszip.R' 'data-deprecated.R' 'data-documentation.R' 'defunct-functions.R' 'dfm-classes.R' 'dfm-methods.R' 'dfm-print.R' 'dfm-subsetting.R' 'dfm.R' 'dfm_compress.R' 'dfm_group.R' 'dfm_lookup.R' 'dfm_replace.R' 'dfm_sample.R' 'dfm_select.R' 'dfm_sort.R' 'dfm_subset.R' 'dfm_trim.R' 'dfm_weight.R' 'dfm_match.R' 'dictionaries.R' 'docnames.R' 'dimnames.R' 'docvars.R' 'fcm-classes.R' 'fcm-methods.R' 'fcm-subsetting.R' 'fcm.R' 'future.R' 'kwic.R' 'nfunctions.R' 'nscrabble.R' 'nsyllable.R' 'pattern2fixed.R' 'phrases.R' 'quanteda-documentation.R' 'quanteda_options.R' 'readtext-methods.R' 'settings.R' 'spacyr-methods.R' 'stopwords.R' 'textmodel-methods.R' 'textmodel_affinity.R' 'textmodel_ca.R' 'textmodel_lsa.R' 'textmodel_nb.R' 'textmodel_wordfish.R' 'textmodel_wordscores.R' 'textplot_influence.R' 'textplot_keyness.R' 'textplot_network.R' 'textplot_scale1d.R' 'textplot_wordcloud.R' 'textplot_xray.R' 'textstat-methods.R' 'textstat_collocations.R' 'textstat_dist_old.R' 'textstat_frequency.R' 'textstat_keyness.R' 'textstat_lexdiv.R' 'textstat_readability.R' 'textstat_simil.R' 'textstat_simil_old.R' 'textstat_entropy.R' 'tokens.R' 'tokens_compound.R' 'tokens_group.R' 'tokens_lookup.R' 'tokens_ngrams.R' 'tokens_replace.R' 'tokens_segment.R' 'tokens_select.R' 'tokens_subset.R' 'tokens_sample.R' 'tokens_split.R' 'tokens_chunk.R' 'utils.R' 'wordstem.R' 'zzz.R'
RoxygenNote 7.0.1
SystemRequirements C++11
NeedsCompilation yes
Packaged 2019-11-25 20:31:02 UTC; kbenoit
Repository CRAN
Date/Publication 2019-11-26 06:40:03 UTC
