About

An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

For more details, see https://quanteda.io.

How to Install

Install the released version from CRAN in the usual way, using your R GUI or:

install.packages("quanteda") 

Or for the latest development version:

# the devtools package is required to install quanteda from GitHub
devtools::install_github("quanteda/quanteda") 

Because this compiles some C++ and Fortran source code, you will need to have installed the appropriate compilers.

If you are using a Windows platform, you will also need to install the Rtools software available from CRAN.

If you are using macOS, you should install the macOS tools, namely the Clang 6.x compiler and the GNU Fortran compiler (as quanteda requires gfortran to build). If you are still getting errors related to gfortran, follow the fixes here.
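
Before attempting a source install, it can help to check whether the build tools are visible to R. The following is a minimal sketch using only base R. The helper name `find_build_tools` and the list of executables it probes are assumptions made for illustration, and the relevant compiler names differ by platform (for example `clang++` rather than `g++` on macOS).

```r
# Hypothetical helper: report which build tools R can find on the PATH.
# The default tool names are assumptions; adjust them for your platform.
find_build_tools <- function(tools = c("make", "g++", "clang++", "gfortran")) {
  paths <- Sys.which(tools)  # named character vector; "" when a tool is not found
  data.frame(
    tool  = names(paths),
    found = unname(nzchar(paths)),
    path  = unname(paths),
    stringsAsFactors = FALSE
  )
}

print(find_build_tools())
```

A `FALSE` in the `found` column only means the executable is not on the search path R sees; it does not by itself prove that the compiler is missing from the system.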

How to Use

See the quick start guide to learn how to use quanteda.
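
As a rough orientation, a typical workflow chains a few of the functions listed in the index further down: `corpus_subset()`, `tokens()`, `dfm()`, and `topfeatures()`. The sketch below assumes quanteda is installed and uses the bundled `data_corpus_inaugural` corpus. The `Year` document variable and the `remove_punct` argument are assumptions about the installed version, so treat this as illustrative rather than authoritative and check the help pages for your release.

```r
# Illustrative only: the pipeline runs when quanteda is available.
if (requireNamespace("quanteda", quietly = TRUE)) {
  library(quanteda)

  # Keep only the inaugural addresses delivered after 1990
  # (assumes the corpus carries a `Year` document variable)
  recent <- corpus_subset(data_corpus_inaugural, Year > 1990)

  # Tokenize without punctuation, then build a document-feature matrix
  toks  <- tokens(recent, remove_punct = TRUE)
  dfmat <- dfm(toks)

  # Show the ten most frequent features
  print(topfeatures(dfmat, 10))
} else {
  message("quanteda is not installed; see the installation notes above.")
}
```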

How to Cite

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").
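
One way to obtain that entry in BibTeX form is to pass the citation object to `toBibtex()` from the base `utils` package. This is a small sketch that assumes quanteda is installed; the guard simply skips the call otherwise.

```r
# Print the package citation as a BibTeX entry, if quanteda is installed.
if (requireNamespace("quanteda", quietly = TRUE)) {
  cit <- citation(package = "quanteda")
  print(toBibtex(cit))
} else {
  message("quanteda is not installed, so no citation is available.")
}
```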

Leaving Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contributing

Contributions in the form of feedback, comments, code, and bug reports are most welcome.

Monthly Downloads

17,344

Version

1.3.14

License

GPL-3

Maintainer

Kenneth Benoit

Last Published

November 19th, 2018

Functions in quanteda (1.3.14)

as.corpus

Coerce a compressed corpus to a standard corpus
as.statistics_textmodel

Coerce various objects to statistics_textmodel
char_tolower

Convert the case of character objects
as.matrix.simil

Coerce a simil object into a matrix
corpus_trim

Remove sentences based on their token lengths or a pattern match
check_font

Check if font is available on the system
data_corpus_irishbudget2010

Irish budget speeches from 2010
as.matrix.dist_selection

Coerce a dist_selection object to a matrix
dictionary

Create a dictionary
corpus_subset

Extract a subset of a corpus
as.yaml

Convert quanteda dictionary objects to the YAML format
as.network

Redefinition of network::as.network()
data_dfm_lbgexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)
dfm_group

Combine documents in a dfm by a grouping variable
quanteda_options

Get or set package options for quanteda
attributes<-

Function extending base::attributes()
corpus_trimsentences

Remove sentences based on their token lengths or a pattern match
dfm_weight

Weight the feature frequencies in a dfm
dfm_lookup

Apply a dictionary to a dfm
tokens_wordstem

Stem the terms in an object
create

Utility function to create an object with a new set of attributes
dfm_replace

Replace features in dfm
View

View methods for quanteda
convert

Convert a dfm to a non-quanteda format
is_indexed

Check if a glob pattern is indexed by index_types
as.dictionary

Coercion and checking functions for dictionary objects
generate_groups

Generate a grouping vector from docvars
fcm-class

dfm_tfidf

Weight a dfm by tf-idf
corpus-class

Base method extensions for corpus objects
docfreq

Compute the (weighted) document frequency of a feature
spacyr-methods

Extensions for and from spacy_parse objects
featnames

Get the feature labels from a dfm
fcm_sort

Sort an fcm in alphabetical order of the features
data-deprecated

Datasets with deprecated or defunct names
keyness

Compute keyness (internal functions)
affinity

Internal function to fit the likelihood scaling mixture model.
data_char_sampletext

A paragraph of text for testing various text-based functions
is_regex

Internal function for select_types() to check if a string is a regular expression
dfm2lsa

Convert a dfm to an lsa "textmatrix"
data-internal

Internal data sets
as.list.dist_selection

Coerce a dist_selection object into a list
corpus_sample

Randomly sample documents from a corpus
as.matrix.dfm

Coerce a dfm to a matrix or data.frame
print.dist_selection

Print a dist_selection object
pattern

Pattern for feature, token and keyword matching
read_dict_liwc

Import a LIWC-formatted dictionary
as.igraph

Convert an fcm to an igraph object
dfm-internal

Internal functions for dfm objects
as.list.dist

Coerce a dist object into a list
textstat_dist_old

Similarity and distance computation between documents or features
pattern2id

Convert regex and glob patterns to type IDs or fixed patterns
bootstrap_dfm

Bootstrap a dfm
predict.textmodel_wordfish

Prediction from a textmodel_wordfish method
data_dictionary_LSD2015

Lexicoder Sentiment Dictionary (2015)
predict.textmodel_wordscores

Predict textmodel_wordscores
dfm

Create a document-feature matrix
textplot_scale1d

Plot a fitted scaling model
dfm_sample

Randomly sample documents or features from a dfm
cbind.dfm

Combine dfm objects by Rows or Columns
textplot_network

Plot a network of feature co-occurrences
corpus_segment

Segment texts on a pattern match
dfm_select

Select features from a dfm or fcm
coef.textmodel_ca

Extract model coefficients from a fitted textmodel_ca object
fcm

Create a feature co-occurrence matrix
data_corpus_dailnoconf1991

Confidence debate from 1991 Irish Parliament
data_char_ukimmig2010

Immigration-related sections of 2010 UK party manifestos
nscrabble

Count the Scrabble letter values of text
corpus_reshape

Recast the document units of a corpus
data_corpus_inaugural

US presidential inaugural address texts
tokens_segment

Segment tokens object by patterns
dfm_tolower

Convert the case of the features of a dfm and combine
dfm_subset

Extract a subset of a dfm
textmodel_lsa

Latent Semantic Analysis
docnames

Get or set document names
influence.predict.textmodel_affinity

Compute feature influence from a predicted textmodel_affinity object
dictionary2-class

Print a dictionary object
metacorpus

Get or set corpus metadata
dfm_trim

Trim a dfm using frequency threshold-based feature selection
convert-wrappers

Convenience wrappers for dfm convert
escape_regex

Internal function for select_types() to escape regular expressions
tokens_replace

Replace types in tokens object
groups

Grouping variable(s) for various functions
kwic

Locate keywords-in-context
corpus

Construct a corpus object
dfm_compress

Recombine a dfm or fcm by combining identical dimension elements
list2dictionary

Internal function to convert a list to a dictionary
ntoken

Count the number of tokens or types
textstat_simil

Similarity and distance computation between documents or features
expand

Simpler and faster version of expand.grid() in base package
merge_dictionary_values

Internal function to merge values of duplicated keys
metadoc

Get or set document-level meta-data
matrix2dfm

Converts a Matrix to a dfm
sparsity

Compute the sparsity of a document-feature matrix
dfm-class

Virtual class "dfm" for a document-feature matrix
print.coefficients_textmodel

Print methods for textmodel feature estimates. This is a helper function used in print.summary.textmodel.
format_sparsity

Format a sparsity value for printing
replace_dictionary_values

Internal function to replace dictionary values
print.dfm

Print a dfm object
nsyllable

Count syllables in a text
summary.corpus

Summarize a corpus
docvars

Get or set document-level variables
head.corpus

Return the first or last part of a corpus
pattern2list

Convert various pattern inputs to a vector used in tokens_select, tokens_compound, and kwic
textmodel_nb

Naive Bayes classifier for texts
matrix2fcm

Converts a Matrix to an fcm
friendly_class_undefined_message

Print friendly object class not defined message
search_index

Internal function for select_types to search the index using fastmatch.
textmodel_wordfish

Wordfish text model
textplot_wordcloud

Plot features as a wordcloud
scrabble

Deprecated name for nscrabble
nsentence

Count the number of sentences
textplot_influence

Influence plot for text scaling models
search_glob

Select types without performing slow regex search
print.statistics_textmodel

Implements print methods for textmodel_statistics
remove_empty_keys

Utility function to remove empty keys
ndoc

Count the number of documents or features
print.phrases

Print a phrase object
textstat_keyness

Calculate keyness statistics
set_dfm_slots

Set values to a dfm's S4 slots
summary.character

summary.character method to override network::summary.character()
tokens_lookup

Apply a dictionary to a tokens object
head.dfm

Return the first or last part of a dfm
textplot_keyness

Plot word keyness
textstat_frequency

Tabulate feature frequencies
print.textmodel_wordfish

Print method for a wordfish model
tokens_recompile

Recompile a serialized tokens object
phrase

Declare a compound character to be a sequence of separate pattern matches
reexports

Objects exported from other packages
quanteda-package

An R package for the quantitative analysis of textual data
message_error

Return an error message
settings

Get or set the corpus settings
summary.textmodel_nb

Summary method for textmodel_nb objects
set_fcm_slots

Set values to an fcm's S4 slots
tokens_group

Recombine documents tokens by groups
textmodel_lsa-postestimation

Post-estimations methods for textmodel_lsa
print.summary.textmodel

Print method for summary.textmodel
nest_dictionary

Utility function to generate a nested list
textmodel_wordscores

Wordscores text model
textmodel_affinity

Class affinity maximum likelihood text scaling model
slots<-

Function to assign multiple slots to an S4 object
summary_character

Summary statistics on a character vector
topfeatures

Identify the most frequent features in a dfm
tokens_tolower

Convert the case of tokens
textmodel_ca

Correspondence analysis of a document-feature matrix
tokens_ngrams

Create ngrams and skipgrams from tokens
textplot_xray

Plot the dispersion of key word(s)
predict.textmodel_affinity

Prediction for a fitted affinity textmodel
textstat_readability

Calculate readability
predict.textmodel_nb

Prediction from a fitted textmodel_nb object
summary.textmodel_wordfish

Summary method for textmodel_wordfish
tokens_sample

Randomly sample documents from a tokens object
tokens_subset

Extract a subset of a tokens object
textstat_lexdiv

Calculate lexical diversity
texts

Get or assign corpus texts
types

Get word types from a tokens object
textmodel_wordshoal

Wordshoal text model (redirect)
wordcloud

Internal function for textplot_wordcloud
tokens

Tokenize a set of texts
tokens_compound

Convert token sequences into compound tokens
textstat_select

Select rows of textstat objects by glob, regex or fixed patterns
textmodel_affinity-internal

Internal methods for textmodel_affinity
tf

Deprecated name for dfm_weight
unused_dots

Raise warning of unused dots
textstat_collocations

Identify and score multi-word expressions
tfidf

tokens_select

Select or remove tokens from a tokens object
valuetype

Pattern matching using valuetype
textstat_proxy

[Experimental] Compute document/feature proximity
wordcloud_comparison

Internal function for textplot_wordcloud
tokens_tortl

[Experimental] Change direction of words in tokens
tokens_serialize

Function to serialize list-of-character tokens
as.dfm

Coercion and checking functions for dfm objects
as.corpus.corpuszip

Coerce a compressed corpus to a standard corpus
as.data.frame.dfm

Convert a dfm to a data.frame
as.dist.dist

Coerce a dist into a dist
as.tokens

Coercion, checking, and combining functions for tokens objects
as.fcm

Coercion functions for fcm objects
as.summary.textmodel

Assign the summary.textmodel class to a list
as.coefficients_textmodel

Coerce various objects to coefficients_textmodel. This is a helper function used in summary.textmodel_*.
dfm_sort

Sort a dfm by frequency of one or more margins