About

An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

For more details, see https://quanteda.io.

How to Install

Install the released version from CRAN, using your R GUI or:

install.packages("quanteda") 

Or for the latest development version:

# devtools package required to install quanteda from GitHub
devtools::install_github("quanteda/quanteda") 

Because this compiles some C++ and Fortran source code, you will need to have installed the appropriate compilers.

If you are using a Windows platform, you will also need to install the Rtools software available from CRAN.

If you are using macOS, you should install the macOS tools, namely the Clang 6.x compiler and the GNU Fortran compiler (quanteda requires gfortran to build). If you still get errors related to gfortran, follow the fixes here.
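Before building from source, it can help to check whether R can see a compiler toolchain at all. The following is a minimal sketch using only base R; the tool names checked are an assumption for illustration, and the exact executables required depend on your platform and R configuration.

```r
# Report which build tools R can find on the search path.
# Sys.which() returns "" for any tool that is not found.
tools <- c("make", "g++", "clang++", "gfortran")  # assumed tool names, adjust per platform
found <- Sys.which(tools)

status <- data.frame(
  tool = tools,
  path = unname(found),
  available = unname(nzchar(found)),
  stringsAsFactors = FALSE
)
print(status)
```

A missing entry here does not necessarily mean the build will fail, since R may locate compilers through its own configuration. If the pkgbuild package is installed, pkgbuild::has_build_tools(debug = TRUE) gives a more direct test by attempting a small compilation.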

How to Use

See the quick start guide to learn how to use quanteda.
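As a rough orientation, a typical workflow moves from a corpus to tokens to a document-feature matrix. The sketch below is not taken from the quick start guide; the three example texts are made up, and it assumes quanteda is installed.

```r
library(quanteda)

# Three short example texts (made up for this sketch)
txt <- c(
  doc1 = "The quick brown fox jumps over the lazy dog.",
  doc2 = "A quick brown dog barks at the fox.",
  doc3 = "Text analysis in R starts with a corpus."
)

corp  <- corpus(txt)                        # corpus from a named character vector
toks  <- tokens(corp, remove_punct = TRUE)  # tokenize, dropping punctuation
dfmat <- dfm(toks)                          # document-feature matrix (lowercases by default)

ndoc(dfmat)                   # number of documents
topfeatures(dfmat, n = 5)     # most frequent features

# Keywords-in-context: each match of "fox" with two tokens on either side
kw <- kwic(toks, pattern = "fox", window = 2)
print(kw)
```

Each step returns its own object class (corpus, tokens, dfm), so intermediate results can be inspected or reused before moving on.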

How to Cite

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").
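One way to turn that output into a BibTeX entry is to pass it to toBibtex() from the utils package. This is a small sketch that assumes quanteda is installed; the entry printed depends on the installed version.

```r
# Retrieve the citation object for the installed package
cit <- citation(package = "quanteda")

# Convert it to BibTeX and print the result
print(toBibtex(cit))
```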

Leaving Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contributing

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

Monthly Downloads

27,961

Version

2.1.1

License

GPL-3

Maintainer

Kenneth Benoit

Last Published

July 27th, 2020

Functions in quanteda (2.1.1)

as.fcm: Coercion and checking functions for fcm objects
as.dfm: Coercion and checking functions for dfm objects
as.dictionary: Coercion and checking functions for dictionary objects
as.igraph: Convert an fcm to an igraph object
as.matrix,textstat_simil_sparse-method: as.matrix method for textstat_simil_sparse
as.network: redefinition of network::as.network()
as.corpus: coerce a compressed corpus to a standard corpus
View: View methods for quanteda
as.data.frame.dfm: Convert a dfm to a data.frame
cbind.dfm: Combine dfm objects by Rows or Columns
as.matrix.dfm: Coerce a dfm to a matrix or data.frame
corpus_reshape: Recast the document units of a corpus
char_tolower: Convert the case of character objects
char_select: Select or remove elements from a character vector
check_font: Check if font is available on the system
corpus: Construct a corpus object
corpus_trimsentences: Remove sentences based on their token lengths or a pattern match
create: Function to assign multiple slots to a S4 object
as.list.tokens: Coercion, checking, and combining functions for tokens objects
dfm_compress: Recombine a dfm or fcm by combining identical dimension elements
corpus_trim: Remove sentences based on their token lengths or a pattern match
dfm2lsa: Convert a dfm to an lsa "textmatrix"
compute_msttr: Compute the Mean Segmental Type-Token Ratio (MSTTR)
dfm-internal: Internal functions for dfm objects
corpus_subset: Extract a subset of a corpus
compute_lexdiv_stats: Compute lexical diversity from a dfm or tokens
dfm_tfidf: Weight a dfm by tf-idf
convert-wrappers: Convenience wrappers for dfm convert
data_dictionary_LSD2015: Lexicoder Sentiment Dictionary (2015)
dfm: Create a document-feature matrix
compute_mattr: Compute the Moving-Average Type-Token Ratio (MATTR)
as.yaml: Convert quanteda dictionary objects to the YAML format
dfm_subset: Extract a subset of a dfm
escape_regex: Internal function for select_types() to escape regular expressions
head.dfm: Return the first or last part of a dfm
expand: Simpler and faster version of expand.grid() in base package
dfm-class: Virtual class "dfm" for a document-feature matrix
merge_dictionary_values: Internal function to merge values of duplicated keys
dictionary: Create a dictionary
message_error: Return an error message
bootstrap_dfm: Bootstrap a dfm
attributes<-: Function extending base::attributes()
data-relocated: Formerly included data objects
dictionary2-class: dictionary class objects and functions
head.textstat_proxy: Return the first or last part of a textstat_proxy object
dfm_tolower: Convert the case of the features of a dfm and combine
dfm_group: Combine documents in a dfm by a grouping variable
dfm_select: Select features from a dfm or fcm
nsyllable: Count syllables in a text
data-internal: Internal data sets
corpus_sample: Randomly sample documents from a corpus
dfm_lookup: Apply a dictionary to a dfm
dfm_trim: Trim a dfm using frequency threshold-based feature selection
convert: Convert quanteda objects to non-quanteda formats
dfm_sample: Randomly sample documents or features from a dfm
get_docvars: Internal function to extract docvars
docnames: Get or set document names
data_corpus_inaugural: US presidential inaugural address texts
get_object_version: Get the package version that created an object
corpus-class: Base method extensions for corpus objects
ntoken: Count the number of tokens or types
is_glob: Check if patterns contains glob wildcard
is_indexed: Check if a glob pattern is indexed by index_types
object-builders: Object compilers
pattern: Pattern for feature, token and keyword matching
reshape_docvars: Internal function to subset or duplicate docvar rows
data_char_ukimmig2010: Immigration-related sections of 2010 UK party manifestos
corpus_segment: Segment texts on a pattern match
data_char_sampletext: A paragraph of text for testing various text-based functions
sample_bygroup: Sample a vector by a group
docvars: Get or set document-level variables
set_dfm_dimnames<-: Internal functions to set dimnames
dfm_match: Match the feature set of a dfm to given feature names
data_dfm_lbgexample: dfm from data in Table 1 of Laver, Benoit, and Garry (2003)
serialize_tokens: Function to serialize list-of-character tokens
dfm_replace: Replace features in dfm
groups: Grouping variable(s) for various functions
fcm_sort: Sort an fcm in alphabetical order of the features
head.corpus: Return the first or last part of a corpus
friendly_class_undefined_message: Print friendly object class not defined message
featfreq: Compute the frequencies of features
textplot_keyness: Plot word keyness
textplot_network: Plot a network of feature co-occurrences
generate_groups: Generate a grouping vector from docvars
textstat_readability: Calculate readability
dictionary_edit: Conveniently edit dictionaries
dfm_weight: Weight the feature frequencies in a dfm
ndoc: Count the number of documents or features
diag2na: convert same-value pairs to NA in a textstat_proxy object
textstat_select: Select rows of textstat objects by glob, regex or fixed patterns
docfreq: Compute the (weighted) document frequency of a feature
tokens_compound: Convert token sequences into compound tokens
tokens_chunk: Segment tokens object by chunks of a given size
meta_system: Internal function to get, set or initialize system metadata
make_meta: Internal functions to create a list for the meta attribute
meta: Get or set object metadata
dfm_sort: Sort a dfm by frequency of one or more margins
lowercase_dictionary_values: Internal function to lowercase dictionary values
nest_dictionary: Utility function to generate a nested list
kwic: Locate keywords-in-context
phrase: Declare a compound character to be a sequence of separate pattern matches
metadoc: Get or set document-level meta-data
format_sparsity: format a sparsity value for printing
list2dictionary: Internal function to convert a list to a dictionary
flatten_dictionary: Flatten a hierarchical dictionary into a list of character vectors
%>%: Pipe operator
names-quanteda: Special handling for names of quanteda objects
quanteda_options: Get or set package options for quanteda
quanteda-package: An R package for the quantitative analysis of textual data
spacyr-methods: Extensions for and from spacy_parse objects
sparsity: Compute the sparsity of a document-feature matrix
dfm_split_hyphenated_features: Split a dfm's hyphenated features into constituent parts
fcm-class: Virtual class "fcm" for a feature co-occurrence matrix
textmodels: Models for scaling and classification of textual data
tokens_replace: Replace tokens in a tokens object
tokens_sample: Randomly sample documents from a tokens object
reexports: Objects exported from other packages
read_dict_functions: Internal functions to import dictionary files
summary_metadata: Functions to add or retrieve corpus summary metadata
fcm: Create a feature co-occurrence matrix
featnames: Get the feature labels from a dfm
is_regex: Internal function for select_types() to check if a string is a regular expression
field_system: Shortcut functions to access or assign metadata
nscrabble: Count the Scrabble letter values of text
tokens_split: Split tokens by a separator pattern
nsentence: Count the number of sentences
tokens_subset: Extract a subset of a tokens
valuetype: Pattern matching using valuetype
remove_empty_keys: Utility function to remove empty keys
matrix2dfm: Converts a Matrix to a dfm
replace_dictionary_values: Internal function to replace dictionary values
keyness: Compute keyness (internal functions)
wordcloud: Internal function for textplot_wordcloud
pattern2list: Convert various input as pattern to a vector used in tokens_select, tokens_compound and kwic.
pattern2id: Convert regex and glob patterns to type IDs or fixed patterns
matrix2fcm: Converts a Matrix to a fcm
set_dfm_slots<-: Set values to a dfm's S4 slots
print-quanteda: Print methods for quanteda core objects
split_values: Internal function for special handling of multi-word dictionary values
set_fcm_slots<-: Set values to a fcm's S4 slots
print.phrases: Print a phrase object
search_glob: Select types without performing slow regex search
search_index: Internal function for select_types to search the index using fastmatch.
summary.corpus: Summarize a corpus
textplot_wordcloud: Plot features as a wordcloud
textstat_entropy: Compute entropies of documents or features
texts: Get or assign corpus texts
textstat_simil: Similarity and distance computation between documents or features
textstat_frequency: Tabulate feature frequencies
textstat_collocations: Identify and score multi-word expressions
textplot_xray: Plot the dispersion of key word(s)
tokenize_internal: quanteda tokenizers
textstat_proxy-class: textstat_simil/dist classes
textstat_summary: Summarize documents
textstat_proxy: [Experimental] Compute document/feature proximity
tokens: Construct a tokens object
tokens_group: Recombine documents tokens by groups
textstat_keyness: Calculate keyness statistics
tokens_segment: Segment tokens object by patterns
tokens_ngrams: Create ngrams and skipgrams from tokens
types: Get word types from a tokens object
unlist_character: Unlist a list of character vectors safely
tokens_recompile: recompile a serialized tokens object
tokens_tolower: Convert the case of tokens
textstat_lexdiv: Calculate lexical diversity
tokens_tortl: [Experimental] Change direction of words in tokens
unlist_integer: Unlist a list of integer vectors safely
unused_dots: Raise warning of unused dots
tokens_lookup: Apply a dictionary to a tokens object
wordcloud_comparison: Internal function for textplot_wordcloud
tokens_select: Select or remove tokens from a tokens object
tokens_wordstem: Stem the terms in an object
topfeatures: Identify the most frequent features in a dfm
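
To give a sense of how a few of the functions listed above combine, here is a minimal sketch that chains tokens_select, tokens_ngrams, dfm_trim, dfm_tfidf, dictionary, and tokens_lookup. The example texts and the two-key dictionary are hypothetical, invented only for illustration, and the code assumes quanteda is installed.

```r
library(quanteda)

# Made-up example texts
txt <- c(
  d1 = "Taxes are too high and spending is too low.",
  d2 = "Spending on schools should rise, taxes should not.",
  d3 = "Immigration and taxes dominate the debate."
)
toks <- tokens(txt, remove_punct = TRUE)

# Drop English stopwords, then form bigrams from the remaining tokens
toks_nostop <- tokens_select(toks, pattern = stopwords("en"), selection = "remove")
bigrams <- tokens_ngrams(toks_nostop, n = 2)

# Keep features that occur in at least two documents, then weight by tf-idf
dfmat <- dfm(toks_nostop)
dfmat_trimmed <- dfm_trim(dfmat, min_docfreq = 2)
dfmat_tfidf <- dfm_tfidf(dfmat_trimmed)

# Count dictionary matches per document (hypothetical dictionary keys and values)
dict <- dictionary(list(fiscal = c("tax*", "spending"), migration = "immigra*"))
dfmat_dict <- dfm(tokens_lookup(toks, dictionary = dict))
print(dfmat_dict)
```

The dictionary values use glob patterns, which is the default valuetype, so "tax*" matches both "taxes" and "Taxes".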