quanteda package - RDocumentation

About

An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

For more details, see https://quanteda.io.

How to Install

Install the released version from CRAN in the normal way, using your R GUI or:

install.packages("quanteda")

Or for the latest development version:

# devtools package required to install quanteda from Github 
devtools::install_github("quanteda/quanteda")

Because this compiles some C++ and Fortran source code, you will need to have installed the appropriate compilers.

If you are using a Windows platform, this means you will also need to install the Rtools software available from CRAN.

If you are using macOS, you should install the macOS tools, namely the Clang 6.x compiler and the GNU Fortran compiler (as quanteda requires gfortran to build). If you are still getting errors related to gfortran, follow the fixes here.
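
Before installing the development version, it can help to check whether R can find a working compiler toolchain. The sketch below assumes the pkgbuild package is available (it is normally installed alongside devtools); it is not part of quanteda's own instructions. Note that this check compiles a small C file, so it does not by itself confirm that gfortran is set up.

```r
# Assumption: pkgbuild is installed; fall back to NA if it is not.
if (requireNamespace("pkgbuild", quietly = TRUE)) {
  # TRUE if a simple C file compiles; debug = TRUE prints diagnostic details
  has_tools <- pkgbuild::has_build_tools(debug = TRUE)
} else {
  has_tools <- NA
  message("pkgbuild is not installed; cannot check build tools.")
}

has_tools
```

If the result is FALSE, installing Rtools (Windows) or the macOS tools described above is the likely next step.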

How to Use

See the quick start guide to learn how to use quanteda.
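
As a rough idea of what a basic workflow might look like, here is a minimal sketch that builds a corpus, tokenizes it, and creates a document-feature matrix. The two example texts and the object names are made up for illustration; the quick start guide remains the authoritative introduction.

```r
library(quanteda)

# Hypothetical example texts; any named character vector would do
txts <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
          doc2 = "The dog barks, and the fox runs away.")

corp  <- corpus(txts)                       # construct a corpus object
toks  <- tokens(corp, remove_punct = TRUE)  # tokenize, dropping punctuation
dfmat <- dfm(toks)                          # document-feature matrix (lowercased by default)

ndoc(dfmat)                 # number of documents
topfeatures(dfmat, n = 3)   # most frequent features
```

Tokenizing explicitly before calling dfm() keeps each processing step visible, which tends to make it easier to adjust one step without touching the others.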

How to cite

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").
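
For instance, the citation object can be stored and converted with base R's utils functions. This is only a sketch of one way to do it; the object name cit is arbitrary.

```r
# Retrieve the citation information shipped with the installed package
cit <- citation(package = "quanteda")

# Print a plain-text version, then the BibTeX entry
print(cit, style = "text")
toBibtex(cit)
```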

Leaving Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contributing

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

Fork the source code, modify, and issue a pull request through the project GitHub page. See our Contributor Code of Conduct and the all-important quanteda Style Guide.
Issues, bug reports, and wish lists: File a GitHub issue.
Usage questions: Submit a question on the quanteda channel on StackOverflow.
Contact the maintainer by email.

Monthly Downloads: 27,961
Version: 2.1.1
License: GPL-3
Maintainer: Kenneth Benoit
Last Published: July 27th, 2020

Functions in quanteda (2.1.1)

Coercion and checking functions for fcm objects

Coercion and checking functions for dfm objects

Coercion and checking functions for dictionary objects

Convert an fcm to an igraph object

as.matrix,textstat_simil_sparse-method

as.matrix method for textstat_simil_sparse

redefinition of network::as.network()

coerce a compressed corpus to a standard corpus

View methods for quanteda

as.data.frame.dfm

Convert a dfm to a data.frame

Combine dfm objects by Rows or Columns

Coerce a dfm to a matrix or data.frame

Recast the document units of a corpus

Convert the case of character objects

Select or remove elements from a character vector

Check if font is available on the system

Construct a corpus object

corpus_trimsentences

Remove sentences based on their token lengths or a pattern match

Function to assign multiple slots to a S4 object

Coercion, checking, and combining functions for tokens objects

Recombine a dfm or fcm by combining identical dimension elements

Remove sentences based on their token lengths or a pattern match

Convert a dfm to an lsa "textmatrix"

Compute the Mean Segmental Type-Token Ratio (MSTTR)

Internal functions for dfm objects

Extract a subset of a corpus

compute_lexdiv_stats

Compute lexical diversity from a dfm or tokens

Weight a dfm by tf-idf

convert-wrappers

Convenience wrappers for dfm convert

data_dictionary_LSD2015

Lexicoder Sentiment Dictionary (2015)

Create a document-feature matrix

Compute the Moving-Average Type-Token Ratio (MATTR)

Convert quanteda dictionary objects to the YAML format

Extract a subset of a dfm

Internal function for select_types() to escape regular expressions

Return the first or last part of a dfm

Simpler and faster version of expand.grid() in base package

Virtual class "dfm" for a document-feature matrix

merge_dictionary_values

Internal function to merge values of duplicated keys

Create a dictionary

Return an error message

Bootstrap a dfm

Function extending base::attributes()

Formerly included data objects

dictionary2-class

dictionary class objects and functions

head.textstat_proxy

Return the first or last part of a textstat_proxy object

Convert the case of the features of a dfm and combine

Combine documents in a dfm by a grouping variable

Select features from a dfm or fcm

Count syllables in a text

Internal data sets

Randomly sample documents from a corpus

Apply a dictionary to a dfm

Trim a dfm using frequency threshold-based feature selection

Convert quanteda objects to non-quanteda formats

Randomly sample documents or features from a dfm

Internal function to extract docvars

Get or set document names

data_corpus_inaugural

US presidential inaugural address texts

get_object_version

Get the package version that created an object

Base method extensions for corpus objects

Count the number of tokens or types

Check if patterns contains glob wildcard

Check if a glob pattern is indexed by index_types

object-builders

Object compilers

Pattern for feature, token and keyword matching

reshape_docvars

Internal function to subset or duplicate docvar rows

data_char_ukimmig2010

Immigration-related sections of 2010 UK party manifestos

Segment texts on a pattern match

data_char_sampletext

A paragraph of text for testing various text-based functions

Sample a vector by a group

Get or set document-level variables

set_dfm_dimnames<-

Internal functions to set dimnames

Match the feature set of a dfm to given feature names

data_dfm_lbgexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

serialize_tokens

Function to serialize list-of-character tokens

Replace features in dfm

Grouping variable(s) for various functions

Sort an fcm in alphabetical order of the features

Return the first or last part of a corpus

friendly_class_undefined_message

Print friendly object class not defined message

Compute the frequencies of features

textplot_keyness

Plot word keyness

textplot_network

Plot a network of feature co-occurrences

generate_groups

Generate a grouping vector from docvars

textstat_readability

Calculate readability

dictionary_edit

Conveniently edit dictionaries

Weight the feature frequencies in a dfm

Count the number of documents or features

convert same-value pairs to NA in a textstat_proxy object

textstat_select

Select rows of textstat objects by glob, regex or fixed patterns

Compute the (weighted) document frequency of a feature

tokens_compound

Convert token sequences into compound tokens

Segment tokens object by chunks of a given size

Internal function to get, set or initialize system metadata

Internal functions to create a list for the meta attribute

Get or set object metadata

Sort a dfm by frequency of one or more margins

lowercase_dictionary_values

Internal function to lowercase dictionary values

nest_dictionary

Utility function to generate a nested list

Locate keywords-in-context

Declare a compound character to be a sequence of separate pattern matches

Get or set document-level meta-data

format_sparsity

format a sparsity value for printing

list2dictionary

Internal function to convert a list to a dictionary

flatten_dictionary

Flatten a hierarchical dictionary into a list of character vectors

Special handling for names of quanteda objects

quanteda_options

Get or set package options for quanteda

quanteda-package

An R package for the quantitative analysis of textual data

Extensions for and from spacy_parse objects

Compute the sparsity of a document-feature matrix

dfm_split_hyphenated_features

Split a dfm's hyphenated features into constituent parts

Virtual class "fcm" for a feature co-occurrence matrix

Models for scaling and classification of textual data

Replace tokens in a tokens object

Randomly sample documents from a tokens object

Objects exported from other packages

read_dict_functions

Internal functions to import dictionary files

summary_metadata

Functions to add or retrieve corpus summary metadata

Create a feature co-occurrence matrix

Get the feature labels from a dfm

Internal function for select_types() to check if a string is a regular expression

Shortcut functions to access or assign metadata

Count the Scrabble letter values of text

Split tokens by a separator pattern

Count the number of sentences

Extract a subset of a tokens

Pattern matching using valuetype

remove_empty_keys

Utility function to remove empty keys

Converts a Matrix to a dfm

replace_dictionary_values

Internal function to replace dictionary values

Compute keyness (internal functions)

Internal function for textplot_wordcloud

Convert various input as pattern to a vector used in tokens_select, tokens_compound and kwic.

Convert regex and glob patterns to type IDs or fixed patterns

Converts a Matrix to a fcm

set_dfm_slots<-

Set values to a dfm's S4 slots

Print methods for quanteda core objects

Internal function for special handling of multi-word dictionary values

set_fcm_slots<-

Set values to a fcm's S4 slots

Print a phrase object

Select types without performing slow regex search

Internal function for select_types to search the index using fastmatch.

Summarize a corpus

textplot_wordcloud

Plot features as a wordcloud

textstat_entropy

Compute entropies of documents or features

Get or assign corpus texts

Similarity and distance computation between documents or features

textstat_frequency

Tabulate feature frequencies

textstat_collocations

Identify and score multi-word expressions

Plot the dispersion of key word(s)

tokenize_internal

quanteda tokenizers

textstat_proxy-class

textstat_simil/dist classes

textstat_summary

Summarize documents

[Experimental] Compute document/feature proximity

Construct a tokens object

Recombine documents tokens by groups

textstat_keyness

Calculate keyness statistics

Segment tokens object by patterns

Create ngrams and skipgrams from tokens

Get word types from a tokens object

unlist_character

Unlist a list of character vectors safely

tokens_recompile

recompile a serialized tokens object

Convert the case of tokens

textstat_lexdiv

Calculate lexical diversity

[Experimental] Change direction of words in tokens

Unlist a list of integer vectors safely

Raise warning of unused dots

Apply a dictionary to a tokens object

wordcloud_comparison

Internal function for textplot_wordcloud

Select or remove tokens from a tokens object

tokens_wordstem

Stem the terms in an object

Identify the most frequent features in a dfm
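
As a small illustration of two of the functions listed above, tokens_compound and tokens_wordstem, the following sketch joins a multi-word expression into a single token and then stems the result. The example sentence is made up, and the default settings are assumed throughout; see each function's help page for the full set of arguments.

```r
library(quanteda)

# Hypothetical example sentence
toks <- tokens("The European Union debated immigration policies.",
               remove_punct = TRUE)

# tokens_compound: join a multi-word expression into one token
# (the default concatenator is an underscore)
toks <- tokens_compound(toks, pattern = phrase("European Union"))

# tokens_wordstem: stem each token (English stemmer by default)
toks_stem <- tokens_wordstem(toks)

as.list(toks)
as.list(toks_stem)
```

Compounding before building a dfm means that "European Union" is counted as one feature rather than two separate words.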