corpustools (version 0.5.1)

tCorpus: a corpus class for tokenized texts

Description

The tCorpus is a class for managing tokenized texts, stored as a data.frame in which each row represents a token, and columns contain the positions and features of these tokens.

Methods and Functions

The corpustools package uses both functions and methods for working with the tCorpus.

Methods are used for all operations that modify the tCorpus itself, such as subsetting or adding columns. This allows the data to be modified by reference, without copying the corpus. Methods are accessed using the dollar sign after the tCorpus object. For example, if the tCorpus is named tc, the subset method can be called as tc$subset(...).
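A minimal sketch of what modification by reference looks like in practice. The two example texts are invented for illustration, and the call assumes that create_tcorpus() accepts a character vector and that the subset method takes a subset argument evaluated on the token columns, as in the documented version of the package.

```r
library(corpustools)

# Two tiny example documents (invented for illustration)
tc <- create_tcorpus(c("The first example text", "A second example text"))

n_before <- tc$n   # number of tokens before subsetting

# Calling the method changes tc in place; no assignment is needed.
# token_id is assumed to be the position of the token within its document.
tc$subset(subset = token_id <= 2)

n_after <- tc$n    # tc itself now holds fewer tokens
```

Because the object is changed in place, keep an untouched version by making an explicit copy first (for example with tc$copy()) before calling a method.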

Functions are used for all operations that return a certain output, such as search results or a semantic network. These are used in the common R style that you know and love. For example, if the tCorpus is named tc, a semantic network can be created with semnet(tc, ...).
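By contrast, a function call leaves the tCorpus as it is and hands back a new object. The sketch below uses invented example texts and assumes that semnet() accepts a feature argument naming a token column and returns an igraph graph.

```r
library(corpustools)

# Two tiny example documents (invented for illustration)
tc <- create_tcorpus(c("The first example text", "A second example text"))

n_tokens <- tc$n

# semnet() is an ordinary function: it takes the tCorpus and returns a result
g <- semnet(tc, feature = "token")

# The result is a separate object; tc still holds the same tokens
unchanged <- tc$n == n_tokens
```

The result has to be assigned to a name (here g) to be kept, which is the usual R convention for functions.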

Overview of methods and functions

The primary goal of the tCorpus is to facilitate various corpus analysis techniques. The documentation for currently implemented techniques can be reached through the following links.

Create a tCorpus: Functions for creating a tCorpus object
Manage tCorpus data: Methods for viewing, modifying and subsetting tCorpus data
Features: Preprocessing, subsetting and analyzing features
Using search strings: Use Boolean queries to analyze the tCorpus
Co-occurrence networks: Feature co-occurrence based semantic network analysis
Corpus comparison: Compare corpora
Topic modeling: Create and visualize topic models
Document similarity: Calculate document similarity