A newer version (0.3-4) of this package is available.

tosca (version 0.3-2)

Tools for Statistical Content Analysis

Description

A framework for statistical analysis in content analysis. It provides a pipeline for preprocessing text corpora and linking them to the latent Dirichlet allocation from the 'lda' package, along with plots for the descriptive analysis of text corpora and topic models. An implementation of Chang's intruder words and intruder topics is also provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub.
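The preprocessing-to-LDA pipeline described above might look roughly like the following. This is a minimal sketch, assuming tosca and its dependencies are installed; the three toy texts and their metadata are invented for illustration, and the argument names should be checked against the help pages of the installed version.

```r
library(tosca)

# Toy corpus (hypothetical data): a named list of texts plus matching metadata.
texts <- list(
  A = "Give a man a fish, and you feed him for a day.",
  B = "So long, and thanks for all the fish.",
  C = "Fisher enjoys a real mastery in evaluating complicated multiple integrals."
)
meta <- data.frame(
  id = c("A", "B", "C"),
  title = c("Fishing", "Don't panic!", "Sir Ronald"),
  date = c("1885-01-02", "1979-03-04", "1951-05-06"),
  stringsAsFactors = FALSE
)

# Bundle texts and metadata into a "textmeta" object.
corpus <- textmeta(meta = meta, text = texts)

# Preprocess the texts (lowercasing, punctuation and stopword removal, tokenizing).
corpus <- cleanTexts(object = corpus)

# Build the vocabulary and convert the texts into the format the 'lda' package expects.
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text = corpus$text, vocab = wordlist$words)

# Fit a small LDA model; K = 3 is arbitrary and only suits this toy corpus.
lda <- LDAgen(documents = ldaPrep, K = 3L, vocab = wordlist$words, seed = 123)

# Inspect the most characteristic words per topic.
topWords(lda$topics, numWords = 3)
```

With a corpus this small the resulting topics are not meaningful; the sketch only shows how the steps connect.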

Install

install.packages('tosca')

Monthly Downloads

561

Version

0.3-2

License

GPL (>= 2)

Maintainer

Lars Koppers

Last Published

October 28th, 2021

Functions in tosca (0.3-2)

deleteAndRenameDuplicates

Deletes and Renames Articles with the same ID
as.meta

"meta" Component of "textmeta"-Objects
LDAgen

Function to fit LDA model
clusterTopics

Cluster Analysis
as.textmeta.corpus

Transform corpus to textmeta
cleanTexts

Data Preprocessing
LDAprep

Create LDA-ready Dataset
as.corpus.textmeta

Transform textmeta to corpus
filterCount

Subcorpus With Count Filter
duplist

Creating List of Duplicates
intruderWords

Function to validate the fit of the LDA model
filterWord

Subcorpus With Word Filter
plotFreq

Plotting Counts of specified Wordgroups over Time (relative to Corpus)
plotArea

Plotting topics over time as stacked areas below plotted lines.
mergeLDA

Preparation of Different LDAs For Clustering
filterID

Subcorpus With ID Filter
readWiki

Read Pages from Wikipedia
filterDate

Subcorpus With Date Filter
readWhatsApp

Read WhatsApp files
topWords

Top Words per Topic
topicCoherence

Calculating Topic Coherence
intruderTopics

Function to validate the fit of the LDA model
plotHeat

Plotting Topics over Time relative to Corpus
plotScot

Plots Counts of Documents or Words over Time (relative to Corpus)
makeWordlist

Counts Words in Text Corpora
precision

Precision and Recall
mergeTextmeta

Merge Textmeta Objects
plotWordpt

Plots Counts of Topics-Words-Combination over Time (Relative to Topics)
plotWordSub

Plotting Counts/Proportion of Words/Docs in LDA-generated Topic-Subcorpora over Time
readTextmeta

Read Corpora as CSV
showTexts

Exports Readable Text Lists
tidy.textmeta

Transform textmeta to an object with tidy text data
textmeta

"textmeta"-Objects
readWikinews

Read files from Wikinews
plotTopic

Plotting Counts of Topics over Time (Relative to Corpus)
topTexts

Get The IDs Of The Most Representative Texts
removeXML

Removes XML/HTML Tags and Umlauts
plotTopicWord

Plotting Counts of Topics-Words-Combination over Time (Relative to Words)
sampling

Sample Texts
showMeta

Export Readable Meta-Data of Articles.
topicsInText

Coloring the words of a text corresponding to topic allocation