textdata v0.1.0

Download and Load Various Text Datasets

Provides a framework to download, parse, and store text datasets on disk and load them when needed. Includes various sentiment lexicons and labeled text datasets for classification and analysis.

textdata

The goal of textdata is to provide easy access to text-related datasets without bundling them inside a package. Some text datasets are too large to store within an R package, or are licensed in a way that prevents them from being included in an OSS-licensed package. Instead, this package provides a framework to download, parse, and store the datasets on disk and load them when needed.

Installation

You can install the released version of textdata from CRAN with:

install.packages("textdata")

And the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("EmilHvitfeldt/textdata")

Example

The first time you use one of the functions for accessing an included text dataset, such as lexicon_afinn() or dataset_sentence_polarity(), the function will prompt you to agree that you understand the dataset’s license or terms of use and then download the dataset to your computer.

After the first use, each time you use a function like lexicon_afinn(), the function will load the dataset from disk.
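
The workflow described above might look like the following minimal sketch. It uses only functions named in this README. The interactive() guard is an addition made here, on the assumption that the license prompt needs an interactive session; the variable names are illustrative.

```r
# Sketch only: requires textdata to be installed and an internet
# connection for the first call.
library(textdata)

# Assumption: the license prompt needs an interactive session,
# so the calls are guarded here.
if (interactive()) {
  # First call: asks you to accept the license or terms of use,
  # then downloads the dataset to your computer.
  afinn <- lexicon_afinn()

  # Later calls: the dataset is loaded from disk.
  afinn_again <- lexicon_afinn()

  # Inspect the first few rows.
  print(head(afinn))
}

# The help page for each function carries the dataset's details and citations:
# ?lexicon_afinn
```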

Included text datasets

As of today, the datasets included in textdata are:

Dataset                                                          Function
v1.0 sentence polarity dataset                                   dataset_sentence_polarity()
AFINN-111 sentiment lexicon                                      lexicon_afinn()
Hu and Liu’s opinion lexicon                                     lexicon_bing()
Loughran and McDonald’s opinion lexicon for financial documents  lexicon_loughran()

Check out each function’s documentation for detailed information (including citations) for the relevant dataset.

Community Guidelines

Note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms. Feedback, bug reports (and fixes!), and feature requests are welcome; file issues or seek support at https://github.com/EmilHvitfeldt/textdata/issues. For details on how to add a new dataset to this package, see the "How-to-add-a-data-set" vignette.

Functions in textdata

Name                       Description
load_dataset               Internal Functions
catalogue                  Catalogue of all available data sources
lexicon_bing               Bing sentiment lexicon
dataset_sentence_polarity  v1.0 sentence polarity dataset
lexicon_loughran           Loughran-McDonald sentiment lexicon
lexicon_afinn              AFINN-111 dataset

Vignettes of textdata

Name
How-to-add-a-data-set.Rmd

Details

License MIT + file LICENSE
Encoding UTF-8
LazyData true
RoxygenNote 6.1.1
Collate 'dataset_sentence_polarity.R' 'lexicon_bing.R' 'lexicon_loughran.R' 'lexicon_afinn.R' 'download_functions.R' 'info.R' 'load_dataset.R' 'printer.R' 'process_functions.R'
VignetteBuilder knitr
URL https://github.com/EmilHvitfeldt/textdata
BugReports https://github.com/EmilHvitfeldt/textdata/issues
NeedsCompilation no
Packaged 2019-06-11 16:54:10 UTC; emilhvitfeldthansen
Repository CRAN
Date/Publication 2019-06-12 12:20:03 UTC
