Download and Load Various Text Datasets
Provides a framework to download, parse, and store text datasets
on the disk and load them when needed. Includes various sentiment lexicons
and labeled text data sets for classification and analysis.
The goal of textdata is to provide easy access to text-related datasets without bundling them inside a package. Some text datasets are too large to store within an R package, or are licensed in a way that prevents them from being included in an OSS-licensed package. Instead, this package provides a framework to download, parse, and store the datasets on disk and load them when needed.
You can install the released version of textdata from CRAN with:
install.packages("textdata")
And the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("EmilHvitfeldt/textdata")
The first time you use one of the functions for accessing an included
text dataset, such as dataset_sentence_polarity(), the function will
prompt you to agree to download the dataset to your computer.
After the first use, each time you use a function like
lexicon_afinn(), the function will load the dataset from disk.
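The download-once, load-from-disk behaviour described above can be sketched in base R. This is only an illustration of the general pattern, not textdata's actual implementation: `load_cached_dataset()`, `fake_fetch()`, the cache directory, and the toy lexicon values are all hypothetical, and the consent prompt is left out for brevity.

```r
# Hypothetical sketch of a "download once, load from disk afterwards" helper.
# None of these names come from textdata; `fetch` stands in for the
# download-and-parse step.
load_cached_dataset <- function(name, fetch, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".rds"))

  if (!file.exists(path)) {
    # First use: obtain and parse the data, then store it on disk.
    data <- fetch()
    dir.create(dir, recursive = TRUE, showWarnings = FALSE)
    saveRDS(data, path)
  }

  # Every call, including the first, returns the copy stored on disk.
  readRDS(path)
}

# Stand-in for a real download (toy values); counts how often it is called.
fetch_calls <- 0
fake_fetch <- function() {
  fetch_calls <<- fetch_calls + 1
  data.frame(word = c("good", "bad"), value = c(3, -3))
}

cache_dir <- file.path(tempdir(), "textdata-sketch")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache

first  <- load_cached_dataset("toy_lexicon", fake_fetch, cache_dir)
second <- load_cached_dataset("toy_lexicon", fake_fetch, cache_dir)
```

In this sketch `fake_fetch()` runs only on the first call; the second call is served entirely from the stored file.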
Included text datasets
As of today, the datasets included in textdata are:
|v1.0 sentence polarity dataset||dataset_sentence_polarity()|
|AFINN-111 sentiment lexicon||lexicon_afinn()|
|Hu and Liu’s opinion lexicon||lexicon_bing()|
|Loughran and McDonald’s opinion lexicon for financial documents||lexicon_loughran()|
Check out each function’s documentation for detailed information (including citations) for the relevant dataset.
Note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms. Feedback, bug reports (and fixes!), and feature requests are welcome; file issues or seek support here. For details on how to add a new dataset to this package, check out the vignette!
Functions in textdata
|catalogue||Catalogue of all available data sources|
|lexicon_bing||Bing sentiment lexicon|
|dataset_sentence_polarity||v1.0 sentence polarity dataset|
|lexicon_loughran||Loughran-McDonald sentiment lexicon|
|License||MIT + file LICENSE|
|Collate||'dataset_sentence_polarity.R' 'lexicon_bing.R' 'lexicon_loughran.R' 'lexicon_afinn.R' 'download_functions.R' 'info.R' 'load_dataset.R' 'printer.R' 'process_functions.R'|
|Packaged||2019-06-11 16:54:10 UTC; emilhvitfeldthansen|
|Date/Publication||2019-06-12 12:20:03 UTC|