mldr.datasets

RUMDR - R Ultimate Multilabel Dataset Repository

Installation

Use install.packages to install mldr.datasets and its dependencies:

install.packages("mldr.datasets")

Alternatively, you can install it via install_github from the devtools package.

devtools::install_github("fcharte/mldr.datasets")

You can also clone the repository by entering git clone https://github.com/fcharte/mldr.datasets.git at your command line (assuming git is installed on your system) or with your favourite git GUI.

Usage and examples

This package provides a large collection of multilabel datasets along with the functions needed to export them to several formats and to obtain bibliographic information. Some of the datasets are integrated into the package, while others are externally available. To open a list with all the datasets integrated into the package use the following commands:

library(mldr.datasets)
data(package = "mldr.datasets")
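The export functions mentioned above can be sketched with write.mldr, which appears in the function index of this package. This is a minimal example, not the definitive usage: the argument names format, basename and noconfirm are written from memory of the package documentation and should be checked against ?write.mldr, and the emotions_export base name is purely illustrative.

```r
library(mldr.datasets)

# Write into a temporary directory so the working directory stays clean.
# "emotions_export" is a hypothetical base name chosen for this example.
out_base <- file.path(tempdir(), "emotions_export")

# Export the bundled emotions dataset in MULAN format.
# noconfirm = TRUE is assumed to skip the interactive confirmation prompt.
write.mldr(emotions, format = "MULAN", basename = out_base, noconfirm = TRUE)

# List whatever files the call produced.
list.files(tempdir(), pattern = "^emotions_export")
```

Passing several values in format, for instance c("CSV", "KEEL"), should export the same dataset to each of those formats in one call.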

Once the package has been loaded, any of the datasets can be queried as shown below:

birds$measures  # Obtain a list of characterization measures
flags$labels    # Retrieve information about the labels
emotions$attributes # All info about the attributes in the dataset
scene$labelsets # List of labelsets and their frequencies
cat(toBibtex(ng20)) # Print the BibTeX entry for the dataset

The external datasets are automatically downloaded from GitHub the first time they are needed, then saved locally. To obtain a list of externally available datasets, use the following commands:

library(mldr.datasets)
available.mldrs()

The external datasets are not immediately available. To load any of them, enter its name followed by empty parentheses, as shown below:

bibtex <- bibtex()  # This will load the bibtex dataset, downloading it if it is not locally available
bibtex$labels

The toBibtex S3 method returns bibliographic information about the dataset, if it is available. This can be printed with cat or copied to the clipboard to include it in your article.

For more examples and detailed explanations of the available functions, please refer to the documentation.
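As one such example, the partitioning functions listed in the function index (random.kfolds, random.holdout and their stratified counterparts) can be tried on a bundled dataset. The sketch below assumes that the mldr package is also installed, since the partitions are returned as mldr objects, and that k, p and seed are the argument names; both assumptions should be verified in the help pages.

```r
library(mldr.datasets)

# Split the bundled emotions dataset into 5 random folds.
# The seed argument is assumed to make the split reproducible.
folds <- random.kfolds(emotions, k = 5, seed = 10)

length(folds)        # one element per fold
names(folds[[1]])    # each fold holds a train and a test partition

# Hold-out split; p is assumed to be the percentage of
# instances assigned to the training partition.
parts <- random.holdout(emotions, p = 70, seed = 10)
names(parts)
```

The stratified.kfolds and iterative.stratification.kfolds functions are expected to take the same kind of arguments, so swapping the function name should be enough to compare partitioning strategies.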

License

This software is distributed under the following terms:

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

The datasets distributed within this software are the property of their respective authors. You can find authorship and citation information inside the datasets.R file or by using the toBibtex method.

Monthly Downloads

745

Version

0.4.2

License

LGPL (>= 3) | file LICENSE

Maintainer

David Charte

Last Published

January 17th, 2019

Functions in mldr.datasets (0.4.2)

corel16k009

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
enron

Dataset with email messages and the folders where the users stored them
eurlexdc_test

List with 10 folds of the test data from the EUR-Lex directory codes dataset
tmc2007_500

Dataset from airplane failure reports (500 most relevant features extracted)
toBibtex.mldr

BibTeX entry associated to an mldr object
sparsity

Calculate the sparsity level of the dataset
corel16k008

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
slashdot

Dataset generated from slashdot.org site entries
eurlexdc_tra

List with 10 folds of the train data from the EUR-Lex directory codes dataset
eurlexev_test

List with 10 folds of the test data from the EUR-Lex EUROVOC descriptors dataset
imdb

Dataset generated from the IMDB film database
nuswide_VLAD

Dataset obtained from the NUS-WIDE database with cVLAD+ representation
iterative.stratification.holdout

Hold-out partitioning of an mldr object
ohsumed

Dataset generated from a subset of the Medline database
yahoo_recreation

Dataset generated from the Yahoo! web site index (recreation category)
tmc2007

Dataset from airplane failure reports
stratified.partitions

Generic partitioning of an mldr object
stackex_philosophy

Dataset from the Stack Exchange's philosophy forum
yahoo_health

Dataset generated from the Yahoo! web site index (health category)
stackex_cs

Dataset from the Stack Exchange's computer science forum
corel16k010

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
density

Calculate the density level of the dataset
bibtex

Dataset with BibTeX entries
emotions

Dataset with features extracted from music tracks and the emotions they produce
delicious

Dataset generated from the del.icio.us site bookmarks
corel5k

Dataset with data from the Corel image collection
corel16k004

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
iterative.stratification.partitions

Generic partitioning of an mldr object
iterative.stratification.kfolds

Partition an mldr object into k folds
random.holdout

Hold-out partitioning of an mldr object
eurlexev_tra

List with 10 folds of the train data from the EUR-Lex EUROVOC descriptors dataset
random.kfolds

Partition an mldr object into k folds
rcv1sub2

Dataset from the Reuters corpus (subset 2)
eurlexsm_test

List with 10 folds of the test data from the EUR-Lex subject matters dataset
rcv1sub3

Dataset from the Reuters corpus (subset 3)
reutersk500

Dataset from the Reuters Corpus with the 500 most relevant features selected
mldrs

(Defunct) Obtain and show a list of additional datasets available to download
scene

Dataset from images with different natural scenes
medical

Dataset generated from medical reports
stackex_coffee

Dataset from the Stack Exchange's coffee forum
eurlexsm_tra

List with 10 folds of the train data from the EUR-Lex subject matters dataset
flags

Dataset with features corresponding to world flags
ng20

Dataset with news messages and the news groups they belong to
stackex_cooking

Dataset from the Stack Exchange's cooking forum
yeast

Dataset with protein profiles and their categories
corel16k005

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
corel16k006

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
yahoo_business

Dataset generated from the Yahoo! web site index (business category)
yahoo_computers

Dataset generated from the Yahoo! web site index (computers category)
get.mldr

Get a multilabel dataset by name
genbase

Dataset with genes data and their functional expression
langlog

Dataset with data from the Language Log forum discussions
mediamill

Dataset with features extracted from video sequences and semantic concepts assigned as labels
random.partitions

Generic partitioning of an mldr object
rcv1sub1

Dataset from the Reuters corpus (subset 1)
yahoo_social

Dataset generated from the Yahoo! web site index (social category)
stackex_chemistry

Dataset from the Stack Exchange's chemistry forum
nuswide_BoW

Dataset obtained from the NUS-WIDE database with BoW representation
stackex_chess

Dataset from the Stack Exchange's chess forum
yahoo_society

Dataset generated from the Yahoo! web site index (society category)
rcv1sub4

Dataset from the Reuters corpus (subset 4)
stratified.holdout

Hold-out partitioning of an mldr object
stratified.kfolds

Partition an mldr object into k folds
rcv1sub5

Dataset from the Reuters corpus (subset 5)
yahoo_reference

Dataset generated from the Yahoo! web site index (reference category)
yahoo_science

Dataset generated from the Yahoo! web site index (science category)
write.mldr

Export an mldr object or set of mldr objects to different file formats
yahoo_arts

Dataset generated from the Yahoo! web site index (arts category)
yahoo_education

Dataset generated from the Yahoo! web site index (education category)
yahoo_entertainment

Dataset generated from the Yahoo! web site index (entertainment category)
available.mldrs

Obtain additional datasets available to download
bookmarks

Dataset with data from web bookmarks and their categories
corel16k007

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
birds

Dataset with sounds produced by birds and the species they belong to
corel16k003

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
cal500

Dataset with music data along with labels for emotions, instruments, genres, etc.
check_n_load.mldr

(Defunct) Check if an mldr object is locally available and download it if needed
corel16k001

Datasets with data from the Corel image collection. There are 10 subsets in corel16k
corel16k002

Datasets with data from the Corel image collection. There are 10 subsets in corel16k