dataset_dbpedia: DBpedia Ontology Dataset

Description

The DBpedia ontology classification dataset. It contains 560,000 training samples and 70,000 testing samples drawn from 14 nonoverlapping DBpedia classes, that is, 40,000 training samples and 5,000 testing samples per class.

Usage

dataset_dbpedia(
  dir = NULL,
  split = c("train", "test"),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

Value

A tibble with 560,000 rows for "train" or 70,000 rows for "test", and 3 variables:

class: Character, denoting the class of the article
title: Character, title of the article
description: Character, description of the article
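Because the classes are balanced, tabulating the class column is a quick sanity check on the returned tibble. The sketch below is a minimal illustration that uses a small, hypothetical stand-in data frame with the same three columns so it runs without downloading anything; with the real data you would use train <- dataset_dbpedia(split = "train") instead, and each of the 14 classes should then show 40,000 rows.

```r
# Hypothetical stand-in with the same three columns as the returned tibble.
# With the real data: train <- dataset_dbpedia(split = "train")
train <- data.frame(
  class = c("Company", "Company", "Artist", "Artist"),
  title = c("Acme Corp", "Globex", "Jane Doe", "John Roe"),
  description = c("A company.", "Another company.", "A painter.", "A sculptor."),
  stringsAsFactors = FALSE
)

# Number of rows per class
class_counts <- table(train$class)
print(class_counts)

# TRUE when every class has the same number of rows
balanced <- length(unique(as.vector(class_counts))) == 1
```

On the full training split the same check should report 40,000 rows per class; on the test split, 5,000.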

Arguments

dir: Character, path to the directory where the data will be stored. If NULL, user_cache_dir will be used to determine the path.
split: Character. Return training ("train") data or testing ("test") data. Defaults to "train".
delete: Logical, set TRUE to delete the dataset.
return_path: Logical, set TRUE to return the path of the dataset.
clean: Logical, set TRUE to remove intermediate files. This can greatly reduce the size on disk. Defaults to FALSE.
manual_download: Logical, set TRUE if you have manually downloaded the file and placed it in the folder designated by running this function with return_path = TRUE.
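The usage signature lists split = c("train", "test"), yet the documented default is "train". In R this pattern usually means the argument is resolved with match.arg(), which returns the first choice when the argument is left at its default and rejects values outside the list. The helper below is hypothetical and only assumes that dataset_dbpedia() resolves split this way; it is not the package's own code.

```r
# Hypothetical helper mirroring the documented `split` argument.
# Assumption: the real function resolves `split` with match.arg().
pick_split <- function(split = c("train", "test")) {
  # match.arg() returns the first choice ("train") when `split` is not supplied
  # and signals an error for values not in c("train", "test")
  split <- match.arg(split)
  split
}

pick_split()        # "train"
pick_split("test")  # "test"
# pick_split("valid") would raise an error
```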

Details

The classes are

Company
EducationalInstitution
Artist
Athlete
OfficeHolder
MeanOfTransportation
Building
NaturalPlace
Village
Animal
Plant
Album
Film
WrittenWork
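When the class column is used as a modeling target, fixing the factor levels to the 14 classes above keeps the encoding identical across the training and testing splits. The sketch below is a minimal illustration: dbpedia_classes and labels are hypothetical names introduced here, and labels stands in for train$class or test$class.

```r
# The 14 classes listed above, in the documented order
dbpedia_classes <- c(
  "Company", "EducationalInstitution", "Artist", "Athlete",
  "OfficeHolder", "MeanOfTransportation", "Building", "NaturalPlace",
  "Village", "Animal", "Plant", "Album", "Film", "WrittenWork"
)

# Hypothetical labels standing in for train$class / test$class
labels <- c("Film", "Company", "WrittenWork")

# Fixed levels give every split the same 14-level encoding,
# even if a subset happens to lack some classes
y <- factor(labels, levels = dbpedia_classes)
codes <- as.integer(y)  # 13 1 14

# Any label outside the documented classes would become NA
stopifnot(!anyNA(y))
```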

Examples


if (FALSE) {
dataset_dbpedia()

# Custom directory
dataset_dbpedia(dir = "data/")

# Deleting dataset
dataset_dbpedia(delete = TRUE)

# Returning filepath of data
dataset_dbpedia(return_path = TRUE)

# Access both training and testing dataset
train <- dataset_dbpedia(split = "train")
test <- dataset_dbpedia(split = "test")
}
