The TREC dataset is a dataset for question classification consisting of open-domain, fact-based questions divided into broad semantic categories. It has both a six-class (TREC-6) and a fifty-class (TREC-50) version. Both have 5,452 training examples and 500 test examples, but TREC-50 has finer-grained labels. Models are evaluated based on accuracy.
dataset_trec(dir = NULL, split = c("train", "test"), version = c("6",
"50"), delete = FALSE, return_path = FALSE, clean = FALSE)
dir: Character, path to directory where data will be stored. If NULL, user_cache_dir will be used to determine path.
split: Character. Return training ("train") data or testing ("test") data. Defaults to "train".
version: Character. Version 6 ("6") or version 50 ("50"). Defaults to "6".
delete: Logical, set TRUE to delete dataset.
return_path: Logical, set TRUE to return the path of the dataset.
clean: Logical, set TRUE to remove intermediate files. This can greatly reduce the size. Defaults to FALSE.
A tibble with 5,452 or 500 rows for "train" and "test" respectively and 2 variables:
Character, denoting the class
Character, question text
The classes in TREC-6 are
ABBR - Abbreviation
DESC - Description and abstract concepts
ENTY - Entities
HUM - Human beings
LOC - Locations
NUM - Numeric values
The classes in TREC-50 are listed at https://cogcomp.seas.upenn.edu/Data/QA/QC/definition.html.
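The TREC-6 codes can be kept in a small lookup vector so that labels are readable when inspecting results. The following is a minimal sketch, not part of the package: `trec6_classes` and `describe_trec6()` are hypothetical names introduced here, and the sketch assumes the class labels in the returned tibble use these codes verbatim.

```r
# TREC-6 class codes mapped to their descriptions.
# NUM is the code for numeric values.
trec6_classes <- c(
  ABBR = "Abbreviation",
  DESC = "Description and abstract concepts",
  ENTY = "Entities",
  HUM  = "Human beings",
  LOC  = "Locations",
  NUM  = "Numeric values"
)

# Hypothetical helper (not provided by the package):
# translate class codes into descriptions; unknown codes give NA.
describe_trec6 <- function(codes) {
  unname(trec6_classes[as.character(codes)])
}

describe_trec6(c("HUM", "LOC"))
```

Converting with `as.character()` first means the helper also behaves sensibly if the labels are stored as a factor rather than as character strings.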
Other topic: dataset_ag_news, dataset_dbpedia
# NOT RUN {
dataset_trec()
# Custom directory
dataset_trec(dir = "data/")
# Deleting dataset
dataset_trec(delete = TRUE)
# Returning filepath of data
dataset_trec(return_path = TRUE)
# Access both training and testing dataset
train_6 <- dataset_trec(split = "train")
test_6 <- dataset_trec(split = "test")
train_50 <- dataset_trec(split = "train", version = "50")
test_50 <- dataset_trec(split = "test", version = "50")
# }
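The description states that models on this dataset are evaluated by accuracy, which is the share of questions whose predicted class matches the true class. The following is a minimal sketch of that calculation in base R. The `accuracy()` helper is hypothetical and not part of the package, and the label vectors are toy values made up for illustration, not rows from the dataset.

```r
# Hypothetical helper (not provided by the package):
# fraction of predictions that equal the true labels.
accuracy <- function(truth, predicted) {
  stopifnot(length(truth) == length(predicted), length(truth) > 0)
  mean(as.character(truth) == as.character(predicted))
}

# Toy labels for illustration only; not taken from the dataset.
truth     <- c("HUM", "LOC", "NUM", "DESC")
predicted <- c("HUM", "LOC", "ENTY", "DESC")

accuracy(truth, predicted)  # 3 of 4 match, so 0.75
```

In practice `truth` would presumably be the class labels of the test split and `predicted` the output of a classifier fitted on the training split.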