list_oml: List Data from OpenML

Description

These functions query data sets, tasks, flows, setups, runs, and evaluation measures from OpenML (https://www.openml.org/search?type=data&sort=runs&status=active) using simple filter criteria.

To find data sets for a specific task type, use list_oml_tasks(), which supports filtering by task type. Another heuristic for finding candidate regression tasks is to search for data sets with zero classes, i.e. by specifying number_classes = 0.
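The two approaches described above can be sketched as follows. This is a minimal illustration, assuming that the mlr3oml package is installed and that the OpenML server is reachable; the variable names and the small limit of 10 are chosen only for this example.

```r
library(mlr3oml)

# Approach 1: list tasks and filter directly on the task type.
# "regr" is one of the supported values of the `type` argument.
regr_tasks <- list_oml_tasks(type = "regr", limit = 10)

# Approach 2 (heuristic): list data sets that report zero classes,
# which are likely to have a numeric target.
regr_data <- list_oml_data(number_classes = 0, limit = 10)
```

Both calls return a data.table. The second is only a heuristic, so it is worth checking the target of each data set before treating it as a regression problem.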

Usage

list_oml_data(
  data_id = NULL,
  data_name = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_evaluations(
  run_id = NULL,
  task_id = NULL,
  measures = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_flows(
  uploader = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_measures(test_server = test_server_default())
list_oml_runs(
  run_id = NULL,
  task_id = NULL,
  tag = NULL,
  flow_id = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_setups(
  flow_id = NULL,
  setup_id = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_tasks(
  task_id = NULL,
  data_id = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  type = NULL,
  ...
)

Value

(data.table()) of results, or a null data.table if no data set matches the filter criteria.

Arguments

data_id: (integer())
Vector of data ids to restrict to.
data_name: (character(1))
Filter for name of data set.
number_instances: (integer())
Filter for number of instances.
number_features: (integer())
Filter for number of features.
number_classes: (integer())
Filter for number of labels of the target (only classification tasks).
number_missing_values: (integer())
Filter for number of missing values.
tag: (character())
Filter for tags. You can provide multiple tags as character vector.
limit: (integer())
Limit the results to limit records. Default is the value of option "mlr3oml.limit", defaulting to 5000.
test_server: (logical(1))
Whether to use the OpenML test server or public server. Defaults to value of option "mlr3oml.test_server", or FALSE if not set.
...: (any)
Additional (unsupported) filters, as named arguments.
run_id: (integer())
Vector of run ids to restrict to.
task_id: (integer())
Vector of task ids to restrict to.
measures: (character())
Vector of evaluation measures to restrict to.
uploader: (integer(1))
Filter for uploader.
flow_id: (integer(1))
Filter for flow id.
setup_id: (integer())
Vector of setup ids to restrict to.
type: (character(1))
The task type. Supported values are "classif", "regr", "surv" and "clust".
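Because limit and test_server take their defaults from the options "mlr3oml.limit" and "mlr3oml.test_server", the defaults can be changed for a whole session instead of per call. The following is a small sketch using only base R; the value 100 and the variable names are illustrative.

```r
# Set session-wide defaults; options() returns the previous values,
# so they can be restored afterwards.
old <- options(mlr3oml.limit = 100L, mlr3oml.test_server = FALSE)

# The option is now visible to any code that reads it as a default.
current_limit <- getOption("mlr3oml.limit")

# Restore the previous settings.
options(old)
```

An explicit limit or test_server argument in a call still takes precedence over these options.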

Details

Filter values are usually provided as single atomic values (typically integer or character). Provide a numeric vector of length 2 (c(l, u)) to find matches in the closed range [l, u].
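As an illustration of single-value and range filters, the sketch below combines both in one query. It assumes that mlr3oml is installed and that the OpenML server is reachable; the bounds and the variable name are arbitrary example values.

```r
library(mlr3oml)

# Data sets with 100 to 1000 instances, 5 to 20 features,
# and no missing values; at most 10 rows are returned.
mid_sized <- list_oml_data(
  number_instances = c(100, 1000),  # range filter [100, 1000]
  number_features = c(5, 20),       # range filter [5, 20]
  number_missing_values = 0,        # single-value filter
  limit = 10
)
```

If no data set satisfies all filters, the result is a null data.table, as described under Value.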

Note that only a subset of filters is exposed here. For a more feature-complete package, see the OpenML R package. Alternatively, you can pass additional filters via ... using the names of the official API, cf. the REST tab of https://www.openml.org/apis.

References

Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1–15. doi:10.1007/s00180-017-0742-2.

Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.

Examples

# For technical reasons, examples cannot be included in this R package.
# Instead, these are some relevant resources:
#
# Large-Scale Benchmarking chapter in the mlr3book:
# https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html
#
# Package Article:
# https://mlr3oml.mlr-org.com/articles/tutorial.html
