mlr3oml (version 0.11.0)

list_oml_collections: List Data from OpenML

Description

These functions query data sets, tasks, flows, setups, runs, evaluation measures, and collections from OpenML (https://www.openml.org/search?type=data&sort=runs&status=active) using simple filter criteria.

To find data sets for a specific task type, use list_oml_tasks(), which supports filtering by task type. Another heuristic for finding possible regression tasks is to search for data sets with zero classes, i.e. by specifying number_classes = 0.
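The two search strategies described above can be sketched as follows. This is a minimal illustration, not an official package example: it assumes mlr3oml is installed and the OpenML server is reachable, and the small limit values are arbitrary choices to keep the queries quick.

```r
library(mlr3oml)

# Strategy 1: list tasks directly, filtered by task type.
regr_tasks = list_oml_tasks(type = "regr", limit = 10L)

# Strategy 2 (heuristic): list data sets whose target has 0 classes,
# which are candidates for regression.
regr_data = list_oml_data(number_classes = 0, limit = 10L)

print(regr_tasks)
print(regr_data)
```

The second query is only a heuristic: a data set with zero classes is a candidate for regression, but whether a suitable task exists for it still has to be checked via list_oml_tasks().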

Usage

list_oml_collections(
  uploader = NULL,
  status = "all",
  main_entity_type = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_data(
  data_id = NULL,
  data_name = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_evaluations(
  run_id = NULL,
  task_id = NULL,
  measures = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_flows(
  uploader = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_measures(test_server = test_server_default())

list_oml_runs(
  run_id = NULL,
  task_id = NULL,
  tag = NULL,
  flow_id = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_setups(
  flow_id = NULL,
  setup_id = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_tasks(
  task_id = NULL,
  data_id = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  type = NULL,
  ...
)

Value

(data.table()) of results, or a null data.table if no entry matches the filter criteria.
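Because an unmatched query yields a null data.table rather than an error, callers may want to check for an empty result before using it. A minimal sketch, assuming network access to OpenML; the data set name "iris" is only an illustrative query value:

```r
library(mlr3oml)

res = list_oml_data(data_name = "iris", limit = 10L)

if (nrow(res) == 0L) {
  # A null data.table has zero rows, so nothing matched the filters.
  message("No data set matched the filter criteria.")
} else {
  print(head(res))
}
```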

Arguments

uploader

(integer(1))
Filter for uploader.

status

(character(1))
Should be one of "active", "in_preparation", "deactivated", "all". Defaults to "all", which returns collections regardless of status.

main_entity_type

(character(1) | NULL)
Filter for main entity type. Can be "run" or "task".

limit

(integer())
Limit the results to limit records. Default is the value of option "mlr3oml.limit", defaulting to 5000.

test_server

(logical(1))
Whether to use the OpenML test server or public server. Defaults to value of option "mlr3oml.test_server", or FALSE if not set.

...

(any)
Additional (unsupported) filters, as named arguments.

data_id

(integer())
Vector of data ids to restrict to.

data_name

(character(1))
Filter for name of data set.

number_instances

(integer())
Filter for number of instances.

number_features

(integer())
Filter for number of features.

number_classes

(integer())
Filter for number of labels of the target (only classification tasks).

number_missing_values

(integer())
Filter for number of missing values.

tag

(character())
Filter for tags. You can provide multiple tags as character vector.

run_id

(integer())
Vector of run ids to restrict to.

task_id

(integer())
Vector of task ids to restrict to.

measures

(character())
Vector of evaluation measures to restrict to.

flow_id

(integer(1))
Filter for flow id.

setup_id

(integer())
Vector of setup ids to restrict to.

type

(character(1))
The task type. Supported values are "classif", "regr", "surv" and "clust".
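The option-based defaults for limit and test_server documented above can be set once per session and overridden per call. The following is a hedged sketch, assuming network access to OpenML; the limit values and task id 31 are arbitrary illustrative choices.

```r
library(mlr3oml)

# Set session-wide defaults; options() returns the previous values.
old = options(mlr3oml.limit = 50L, mlr3oml.test_server = FALSE)

# This call picks up limit and test_server from the options set above.
tasks = list_oml_tasks(type = "classif", number_classes = 2)

# An explicit argument overrides the option for a single call.
runs = list_oml_runs(task_id = 31L, limit = 5L)

# Restore the previous option values.
options(old)
```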

Details

Filter values are usually provided as single atomic values (typically integer or character). Provide a numeric vector of length 2 (c(l, u)) to find matches in the range [l, u].
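As an illustration of mixing exact and range filters, consider the sketch below. It assumes network access to OpenML, and the concrete bounds are arbitrary.

```r
library(mlr3oml)

small_binary = list_oml_data(
  number_instances = c(100, 1000), # range filter: between 100 and 1000 instances
  number_features = c(5, 20),      # range filter: between 5 and 20 features
  number_classes = 2,              # single value: exactly 2 classes
  limit = 20L
)

print(small_binary)
```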

Note that only a subset of filters is exposed here. For a more feature-complete interface, see the OpenML R package. Alternatively, you can pass additional filters via ... using the names from the official API, cf. the REST tab of https://www.openml.org/apis.
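A sketch of passing an extra filter through ... is given below. It rests on assumptions that should be checked against the REST documentation: that the data listing endpoint accepts a filter named data_version, and that the name is forwarded to the server unchanged. Such filters are unsupported by the package, so treat this as illustrative only.

```r
library(mlr3oml)

# ASSUMPTION: `data_version` is a filter name of the OpenML REST API for
# listing data sets; it is not a documented argument of list_oml_data()
# and is passed through `...` as a named argument.
iris_v1 = list_oml_data(
  data_name = "iris",
  data_version = 1L,
  limit = 10L
)

print(iris_v1)
```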

References

Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1-15. doi:10.1007/s00180-017-0742-2.

Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49-60. doi:10.1145/2641190.2641198.

Examples

# For technical reasons, examples cannot be included in this R package.
# Instead, these are some relevant resources:
#
# Large-Scale Benchmarking chapter in the mlr3book:
# https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html
#
# Package Article:
# https://mlr3oml.mlr-org.com/articles/tutorial.html
