These functions query data sets, tasks, flows, setups, runs, collections, and evaluation measures from OpenML (https://www.openml.org/search?type=data&sort=runs&status=active) using simple filter criteria.
To find data sets for a specific task type, use list_oml_tasks(), which supports filtering by task
type.
Another heuristic for finding potential regression tasks is to search for data sets with
zero classes, i.e., by specifying number_classes = 0.
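A minimal sketch of both approaches, assuming mlr3oml is installed and the OpenML server is reachable. The value limit = 10 is arbitrary and only keeps the output small:

```r
library(mlr3oml)

# Approach 1: list tasks filtered by task type ("regr" = regression)
regr_tasks = list_oml_tasks(type = "regr", limit = 10)

# Approach 2 (heuristic): data sets whose target has zero classes
regr_data = list_oml_data(number_classes = 0, limit = 10)

print(regr_tasks)
print(regr_data)
```

The first approach relies on task metadata, while the second only inspects data set properties, so it may also return data sets that have no regression task defined yet.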
list_oml_collections(
uploader = NULL,
status = "all",
main_entity_type = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_data(
data_id = NULL,
data_name = NULL,
number_instances = NULL,
number_features = NULL,
number_classes = NULL,
number_missing_values = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_evaluations(
run_id = NULL,
task_id = NULL,
measures = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_flows(
uploader = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_measures(test_server = test_server_default())
list_oml_runs(
run_id = NULL,
task_id = NULL,
tag = NULL,
flow_id = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_setups(
flow_id = NULL,
setup_id = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_tasks(
task_id = NULL,
data_id = NULL,
number_instances = NULL,
number_features = NULL,
number_classes = NULL,
number_missing_values = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
type = NULL,
...
)
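As a quick orientation for the evaluation-related signatures above, the sketch below first lists the measure names known to the server and then restricts evaluations to one task and one measure. It assumes the OpenML server is reachable; the task ID 31 and the measure name "predictive_accuracy" are illustrative assumptions, so pick valid values from the output of list_oml_measures() and list_oml_tasks():

```r
library(mlr3oml)

# All evaluation measures known to the server (this function takes no filters)
measures = list_oml_measures()
print(measures)

# Evaluations for a single task, restricted to a single measure
# (task_id = 31 and "predictive_accuracy" are assumed example values)
evals = list_oml_evaluations(
  task_id = 31,
  measures = "predictive_accuracy",
  limit = 20
)
print(evals)
```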
(data.table()) of results, or an empty data.table() if no entry matches the filter criteria.
uploader (integer(1))
Filter for uploader.
status (character(1))
Should be one of "active", "in_preparation", "deactivated", or "all". By
default, "all" studies are returned.
main_entity_type (character(1) | NULL)
Filter for main entity type. Can be "run" or "task".
limit (integer(1))
Limit the results to at most limit records.
Defaults to the value of option "mlr3oml.limit", or 5000 if not set.
test_server (logical(1))
Whether to use the OpenML test server instead of the public server.
Defaults to the value of option "mlr3oml.test_server", or FALSE if not set.
... (any)
Additional (unsupported) filters, as named arguments.
data_id (integer())
Vector of data ids to restrict to.
data_name (character(1))
Filter for the name of the data set.
number_instances (integer())
Filter for the number of instances.
number_features (integer())
Filter for the number of features.
number_classes (integer())
Filter for the number of classes of the target (only meaningful for classification tasks).
number_missing_values (integer())
Filter for the number of missing values.
tag (character())
Filter for tags. Multiple tags can be provided as a character vector.
run_id (integer())
Vector of run ids to restrict to.
task_id (integer())
Vector of task ids to restrict to.
measures (character())
Vector of evaluation measures to restrict to.
flow_id (integer(1))
Filter for flow id.
setup_id (integer())
Vector of setup ids to restrict to.
type (character(1))
The task type. Supported values are "classif", "regr", "surv", and "clust".
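The defaults of limit and test_server are read from the options "mlr3oml.limit" and "mlr3oml.test_server". A minimal sketch of changing them for the current session with base R options(); the values 100 and FALSE are only examples:

```r
# Lower the default number of returned records and make sure the
# public (non-test) OpenML server is used by the list_oml_*() functions
old = options(mlr3oml.limit = 100L, mlr3oml.test_server = FALSE)

getOption("mlr3oml.limit")        # now 100
getOption("mlr3oml.test_server")  # now FALSE

# Restore the previous settings (unset options become unset again)
options(old)
```

Saving the return value of options() and passing it back afterwards keeps the change local to a block of code instead of the whole session.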
Filter values are usually provided as single atomic values (typically integer or character).
Provide a numeric vector of length 2 (c(l, u)) to find matches in the closed range [l, u].
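For example, a sketch that restricts data sets to between 100 and 1000 instances and between 5 and 20 features, assuming the OpenML server is reachable; the bounds and limit = 50 are arbitrary:

```r
library(mlr3oml)

# Length-2 numeric vectors are interpreted as ranges [l, u]
res = list_oml_data(
  number_instances = c(100, 1000),
  number_features = c(5, 20),
  limit = 50
)

# An empty data.table is returned when nothing matches
if (nrow(res) == 0L) {
  message("No data set matches the filter criteria.")
} else {
  print(res)
}
```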
Note that only a subset of filters is exposed here.
For a more feature-complete package, see the OpenML R package.
Alternatively, you can pass additional filters via ... using the names of the official API;
cf. the REST tab of https://www.openml.org/apis.
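As a sketch of passing an API filter that has no dedicated argument, the call below forwards data_version through the ... argument. That the data listing endpoint accepts a filter named data_version is an assumption based on the OpenML REST documentation; verify the name there before relying on it. The call also assumes the OpenML server is reachable:

```r
library(mlr3oml)

# data_name is a regular argument; data_version is forwarded via `...`
# using its (assumed) name in the OpenML REST API
iris_v1 = list_oml_data(data_name = "iris", data_version = 1)
print(iris_v1)
```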
Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1–15. doi:10.1007/s00180-017-0742-2.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
# For technical reasons, examples cannot be included in this R package.
# Instead, these are some relevant resources:
#
# Large-Scale Benchmarking chapter in the mlr3book:
# https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html
#
# Package Article:
# https://mlr3oml.mlr-org.com/articles/tutorial.html