These functions query data sets, tasks, flows, setups, runs, and evaluation measures from OpenML (https://www.openml.org/search?type=data&sort=runs&status=active) using simple filter criteria.
To find data sets for a specific task type, use list_oml_tasks(), which supports filtering by task type.
Another heuristic for finding possible regression tasks is to search for data sets with zero classes, i.e. by specifying number_classes = 0.
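The two approaches above can be sketched as follows. This is a minimal example that assumes the mlr3oml package is installed and the OpenML server is reachable; the limit of 10 is an arbitrary choice to keep the queries small.

```r
library(mlr3oml)

# Approach 1: list tasks and filter by task type ("regr" = regression).
regr_tasks = list_oml_tasks(type = "regr", limit = 10)

# Approach 2 (heuristic): list data sets whose target has zero classes.
regr_data = list_oml_data(number_classes = 0, limit = 10)

# Both calls return a data.table with one row per match.
print(regr_tasks)
print(regr_data)
```

The second approach is only a heuristic, so the returned data sets may still need a manual check before they are treated as regression problems.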
list_oml_data(
  data_id = NULL,
  data_name = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_evaluations(
  run_id = NULL,
  task_id = NULL,
  measures = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_flows(
  uploader = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_measures(test_server = test_server_default())

list_oml_runs(
  run_id = NULL,
  task_id = NULL,
  tag = NULL,
  flow_id = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_setups(
  flow_id = NULL,
  setup_id = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_tasks(
  task_id = NULL,
  data_id = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  type = NULL,
  ...
)
(data.table()) of results, or a null data.table if no data set matches the filter criteria.
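Because an empty result is returned as a null data.table rather than raised as an error, callers may want to check the number of rows before using the result. A minimal sketch, assuming mlr3oml is installed and the OpenML server is reachable; the data set name "iris" is only an illustrative filter value:

```r
library(mlr3oml)

# Query data sets by name; "iris" is an illustrative filter value.
tab = list_oml_data(data_name = "iris", limit = 5)

if (nrow(tab) == 0L) {
  # A null data.table has zero rows, so this branch handles "no match".
  message("No data set matched the filter criteria.")
} else {
  # Inspect which columns the server returned before relying on any of them.
  print(names(tab))
  print(tab)
}
```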
data_id
(integer()) Vector of data ids to restrict to.
data_name
(character(1)) Filter for name of data set.
number_instances
(integer()) Filter for number of instances.
number_features
(integer()) Filter for number of features.
number_classes
(integer()) Filter for number of labels of the target (only classification tasks).
number_missing_values
(integer()) Filter for number of missing values.
tag
(character()) Filter for tags. You can provide multiple tags as a character vector.
limit
(integer()) Limit the results to limit records. Default is the value of option "mlr3oml.limit", defaulting to 5000.
test_server
(logical(1)) Whether to use the OpenML test server or public server. Defaults to value of option "mlr3oml.test_server", or FALSE if not set.
...
(any) Additional (unsupported) filters, as named arguments.
run_id
(integer()) Vector of run ids to restrict to.
task_id
(integer()) Vector of task ids to restrict to.
measures
(character()) Vector of evaluation measures to restrict to.
uploader
(integer(1)) Filter for uploader.
flow_id
(integer(1)) Filter for flow id.
setup_id
(integer()) Vector of setup ids to restrict to.
type
(character(1)) The task type, supported values are: "classif", "regr", "surv" and "clust".
Filter values are usually provided as single atomic values (typically integer or character).
Provide a numeric vector of length 2 (c(l, u)) to find matches in the range [l, u].
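A range filter and a single-value filter can be combined in one call. The sketch below assumes mlr3oml is installed and the OpenML server is reachable; the concrete bounds and the limit are arbitrary illustration values.

```r
library(mlr3oml)

# Data sets with between 100 and 1000 instances (range filter, c(l, u))
# and exactly 2 classes (single-value filter).
small_binary = list_oml_data(
  number_instances = c(100, 1000),
  number_classes = 2,
  limit = 20
)

print(small_binary)
```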
Note that only a subset of filters is exposed here.
For a more feature-complete package, see the OpenML R package.
Alternatively, you can pass additional filters via ... using the names of the official API, cf. the REST tab of https://www.openml.org/apis.
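As a hedged sketch of passing an additional filter through ..., the call below forwards a filter named status. That this name is accepted, and that "active" is a valid value, are assumptions based on the OpenML REST API's data listing filters, not something this page documents; check the REST tab linked above before relying on it.

```r
library(mlr3oml)

# `status` is NOT a documented argument of list_oml_data(); it is passed
# via `...` under the assumption that the REST API accepts a filter of
# this name with the value "active".
active_data = list_oml_data(
  number_features = 4,
  status = "active",
  limit = 10
)

print(active_data)
```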
Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1-15. doi:10.1007/s00180-017-0742-2.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49-60. doi:10.1145/2641190.2641198.
# For technical reasons, examples cannot be included in this R package.
# Instead, these are some relevant resources:
#
# Large-Scale Benchmarking chapter in the mlr3book:
# https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html
#
# Package Article:
# https://mlr3oml.mlr-org.com/articles/tutorial.html