mlr3oml (version 0.5.0)

list_oml: List Data from OpenML

Description

These functions query data sets, tasks, flows, setups, runs, and evaluation measures on OpenML (https://openml.org/d) using simple filter criteria.

Usage

list_oml_data_sets(
  data_id = NULL,
  data_name = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = getOption("mlr3oml.limit", 5000L),
  ...
)

list_oml_evaluations(
  run_id = NULL,
  task_id = NULL,
  measures = NULL,
  tag = NULL,
  limit = getOption("mlr3oml.limit", 5000L),
  ...
)

list_oml_flows(
  uploader = NULL,
  tag = NULL,
  limit = getOption("mlr3oml.limit", 5000L),
  ...
)

list_oml_measures()

list_oml_runs(
  run_id = NULL,
  task_id = NULL,
  tag = NULL,
  limit = getOption("mlr3oml.limit", 5000L),
  ...
)

list_oml_setups(
  flow_id = NULL,
  setup_id = NULL,
  tag = NULL,
  limit = getOption("mlr3oml.limit", 5000L),
  ...
)

list_oml_tasks(
  task_id = NULL,
  data_id = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = getOption("mlr3oml.limit", 5000L),
  ...
)

Value

(data.table()) of results, or a null data.table if no record matches the filter criteria.

Arguments

data_id

(integer())
Vector of data ids to restrict to.

data_name

(character(1))
Filter for name of data set.

number_instances

(integer())
Filter for number of instances.

number_features

(integer())
Filter for number of features.

number_classes

(integer())
Filter for number of labels of the target (only classification tasks).

number_missing_values

(integer())
Filter for number of missing values.

tag

(character())
Filter for tags. You can provide multiple tags as character vector.

limit

(integer())
Maximum number of records to return. Defaults to the value of option "mlr3oml.limit", or 5000 if that option is not set.

...

(any)
Additional (unsupported) filters, as named arguments.

run_id

(integer())
Vector of run ids to restrict to.

task_id

(integer())
Vector of task ids to restrict to.

measures

(character())
Vector of evaluation measures to restrict to.

uploader

(integer(1))
Filter for uploader.

flow_id

(integer(1))
Filter for flow id.

setup_id

(integer())
Vector of setup ids to restrict to.
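
The default for limit is read from the option "mlr3oml.limit", so it can be changed once per session instead of in every call. The following is a minimal sketch using only base R; the value 100 is an arbitrary choice for illustration.

```r
# Lower the session-wide default; options() returns the previous values.
old = options(mlr3oml.limit = 100L)

# Same expression as the default of `limit` in the Usage section.
default_limit = getOption("mlr3oml.limit", 5000L)
print(default_limit)

# Restore the previous setting.
options(old)
```

An explicit limit argument in a call still takes precedence over the option, since the option only supplies the default.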

Details

Filter values are usually provided as single atomic values (typically integer or character). Provide a numeric vector of length 2 (c(l, u)) to find matches in the range [l, u].
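
As a sketch of the two filter forms, the call below combines an exact match with a range. The specific numbers are arbitrary, and the call needs a connection to the OpenML server.

```r
library(mlr3oml)

# number_features: single value, exact match.
# number_instances: length-2 vector c(l, u), matches anything in [l, u].
tasks = list_oml_tasks(
  number_features = 4,
  number_instances = c(100, 200),
  limit = 10L
)
print(tasks)
```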

Note that only a subset of filters is exposed here. For a more feature-complete package, see the OpenML R package. Alternatively, you can pass additional filters via ... using the names of the official API, see https://www.openml.org/api_docs.
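
A hedged sketch of passing an extra filter through ... follows. It assumes that the data listing endpoint of the official API accepts a filter named status with the value "active"; this name is taken from the OpenML API documentation, not from this page, and such filters are unsupported here, so behavior may change.

```r
library(mlr3oml)

# `status` is not a formal argument of list_oml_data_sets(); it is passed
# on through `...` under the name assumed to be used by the official API.
active = list_oml_data_sets(
  data_name = "iris",
  status = "active",  # assumption: valid filter name and value in the API
  limit = 10L
)
print(active)
```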

References

Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1-15. doi:10.1007/s00180-017-0742-2.

Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49-60. doi:10.1145/2641190.2641198.

Examples

# \donttest{
library(mlr3oml)

### query data sets
# search for titanic data set
data_sets = list_oml_data_sets(data_name = "titanic")
print(data_sets)

# search for a reduced version
data_sets = list_oml_data_sets(
  data_name = "titanic",
  number_instances = c(2200, 2300),
  number_features = 4
)
print(data_sets)

### search tasks for this data set
tasks = list_oml_tasks(data_id = data_sets$data_id)
print(tasks)


# query runs for these tasks and count them per task_id
runs = list_oml_runs(task_id = tasks$task_id)
runs[, .N, by = task_id]
# }