This is the class for data sets served on https://openml.org/d.
A mlr3::Task is returned by the method $task().
Alternatively, you can convert this object to a mlr3::DataBackend using
mlr3::as_data_backend().
This package comes with its own reader for ARFF files, based on data.table::fread().
For sparse ARFF files, the reader automatically falls back to
RWeka::read.arff() if the RWeka package is installed.
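The conversion to a mlr3::DataBackend mentioned above might look like the following sketch. The helper name oml_backend is hypothetical and only used for illustration; the call itself needs network access and assumes that the packages mlr3 and mlr3oml are installed.

```r
# Hypothetical helper: download an OpenML data set and wrap it as a DataBackend.
# Nothing is downloaded until the function is called.
oml_backend = function(data_id) {
  odata = mlr3oml::OMLData$new(id = data_id)
  mlr3::as_data_backend(odata)
}
```

Calling oml_backend(9) would then return a backend for the data set with id 9, the same id used in the example at the end of this page.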
id(integer(1))
OpenML data id.
cache_dir(logical(1) | character(1))
Stores the location of the cache for objects retrieved from https://openml.org.
If set to FALSE, caching is disabled.
The package qs is required for caching.
name(character(1))
Name of the data set, as extracted from the data set description.
desc(list())
Data set description (meta information), downloaded and converted from the JSON API response.
qualities(data.table())
Data set qualities (performance values), downloaded from the JSON API response and
converted to a data.table::data.table() with columns "name" and "value".
features(data.table())
Information about data set features (including target), downloaded from the JSON API response and
converted to a data.table::data.table() with columns:
"index" (integer()): Column position.
"name" (character()): Name of the feature.
"data_type" (factor()): Type of the feature: "nominal" or "numeric".
"nominal_value" (list()): Levels of the feature, or NULL for numeric features.
"is_target" (logical()): TRUE for target column, FALSE otherwise.
"is_ignore" (logical()): TRUE if this feature should be ignored.
Ignored features are removed automatically from the data set.
"is_row_identifier" (logical()): TRUE if the column encodes a row identifier.
Row identifiers are removed automatically from the data set.
"number_of_missing_values" (integer()): Number of missing values in the column.
data(data.table())
Data as data.table::data.table().
Columns marked as row identifiers or marked with the ignore flag are automatically removed.
target_names(character())
Name of the default target, as extracted from the OpenML data set description.
feature_names(character())
Names of the features, as extracted from the OpenML data set description.
nrow(integer())
Number of observations, as extracted from the OpenML data set qualities.
ncol(integer())
Number of features (including targets), as extracted from the table of data set features.
This excludes row identifiers and ignored columns.
tags(character())
Returns all tags of the data set.
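How the flags in the features table relate to data, target_names, feature_names and ncol can be illustrated offline. The table below is made up and only mimics the documented columns; the filtering is a sketch of the described behavior, not the package's actual implementation.

```r
library(data.table)

# Hypothetical features table with the documented columns.
features = data.table(
  index = 0:3,
  name = c("id", "sepal_length", "species", "comment"),
  data_type = factor(c("numeric", "numeric", "nominal", "nominal"),
    levels = c("nominal", "numeric")),
  nominal_value = list(NULL, NULL, c("setosa", "versicolor"), c("a", "b")),
  is_target = c(FALSE, FALSE, TRUE, FALSE),
  is_ignore = c(FALSE, FALSE, FALSE, TRUE),
  is_row_identifier = c(TRUE, FALSE, FALSE, FALSE),
  number_of_missing_values = c(0L, 2L, 0L, 5L)
)

# Row identifiers and ignored columns are dropped.
kept = features[!is_ignore & !is_row_identifier]

# The remaining columns split into target(s) and features.
target = kept[is_target == TRUE, name]
predictors = kept[is_target == FALSE, name]

# Number of remaining columns, including the target.
n_columns = nrow(kept)
```

In this made-up table, "id" is dropped as a row identifier and "comment" is dropped because of the ignore flag, which leaves one target and one feature.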
new()
Creates a new object of class OMLData.
OMLData$new(id, cache = getOption("mlr3oml.cache", FALSE))
id(integer(1))
OpenML data id.
cache(logical(1) | character(1))
See field cache_dir for an explanation of possible values.
Defaults to value of option "mlr3oml.cache", or FALSE if not set.
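Because the default of cache is read from the option "mlr3oml.cache", caching can be switched on for a whole session instead of per object. A minimal sketch using only base R; the commented-out constructor call needs network access and, as noted for cache_dir, the package qs.

```r
# Enable caching and remember the previous setting so it can be restored.
old = options(mlr3oml.cache = TRUE)

# This is the value the constructor would pick up as its default.
cache_setting = getOption("mlr3oml.cache", FALSE)

# odata = mlr3oml::OMLData$new(id = 9)  # would now be created with cache = TRUE

# Restore the previous state of the option.
options(old)
```

Passing cache explicitly to OMLData$new() overrides the option for that object.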
print()
Prints the object.
For a more detailed printer, convert to a mlr3::Task via $task().
OMLData$print()
quality()
Returns the value of a single OpenML data set quality.
OMLData$quality(name)
name(character(1))
Name of the quality to extract.
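Conceptually, $quality() corresponds to a lookup by name in the qualities table. The sketch below reproduces that lookup on a made-up table; the helper get_quality and the values are hypothetical, and this is not the package's implementation.

```r
library(data.table)

# Hypothetical qualities table with the documented columns "name" and "value".
qualities = data.table(
  name = c("NumberOfInstances", "NumberOfFeatures", "NumberOfMissingValues"),
  value = c(150, 5, 0)
)

# Hypothetical helper: return the value of the quality with the given name.
get_quality = function(tab, quality_name) {
  tab[name == quality_name, value]
}

n_instances = get_quality(qualities, "NumberOfInstances")
```

On a real object, the equivalent call would be odata$quality("NumberOfInstances"), assuming the data set provides a quality with that name.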
task()
Creates a mlr3::Task using the provided target column(s), defaulting to the default target attribute of the data set description.
OMLData$task(target_names = NULL)
target_names(character())
Name(s) of the target columns, or NULL for the default columns.
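A task with a non-default target could be created along the following lines. The helper make_oml_task is hypothetical; calling it needs network access and the packages mlr3 and mlr3oml.

```r
# Hypothetical helper: build a task from an OpenML data set.
# target = NULL keeps the default target from the data set description.
make_oml_task = function(data_id, target = NULL) {
  odata = mlr3oml::OMLData$new(id = data_id)
  odata$task(target_names = target)
}
```

make_oml_task(9) would use the default target, while passing the name of another column of the data set as target would turn that column into the target instead.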
clone()
The objects of this class are cloneable with this method.
OMLData$clone(deep = FALSE)
deep
Whether to make a deep clone.
# NOT RUN {
odata = OMLData$new(id = 9)
print(odata)
print(odata$target_names)
print(odata$feature_names)
print(odata$tags)
print(odata$task())
# get a task via tsk():
if (requireNamespace("mlr3")) {
  mlr3::tsk("oml", data_id = 9)
}
# }