This is the class for data sets served on https://openml.org/d.
An mlr3::Task is returned by the method $task().
Alternatively, you can convert this object to an mlr3::DataBackend using mlr3::as_data_backend().
This package comes with its own reader for ARFF files, based on data.table::fread().
For sparse ARFF files, and if the RWeka package is installed, the reader
automatically falls back to the implementation in RWeka::read.arff().
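To illustrate what reading a dense ARFF file involves, here is a minimal sketch in base R. It is not the package's reader: it uses utils::read.csv() instead of data.table::fread() to stay dependency-free, and it assumes unquoted attribute names without spaces, comma-separated rows after @DATA, and no sparse rows (the case covered by the RWeka fallback). The function name read_arff_sketch() is hypothetical.

```r
# Hypothetical sketch of a dense-ARFF reader; NOT the mlr3oml implementation.
# Assumes unquoted attribute names without spaces and comma-separated @DATA rows.
read_arff_sketch = function(lines) {
  lines = trimws(lines)
  # Drop blank lines and '%' comment lines
  lines = lines[nzchar(lines) & !startsWith(lines, "%")]
  # ARFF keywords are case-insensitive
  data_start = grep("^@data", lines, ignore.case = TRUE)[1L]
  header = lines[seq_len(data_start - 1L)]
  attrs = header[grepl("^@attribute", header, ignore.case = TRUE)]
  # The column name is the second whitespace-separated token of each @ATTRIBUTE line
  col_names = vapply(strsplit(attrs, "[[:space:]]+"), `[`, character(1), 2L)
  body = lines[-seq_len(data_start)]
  # Everything after @DATA is plain CSV; '?' encodes a missing value
  utils::read.csv(text = body, header = FALSE, col.names = col_names,
    na.strings = "?", strip.white = TRUE)
}

arff = c(
  "% toy data set",
  "@RELATION toy",
  "@ATTRIBUTE x NUMERIC",
  "@ATTRIBUTE y {a,b}",
  "@DATA",
  "1.5,a",
  "?,b"
)
df = read_arff_sketch(arff)
str(df)
```

The sketch ignores the declared levels of nominal attributes; a complete reader would convert such columns to factors with exactly those levels.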
id (integer(1))
OpenML data id.
cache_dir (logical(1) | character(1))
Stores the location of the cache for objects retrieved from https://openml.org.
If set to FALSE, caching is disabled.
The package qs is required for caching.
name (character(1))
Name of the data set, as extracted from the data set description.
desc (list())
Data set description (meta information), downloaded and converted from the JSON API response.
qualities (data.table())
Data set qualities (meta-features such as the number of instances), downloaded from the JSON API response and
converted to a data.table::data.table() with columns "name" and "value".
features (data.table())
Information about data set features (including the target), downloaded from the JSON API response and
converted to a data.table::data.table() with columns:
"index" (integer()): Column position.
"name" (character()): Name of the feature.
"data_type" (factor()): Type of the feature: "nominal" or "numeric".
"nominal_value" (list()): Levels of the feature, or NULL for numeric features.
"is_target" (logical()): TRUE for the target column, FALSE otherwise.
"is_ignore" (logical()): TRUE if this feature should be ignored. Ignored features are removed automatically from the data set.
"is_row_identifier" (logical()): TRUE if the column encodes a row identifier. Row identifiers are removed automatically from the data set.
"number_of_missing_values" (integer()): Number of missing values in the column.
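A toy table with the same columns can make this layout concrete. The sketch below builds one as a base R data.frame (the package itself returns a data.table) with made-up feature names and values, then looks up the target column, the levels of the nominal features, and the features with missing values.

```r
# Toy features table with the documented columns (base R data.frame;
# feature names and values are made up for illustration).
features = data.frame(
  index = 1:4,
  name = c("row_id", "age", "colour", "class"),
  data_type = factor(c("numeric", "numeric", "nominal", "nominal"),
    levels = c("nominal", "numeric")),
  is_target = c(FALSE, FALSE, FALSE, TRUE),
  is_ignore = c(FALSE, FALSE, FALSE, FALSE),
  is_row_identifier = c(TRUE, FALSE, FALSE, FALSE),
  number_of_missing_values = c(0L, 2L, 0L, 0L),
  stringsAsFactors = FALSE
)
# List column: levels for nominal features, NULL for numeric ones
features$nominal_value = list(NULL, NULL, c("red", "green"), c("yes", "no"))

# Name of the target column
target = features$name[features$is_target]

# Levels of each nominal feature, keyed by feature name
is_nominal = features$data_type == "nominal"
levels_by_feature = setNames(features$nominal_value[is_nominal], features$name[is_nominal])

# Features that contain missing values
with_missing = features$name[features$number_of_missing_values > 0]
```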
data (data.table())
Data as data.table::data.table().
Columns marked as row identifiers or marked with the ignore flag are automatically removed.
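The removal rule can be sketched in base R on a toy data set; the column names and flags below are made up, and the package itself operates on data.table objects. The number of remaining columns matches how the ncol field is described: all features including the target, minus row identifiers and ignored columns.

```r
# Toy raw data before cleaning (column names and values are made up).
raw = data.frame(
  row_id = 1:3,
  age = c(23, 41, 35),
  internal_note = c("a", "b", "c"),
  class = factor(c("yes", "no", "yes")),
  stringsAsFactors = FALSE
)
# Flags as they would appear in the features table
flags = data.frame(
  name = names(raw),
  is_ignore = c(FALSE, FALSE, TRUE, FALSE),
  is_row_identifier = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Drop every column that is ignored or encodes a row identifier
to_drop = flags$name[flags$is_ignore | flags$is_row_identifier]
clean = raw[, setdiff(names(raw), to_drop), drop = FALSE]

names(clean)  # "age" "class"
ncol(clean)   # 2: target included, row id and ignored column excluded
```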
target_names (character())
Name of the default target, as extracted from the OpenML data set description.
feature_names (character())
Names of the features, as extracted from the OpenML data set description.
nrow (integer())
Number of observations, as extracted from the OpenML data set qualities.
ncol (integer())
Number of features (including targets), as extracted from the table of data set features.
This excludes row identifiers and ignored columns.
tags (character())
Returns all tags of the data set.
new()
Creates a new object of class OMLData.
OMLData$new(id, cache = getOption("mlr3oml.cache", FALSE))
id (integer(1))
OpenML data id.
cache (logical(1) | character(1))
See field cache_dir for an explanation of possible values.
Defaults to the value of option "mlr3oml.cache", or FALSE if not set.
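Because R evaluates default arguments at call time, the cache default is read from the option whenever OMLData$new() is called without an explicit cache argument. The snippet below only exercises getOption() with and without the option set; the directory under tempdir() is a hypothetical example, and the constructor call is left commented out because it depends on the mlr3oml package and on openml.org.

```r
# Make sure the option is unset, remembering any previous value
old = options(mlr3oml.cache = NULL)

# Without the option, the fallback FALSE is used: caching is disabled
default_value = getOption("mlr3oml.cache", FALSE)

# Hypothetical custom cache directory (caching needs the qs package)
options(mlr3oml.cache = file.path(tempdir(), "oml-cache"))
cache_value = getOption("mlr3oml.cache", FALSE)
# odata = mlr3oml::OMLData$new(id = 9)  # would now pick up cache_value

# Restore the previous option value
options(old)
```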
print()
Prints the object.
For a more detailed printer, convert to an mlr3::Task via $task().
OMLData$print()
quality()
Returns the value of a single OpenML data set quality.
OMLData$quality(name)
name (character(1))
Name of the quality to extract.
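Conceptually, quality(name) is a lookup in the "name"/"value" table stored in the qualities field. The sketch below is a hypothetical stand-in, not the package's method: the helper quality_sketch(), the toy values, and the choice to return NA for an unknown name are assumptions for illustration.

```r
# Toy qualities table with the documented "name" and "value" columns;
# the quality names follow OpenML conventions, the values are made up.
qualities = data.frame(
  name = c("NumberOfInstances", "NumberOfFeatures", "NumberOfMissingValues"),
  value = c(500, 10, 3)
)

# Hypothetical helper: return the value of a single quality.
# Assumption: an unknown name yields NA instead of an error.
quality_sketch = function(qualities, name) {
  stopifnot(is.character(name), length(name) == 1L)
  value = qualities$value[qualities$name == name]
  if (length(value) == 0L) NA_real_ else value[[1L]]
}

quality_sketch(qualities, "NumberOfInstances")
quality_sketch(qualities, "NotAQuality")
```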
task()
Creates an mlr3::Task using the provided target column(s), defaulting to the default target attribute of the data set description.
OMLData$task(target_names = NULL)
target_names (character())
Name(s) of the target columns, or NULL for the default columns.
clone()
The objects of this class are cloneable with this method.
OMLData$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Requires the mlr3oml package and network access to https://openml.org
library(mlr3oml)

odata = OMLData$new(id = 9)
print(odata)
print(odata$target_names)
print(odata$feature_names)
print(odata$tags)
print(odata$task())

# get a task via tsk():
if (requireNamespace("mlr3", quietly = TRUE)) {
  mlr3::tsk("oml", data_id = 9)
}