OpenML (version 1.7)

OMLDataSet: OMLDataSet.

Description

An OMLDataSet consists of an OMLDataSetDescription, a data.frame containing the data set, the old and new column names and, finally, the target features.

The OMLDataSetDescription provides information on the data set, such as its ID, name, and version. For the full list of elements, see the XSD.

The slot colnames.old contains the original names, i.e., the column names that were uploaded to the server, while colnames.new contains the names that you will see when working with the data in R. Most of the time, the old and new column names are identical. The new names differ only when the original names are not valid.
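The relationship between old and new column names can be illustrated with base R alone. The sketch below is an assumption, not the package's documented behaviour: it uses make.names() as one plausible way to turn invalid names into valid ones, and the uploaded names are hypothetical.

```r
# Hypothetical column names as they might have been uploaded to the server.
colnames.old = c("Ozone", "Solar R", "2nd.reading", "Temp")

# make.names() is base R. Whether OpenML applies exactly this rule is an
# assumption made for illustration only.
colnames.new = make.names(colnames.old, unique = TRUE)

# Flag the names that had to be changed to become valid.
changed = colnames.old != colnames.new
print(data.frame(colnames.old, colnames.new, changed))
```

In this sketch only the names containing a space or starting with a digit change; names that are already valid are left as they are.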

The slot target.features contains the name(s) of the column(s) in the data.frame of the OMLDataSet that hold the target feature(s).
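Because target.features holds column names, it can be used directly to index the data.frame. The following is a minimal base R sketch using the built-in airquality data; the variable names target and predictors are chosen for illustration and are not part of the OpenML package.

```r
data("airquality")

# Name(s) of the target column(s), as they would be stored in target.features.
target.features = "Ozone"

# Every target name should be a column of the data.
stopifnot(all(target.features %in% colnames(airquality)))

# Split the data into target column(s) and the remaining features.
target = airquality[, target.features, drop = FALSE]
predictors = airquality[, setdiff(colnames(airquality), target.features), drop = FALSE]
```

Using drop = FALSE keeps both results as data.frames, even when only a single target column is selected.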

Usage

makeOMLDataSet(desc, data, colnames.old = colnames(data),
  colnames.new = colnames(data), target.features)

Arguments

desc

[OMLDataSetDescription] Data set description.

data

[data.frame] The data set.

colnames.old

[character] Names of the features that were uploaded to the server.

colnames.new

[character] Names of the features as displayed in R.

target.features

[character] Name(s) of the target feature(s).

Value

[OMLDataSet]

See Also

Other data set-related functions: OMLDataSetDescription, convertMlrTaskToOMLDataSet, convertOMLDataSetToMlr, deleteOMLObject, getOMLDataSet, listOMLDataSets, tagOMLObject, uploadOMLDataSet

Examples

# NOT RUN {
data("airquality")
dsc = "Daily air quality measurements in New York, May to September 1973.
This data is taken from R."
cit = "Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical
Methods for Data Analysis. Belmont, CA: Wadsworth."
desc_airquality = makeOMLDataSetDescription(name = "airquality",
  description = dsc,
  creator = "New York State Department of Conservation (ozone data) and the National
    Weather Service (meteorological data)",
  collection.date = "May 1, 1973 to September 30, 1973",
  language = "English",
  licence = "GPL-2",
  url = "https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html",
  default.target.attribute = "Ozone",
  citation = cit,
  tags = "R")

airquality_oml = makeOMLDataSet(desc = desc_airquality,
  data = airquality,
  colnames.old = colnames(airquality),
  colnames.new = colnames(airquality),
  target.features = "Ozone")
# }
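Before calling makeOMLDataSet, it may help to check that the arguments are consistent with each other. The helper below, checkDataSetArgs, is hypothetical and not part of the OpenML package. Whether makeOMLDataSet performs similar checks internally is not stated here, so treat this as a sketch under that assumption.

```r
# Hypothetical helper, not part of the OpenML package: checks that the
# arguments intended for makeOMLDataSet() agree with each other.
checkDataSetArgs = function(data, colnames.old, colnames.new, target.features) {
  stopifnot(
    is.data.frame(data),
    is.character(colnames.old),
    is.character(colnames.new),
    # One old and one new name per column of the data.
    length(colnames.old) == ncol(data),
    length(colnames.new) == ncol(data),
    # Target features must be named as they appear in R.
    all(target.features %in% colnames.new)
  )
  invisible(TRUE)
}

data("airquality")
checkDataSetArgs(data = airquality,
  colnames.old = colnames(airquality),
  colnames.new = colnames(airquality),
  target.features = "Ozone")
```

The call stops with an error if, for example, a target feature is not among the new column names.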
