Task
Task Class
This is the abstract base class for task objects like TaskClassif and TaskRegr.
Tasks serve two purposes:
- Tasks wrap a DataBackend, an object to transparently interface with different data storage types.
- Tasks store meta-information, such as the role of the individual columns in the DataBackend. For example, for a classification task a single column must be marked as target column, and others as features.
Predefined (toy) tasks are stored in the Dictionary mlr_tasks, e.g. iris or boston_housing.
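As a minimal sketch of retrieving a predefined task, assuming the mlr3 package is installed and that the dictionary exposes the usual `$get()` accessor:

```r
library(mlr3)

# Retrieve the predefined iris task from the dictionary by its key.
task = mlr_tasks$get("iris")

# Basic information about the retrieved task.
print(task$id)
print(task$task_type)
print(task$nrow)
```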
- Keywords
- datasets
Format
R6::R6Class object.
Construction
Note: This object is typically constructed via a derived class, e.g. TaskClassif or TaskRegr.
t = Task$new(id, task_type, backend)
id
:: character(1)
Identifier for the task.
task_type
:: character(1)
Set in the classes which inherit from this class. Must be an element of mlr_reflections$task_types.
backend
:: DataBackend
Either a DataBackend, or any object which is convertible to a DataBackend with as_data_backend(). E.g., a data.frame() will be converted to a DataBackendDataTable.
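The backend argument can be illustrated with a short sketch. It assumes the mlr3 package is installed; the id "iris_demo" is a made-up name used only for illustration.

```r
library(mlr3)

# Convert a plain data.frame to a DataBackend explicitly.
backend = as_data_backend(iris)
print(class(backend))

# Hand the backend to a derived class. Passing `iris` directly
# instead of `backend` would trigger the same conversion.
task = TaskClassif$new("iris_demo", backend, target = "Species")
print(task$target_names)
```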
Fields
backend
:: DataBackend
col_info
:: data.table::data.table()
Table with 3 columns:
- "id" stores the name of the column.
- "type" holds the storage type of the variable, e.g. integer, numeric or character.
- "levels" stores a vector of distinct values (levels) for factor and character variables.
col_roles
:: named list()
Each column (feature) can have an arbitrary number of the following roles:
- "feature": Regular feature used in the model fitting process.
- "target": Target variable.
- "label": Observation labels. May be used in plots.
- "order": Data returned by $data() is ordered by this column (or these columns).
- "groups": During resampling, observations with the same value of the variable with role "groups" are marked as "belonging together". They will be exclusively assigned to be either in the training set or in the test set for each resampling iteration. At most one column may have this role.
- "weights": Observation weights. At most one column may have this role.
col_roles keeps track of the roles with a named list of vectors of feature names. To alter the roles, use t$set_col_role().
row_roles
:: named list()
Each row (observation) can have an arbitrary number of roles in the learning task:
- "use": Use in train / predict / resampling.
- "validation": Hold the observations back unless explicitly requested. Validation sets are not yet completely integrated into the package.
row_roles keeps track of the roles with a named list of vectors of row ids. To alter the roles, use t$set_row_role().
feature_names
:: character()
Returns all column names with role == "feature".
feature_types
:: data.table::data.table()
Returns a table with columns id and type, where id holds the column names of "active" features of the task and type is the storage type.
hash
:: character(1)
Hash (unique identifier) for this object.
id
:: character(1)
Identifier of the Task.
ncol
:: integer(1)
Returns the total number of columns with role "target" or "feature".
nrow
:: integer(1)
Returns the total number of rows with role "use".
row_ids
:: (integer() | character())
Returns the row ids of the DataBackend for observations with role "use".
target_names
:: character()
Returns all column names with role "target".
task_type
:: character(1)
Stores the type of the Task.
properties
:: character()
Set of task properties. Possible properties are stored in mlr_reflections$task_properties.
groups
:: data.table::data.table()
If the task has a designated column role "groups", a table with two columns: row_id (integer() | character()) and the grouping variable group (vector()). Returns NULL if there is no grouping column.
weights
:: data.table::data.table()
If the task has a designated column role "weights", a table with two columns: row_id (integer() | character()) and the observation weights weight (numeric()). Returns NULL if there is no weight column.
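The fields above can be inspected on any derived task. The following is a minimal sketch, assuming the mlr3 package is installed; the id "iris_demo" is a made-up name.

```r
library(mlr3)

task = TaskClassif$new("iris_demo", iris, target = "Species")

# Counts refer to the active view of the data, not the raw backend.
print(task$nrow)   # rows with role "use"
print(task$ncol)   # columns with role "target" or "feature"

# Column roles are stored as a named list of column names.
print(task$target_names)
print(task$col_roles$feature)

# Row roles are stored as a named list of row ids.
print(head(task$row_roles$use))

# Storage types of the active features.
print(task$feature_types)
```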
Methods
data(rows = NULL, cols = NULL, data_format = NULL)
(integer() | character(), character(1), character(1)) -> any
Returns a slice of the data from the DataBackend in the data format specified by data_format (depending on the DataBackend, but usually a data.table::data.table()).
Rows are additionally subsetted to only contain observations with role "use", and columns are filtered to only contain features with roles "target" and "feature". If invalid rows or cols are specified, an exception is raised.
formula(rhs = NULL)
character() -> stats::formula()
Constructs a stats::formula(), e.g. [target] ~ [feature_1] + [feature_2] + ... + [feature_k], using the features provided in argument rhs (defaults to all columns with role "feature").
levels(cols = NULL)
character() -> named list()
Returns the distinct values for columns referenced in cols with storage type "character", "factor" or "ordered". Argument cols defaults to all such columns with role "target" or "feature".
Note that this function ignores the row roles and returns all levels available in the DataBackend. To update the stored level information, e.g. after filtering a task, call $droplevels().
droplevels(cols = NULL)
character() -> self
Updates the cache of stored factor levels, removing all levels not present in the current set of active rows. cols defaults to all columns with storage type "character", "factor", or "ordered".
missings(cols = NULL)
character() -> named integer()
Returns the number of missing observations for columns referenced in cols. Considers only active rows with row role "use". Argument cols defaults to all columns with role "target" or "feature".
head(n = 6)
integer() -> data.table::data.table()
Get the first n observations with role "use".
set_col_role(cols, new_roles, exclusive = TRUE)
(character(), character(), logical(1)) -> self
Adds the roles new_roles to columns referred to by cols. If exclusive is TRUE, the referenced columns will be removed from all other roles.
set_row_role(rows, new_roles, exclusive = TRUE)
(integer() | character(), character(), logical(1)) -> self
Adds the roles new_roles to rows referred to by rows. If exclusive is TRUE, the referenced rows will be removed from all other roles.
filter(rows)
(integer() | character()) -> self
Subsets the task, reducing it to only keep the rows specified in rows. This mutates the task in-place. See the section on task mutators for more information.
select(cols)
character() -> self
Subsets the task, reducing it to only keep the features specified in cols. Note that you cannot deselect the target column. This mutates the task in-place. See the section on task mutators for more information.
cbind(data)
data.frame() -> self
Extends the DataBackend with additional columns. The row ids must be provided as a column in data (with column name matching the primary key name of the DataBackend). If this column is missing, it is assumed that the rows are exactly in the order of t$row_ids. This mutates the task in-place. See the section on task mutators for more information.
rbind(data)
data.frame() -> self
Extends the DataBackend with additional rows. The new row ids must be provided as a column in data. If this column is missing, new row ids are constructed automatically. This mutates the task in-place. See the section on task mutators for more information.
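A few of the methods above are sketched below on the iris data. This assumes the mlr3 package is installed; the id "iris_demo" is a made-up name, and the choice of rows and columns is arbitrary.

```r
library(mlr3)

task = TaskClassif$new("iris_demo", iris, target = "Species")

# Slice of the data: only the requested rows and columns.
d = task$data(rows = task$row_ids[1:3], cols = c("Species", "Sepal.Length"))
print(d)

# Number of missing values per column, for rows with role "use".
print(task$missings())

# filter() and select() mutate the task in-place.
task$filter(task$row_ids[1:50])
task$select(c("Sepal.Length", "Sepal.Width"))
print(task$nrow)
print(task$feature_names)

# levels() reads the stored level information and ignores row roles;
# droplevels() refreshes that information for the active rows.
print(task$levels("Species"))
task$droplevels()
print(task$levels("Species"))
```

Because filter() and select() change the task in-place, clone the task first if the original view is still needed.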
S3 methods
as.data.table(t)
Task -> data.table::data.table()
Returns the complete data as data.table::data.table().
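A minimal sketch of the conversion, assuming the mlr3 and data.table packages are installed. The generic is called with its namespace prefix so the example does not depend on whether it is attached; the id "iris_demo" is a made-up name.

```r
library(mlr3)

task = TaskClassif$new("iris_demo", iris, target = "Species")

# Convert the task to a data.table via the S3 method.
dt = data.table::as.data.table(task)
print(dim(dt))
```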
Task mutators
The following methods change the task in-place:
- set_row_role() and set_col_role() alter the row or column information in row_roles or col_roles, respectively. This provides a different "view" on the data without altering the data itself.
- filter() and select() subset the set of active rows or features in row_roles or col_roles, respectively. This provides a different "view" on the data without altering the data itself.
- rbind() and cbind() change the task in-place by binding rows or columns to the data, but without modifying the original DataBackend. Instead, the methods first create a new DataBackendDataTable from the provided new data, and then merge both backends into an abstract DataBackend which combines the results on-demand.
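The binding mutators can be sketched as follows, assuming the mlr3 package is installed. The ids "iris_demo" and "iris_demo2" and the column name foo are made up for illustration.

```r
library(mlr3)

task = TaskClassif$new("iris_demo", iris, target = "Species")

# cbind(): no row id column is supplied, so the new rows are assumed
# to be in the order of task$row_ids.
task$cbind(data.frame(foo = seq_len(task$nrow)))
print(task$feature_names)

# The task now points to a combined backend rather than the original one.
print(class(task$backend))

# rbind() on a second task: no row id column is supplied, so new row
# ids are constructed automatically.
task2 = TaskClassif$new("iris_demo2", iris, target = "Species")
task2$rbind(iris[1:5, ])
print(task2$nrow)
```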
See Also
Other Task: TaskClassif, TaskRegr, TaskSupervised, mlr_generators, mlr_tasks
Examples
# NOT RUN {
# we use the inherited class TaskClassif here,
# Class Task is not intended for direct use
task = TaskClassif$new("iris", iris, target = "Species")
task$nrow
task$ncol
task$feature_names
task$formula()
# Remove "Petal.Length"
task$set_col_role("Petal.Length", character(0L))
# Remove "Petal.Width", alternative way
task$select(setdiff(task$feature_names, "Petal.Width"))
task$feature_names
# Add new column "foo"
task$cbind(data.frame(foo = 1:150))
task$head()
# }