Subsamples a `Task` to use a fraction of the rows.

Sampling happens only during the training phase. Subsampling a `Task` may reduce training time at a cost in predictive performance that, depending on the size of the original `Task`, may be negligible.
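As a minimal sketch of the training-time use case, the subsampling step can be placed in front of a learner so that the learner is fitted on a fraction of the rows only. The choice of `classif.rpart`, the `iris` task, and `frac = 0.5` are illustrative assumptions, not part of this page.

```r
library("mlr3")
library("mlr3pipelines")

# Subsample half of the rows, then fit a decision tree on the reduced task.
# The learner and the fraction are illustrative choices.
graph = PipeOpSubsample$new(param_vals = list(frac = 0.5)) %>>%
  PipeOpLearner$new(lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

task = tsk("iris")
glrn$train(task)

# Prediction is not subsampled, so every row of the task receives a prediction.
pred = glrn$predict(task)
length(pred$row_ids)
```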

`R6Class` object inheriting from `PipeOpTaskPreproc`/`PipeOp`.

PipeOpSubsample$new(id = "subsample", param_vals = list())

`id` :: `character(1)`
Identifier of the resulting object, default `"subsample"`.

`param_vals` :: named `list`
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default `list()`.
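The constructor arguments described above can be exercised as follows. This is a small sketch; the identifier `"half"` and the value `frac = 0.5` are made up for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# Default construction: id is "subsample", hyperparameters keep their initial values.
op = PipeOpSubsample$new()

# Custom id and a hyperparameter override passed through param_vals.
op_half = PipeOpSubsample$new(id = "half", param_vals = list(frac = 0.5))

op$id
op_half$id
op_half$param_set$values$frac
```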

Input and output channels are inherited from `PipeOpTaskPreproc`.

The output during training is the input `Task` with added or removed rows according to the sampling.
The output during prediction is the unchanged input.
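The difference between the two phases can be checked by comparing row counts. The sketch below assumes the `iris` task and `frac = 0.5` purely for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
op = PipeOpSubsample$new(param_vals = list(frac = 0.5))

set.seed(1)  # sampling is random; the seed only makes the run reproducible
trained = op$train(list(task))[[1]]
predicted = op$predict(list(task))[[1]]

trained$nrow    # fewer rows than the input task
predicted$nrow  # same number of rows as the input task
```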

The `$state` is a named `list` with the `$state` elements inherited from `PipeOpTaskPreproc`.

The parameters are the parameters inherited from `PipeOpTaskPreproc`; however, the `affect_columns` parameter is *not* present. Further parameters are:

`frac` :: `numeric(1)`
Fraction of rows in the `Task` to keep. May only be greater than 1 if `replace` is `TRUE`. Initialized to `(1 - exp(-1)) == 0.6321`.

`stratify` :: `logical(1)`
Should the subsamples be stratified by target? Initialized to `FALSE`. May only be `TRUE` for `TaskClassif` input.

`replace` :: `logical(1)`
Sample with replacement? Initialized to `FALSE`.
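A short sketch of how these parameters might be set and what stratification does. The value `frac = 0.4` and the `iris` task (three classes of equal size) are illustrative assumptions. The initial `frac` value coincides with the expected share of distinct rows in a bootstrap sample of a large dataset.

```r
library("mlr3")
library("mlr3pipelines")

# The initial value of frac, rounded to four digits.
default_frac = round(1 - exp(-1), 4)
default_frac

op = PipeOpSubsample$new()
op$param_set$values$frac = 0.4
op$param_set$values$stratify = TRUE  # requires a classification task

set.seed(1)
out = op$train(list(tsk("iris")))[[1]]

# With stratification, each class is sampled separately, so the
# equally sized iris classes should stay equally represented.
class_counts = table(out$truth())
class_counts
```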

Uses `task$filter()` to remove rows. If `replace` is `TRUE` and identical rows are added, then the `task$row_roles$use` can *not* be used to duplicate rows because of [inaudible]; instead the `task$rbind()` function is used, and a new `data.table` is attached that contains all rows that are being duplicated exactly as many times as they are being added.
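The effect of sampling with replacement can be observed from the outside without relying on the internals. In the sketch below, `frac = 1.5` and the `iris` task are illustrative assumptions; the result should contain more rows than the input because duplicated rows are appended.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
op = PipeOpSubsample$new(param_vals = list(frac = 1.5, replace = TRUE))

set.seed(1)
out = op$train(list(task))[[1]]

# frac > 1 is only possible with replace = TRUE; the output task is larger
# than the input, and the input task itself is left untouched.
out$nrow
task$nrow
```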

Only fields inherited from `PipeOpTaskPreproc`/`PipeOp`.

Only methods inherited from `PipeOpTaskPreproc`/`PipeOp`.

https://mlr3book.mlr-org.com/list-pipeops.html

Other PipeOps: `PipeOpEnsemble`, `PipeOpImpute`, `PipeOpTargetTrafo`, `PipeOpTaskPreprocSimple`, `PipeOpTaskPreproc`, `PipeOp`, `mlr_pipeops_boxcox`, `mlr_pipeops_branch`, `mlr_pipeops_chunk`, `mlr_pipeops_classbalancing`, `mlr_pipeops_classifavg`, `mlr_pipeops_classweights`, `mlr_pipeops_colapply`, `mlr_pipeops_collapsefactors`, `mlr_pipeops_colroles`, `mlr_pipeops_copy`, `mlr_pipeops_datefeatures`, `mlr_pipeops_encodeimpact`, `mlr_pipeops_encodelmer`, `mlr_pipeops_encode`, `mlr_pipeops_featureunion`, `mlr_pipeops_filter`, `mlr_pipeops_fixfactors`, `mlr_pipeops_histbin`, `mlr_pipeops_ica`, `mlr_pipeops_imputeconstant`, `mlr_pipeops_imputehist`, `mlr_pipeops_imputelearner`, `mlr_pipeops_imputemean`, `mlr_pipeops_imputemedian`, `mlr_pipeops_imputemode`, `mlr_pipeops_imputeoor`, `mlr_pipeops_imputesample`, `mlr_pipeops_kernelpca`, `mlr_pipeops_learner`, `mlr_pipeops_missind`, `mlr_pipeops_modelmatrix`, `mlr_pipeops_multiplicityexply`, `mlr_pipeops_multiplicityimply`, `mlr_pipeops_mutate`, `mlr_pipeops_nmf`, `mlr_pipeops_nop`, `mlr_pipeops_ovrsplit`, `mlr_pipeops_ovrunite`, `mlr_pipeops_pca`, `mlr_pipeops_proxy`, `mlr_pipeops_quantilebin`, `mlr_pipeops_randomprojection`, `mlr_pipeops_randomresponse`, `mlr_pipeops_regravg`, `mlr_pipeops_removeconstants`, `mlr_pipeops_renamecolumns`, `mlr_pipeops_replicate`, `mlr_pipeops_scalemaxabs`, `mlr_pipeops_scalerange`, `mlr_pipeops_scale`, `mlr_pipeops_select`, `mlr_pipeops_smote`, `mlr_pipeops_spatialsign`, `mlr_pipeops_targetinvert`, `mlr_pipeops_targetmutate`, `mlr_pipeops_targettrafoscalerange`, `mlr_pipeops_textvectorizer`, `mlr_pipeops_threshold`, `mlr_pipeops_tunethreshold`, `mlr_pipeops_unbranch`, `mlr_pipeops_updatetarget`, `mlr_pipeops_vtreat`, `mlr_pipeops_yeojohnson`, `mlr_pipeops`

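# NOT RUN {
library("mlr3")
library("mlr3pipelines")  # provides mlr_pipeops when run outside the package

# Keep 70% of the rows, sampled separately within each target class
pos = mlr_pipeops$get("subsample", param_vals = list(frac = 0.7, stratify = TRUE))

# Training returns a list containing the subsampled task
pos$train(list(tsk("iris")))
# }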