mlr_pipeops_randomprojection: Project Numeric Features onto a Randomly Sampled Subspace

Description

Projects numeric features onto a randomly sampled subspace. All numeric features (or the ones selected by affect_columns) are replaced by numeric features PR1, PR2, ..., PRn.

If a sample has a missing value in any of the affected features, all of PR1, ..., PRn are NA for that sample. It is therefore advisable to impute missing values before the random projection if they can be expected.
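The NA behaviour follows from the matrix product used for the projection. The base-R sketch below illustrates it outside of mlr3pipelines; the projection matrix here is an arbitrary stand-in and the mean imputation is written by hand for illustration only. In a pipeline, one would instead place an imputation PipeOp (for instance po("imputemean")) before this PipeOp.

```r
set.seed(42)

# Four numeric features, with one missing value injected into row 3.
x = as.matrix(iris[, 1:4])
x[3, "Sepal.Width"] = NA

# Stand-in projection matrix (4 features -> 2 columns); values are arbitrary.
projection = qr.Q(qr(matrix(rnorm(4 * 2), nrow = 4)))

# Every projected column of row 3 is NA, because each one sums over the NA.
projected = x %*% projection
projected[3, ]

# Hand-written mean imputation: replace each NA by its column mean.
x_imputed = x
col_means = colMeans(x, na.rm = TRUE)
missing = which(is.na(x_imputed), arr.ind = TRUE)
x_imputed[missing] = col_means[missing[, "col"]]

# After imputation, no projected value is missing.
anyNA(x_imputed %*% projection)
```

Only the row containing the missing value is affected; all other rows are projected as usual.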

Format

R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Construction

PipeOpRandomProjection$new(id = "randomprojection", param_vals = list())

id :: character(1) Identifier of resulting object, default "randomprojection".
param_vals :: named list List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with affected numeric features projected onto a random subspace.

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as an element $projection, a matrix.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:

rank :: integer(1) The dimension of the subspace to project onto. Initialized to 1.

Internals

If there are n (affected) numeric features in the input Task, then $state$projection is an n x rank matrix. The output is calculated as input %*% state$projection.

The random projection matrix is obtained by Gram-Schmidt orthogonalization of a matrix with independent standard normal entries, which yields a rotation-invariant distribution; see Eaton, Multivariate Statistics: A Vector Space Approach, p. 234.
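As a rough illustration of this construction, the base-R sketch below samples a Gaussian matrix and orthonormalizes its columns. It is not the package's implementation: sample_projection() is a hypothetical helper, and a QR decomposition is used as a stand-in for explicit Gram-Schmidt (the resulting columns span the same subspace but may differ in sign).

```r
# Hypothetical helper, not part of mlr3pipelines: sample an n x rank matrix
# with orthonormal columns from a standard normal matrix.
sample_projection = function(n, rank) {
  stopifnot(rank >= 1, rank <= n)
  gauss = matrix(rnorm(n * rank), nrow = n, ncol = rank)
  # QR decomposition as a stand-in for Gram-Schmidt orthogonalization.
  q = qr.Q(qr(gauss))
  colnames(q) = paste0("PR", seq_len(rank))
  q
}

set.seed(1)
x = as.matrix(iris[, 1:4])            # 150 samples, n = 4 numeric features
projection = sample_projection(ncol(x), rank = 2)

dim(projection)                       # n x rank
crossprod(projection)                 # numerically close to the identity

# Same calculation as input %*% state$projection.
projected = x %*% projection
dim(projected)                        # samples x rank
```

Because the columns are orthonormal, crossprod(projection) is the identity up to floating-point error.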

Methods

Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Examples

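library("mlr3")
library("mlr3pipelines")  # provides po() and the PipeOp classes

task = tsk("iris")
pop = po("randomprojection", rank = 2)

# original data with four numeric features
task$data()
# after training, the numeric features are replaced by PR1 and PR2
pop$train(list(task))[[1]]$data()

# the state contains the sampled projection matrix
pop$state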