vtreat (version 1.6.4)

designTreatmentsZ: Design variable treatments with no outcome variable.

Description

Data frame is assumed to have only atomic columns except for dates (which are converted to numeric). Note: each column is processed independently of all others.

Usage

designTreatmentsZ(
  dframe,
  varlist,
  ...,
  minFraction = 0,
  weights = c(),
  rareCount = 0,
  collarProb = 0,
  codeRestriction = NULL,
  customCoders = NULL,
  verbose = TRUE,
  parallelCluster = NULL,
  use_parallel = TRUE,
  missingness_imputation = NULL,
  imputation_map = NULL
)

Value

treatment plan (for use with prepare)

Arguments

dframe

Data frame to learn treatments from (training data), must have at least 1 row.

varlist

Names of columns to treat (effective variables).

...

no additional arguments, declared to forced named binding of later arguments

minFraction

optional minimum frequency a categorical level must have to be converted to an indicator column.

weights

optional training weights for each row

rareCount

optional integer, allow levels with this count or below to be pooled into a shared rare-level. Defaults to 0 or off.

collarProb

what fraction of the data (pseudo-probability) to collar data at if doCollar is set during prepare.treatmentplan.

codeRestriction

what types of variables to produce (character array of level codes, NULL means no restriction).

customCoders

map from code names to custom categorical variable encoding functions (please see https://github.com/WinVector/vtreat/blob/main/extras/CustomLevelCoders.md).

verbose

if TRUE print progress.

parallelCluster

(optional) a cluster object created by package parallel or package snow.

use_parallel

logical, if TRUE use parallel methods (if parallel cluster is set).

missingness_imputation

function of signature f(values: numeric, weights: numeric), simple missing value imputer.

imputation_map

map from column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers.

Details

The main fields are mostly vectors with names (all with the same names in the same order):

- vars : (character array without names) names of variables (in same order as names on the other diagnostic vectors) - varMoves : logical TRUE if the variable varied during hold out scoring, only variables that move will be in the treated frame

See the vtreat vignette for a bit more detail and a worked example.

Columns that do not vary are not passed through.

See Also

prepare.treatmentplan, designTreatmentsC, designTreatmentsN

Examples

Run this code

dTrainZ <- data.frame(x=c('a','a','a','a','b','b',NA,'e','e'),
    z=c(1,2,3,4,5,6,7,NA,9))
dTestZ <- data.frame(x=c('a','x','c',NA),
    z=c(10,20,30,NA))
treatmentsZ = designTreatmentsZ(dTrainZ, colnames(dTrainZ),
  rareCount=0)
dTrainZTreated <- prepare(treatmentsZ, dTrainZ)
dTestZTreated <- prepare(treatmentsZ, dTestZ)

Run the code above in your browser using DataCamp Workspace