mkCrossFrameMExperiment: Function to build multi-outcome vtreat cross frame and treatment plan.

Description

Please see vignette("MultiClassVtreat", package = "vtreat") https://winvector.github.io/vtreat/articles/MultiClassVtreat.html.

Usage

mkCrossFrameMExperiment(
  dframe,
  varlist,
  outcomename,
  ...,
  weights = c(),
  minFraction = 0.02,
  smFactor = 0,
  rareCount = 0,
  rareSig = 1,
  collarProb = 0,
  codeRestriction = NULL,
  customCoders = NULL,
  scale = FALSE,
  doCollar = FALSE,
  splitFunction = vtreat::kWayCrossValidation,
  ncross = 3,
  forceSplit = FALSE,
  catScaling = FALSE,
  y_dependent_treatments = c("catB"),
  verbose = FALSE,
  parallelCluster = NULL,
  use_parallel = TRUE,
  missingness_imputation = NULL,
  imputation_map = NULL
)

Arguments

dframe

data to learn from

varlist

character, vector of indpendent variable column names.

outcomename

character, name of outcome column.

...

not used, declared to forced named binding of later arguments

weights

optional training weights for each row

minFraction

optional minimum frequency a categorical level must have to be converted to an indicator column.

smFactor

optional smoothing factor for impact coding models.

rareCount

optional integer, allow levels with this count or below to be pooled into a shared rare-level. Defaults to 0 or off.

rareSig

optional numeric, suppress levels from pooling at this significance value greater. Defaults to NULL or off.

collarProb

what fraction of the data (pseudo-probability) to collar data at if doCollar is set during prepare.multinomial_plan.

codeRestriction

what types of variables to produce (character array of level codes, NULL means no restriction).

customCoders

map from code names to custom categorical variable encoding functions (please see https://github.com/WinVector/vtreat/blob/master/extras/CustomLevelCoders.md).

scale

optional if TRUE replace numeric variables with regression ("move to outcome-scale").

doCollar

optional if TRUE collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design.

splitFunction

(optional) see vtreat::buildEvalSets .

ncross

optional scalar>=2 number of cross-validation rounds to design.

forceSplit

logical, if TRUE force cross-validated significance calculations on all variables.

catScaling

optional, if TRUE use glm() linkspace, if FALSE use lm() for scaling.

y_dependent_treatments

character what treatment types to build per-outcome level.

verbose

if TRUE print progress.

parallelCluster

(optional) a cluster object created by package parallel or package snow.

use_parallel

logical, if TRUE use parallel methods.

missingness_imputation

function of signature f(values: numeric, weights: numeric), simple missing value imputer.

imputation_map

map from column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers.

Value

a names list containing cross_frame, treat_m, score_frame, and fit_obj_id

Examples

Run this code

# NOT RUN {

# numeric example
set.seed(23525)

# we set up our raw training and application data
dTrainM <- data.frame(
  x = c('a', 'a', 'a', 'a', 'b', 'b', NA, NA),
  z = c(1, 2, 3, 4, 5, NA, 7, NA), 
  y = c(0, 0, 0, 1, 0, 1, 2, 1))

dTestM <- data.frame(
  x = c('a', 'b', 'c', NA), 
  z = c(10, 20, 30, NA))

# we perform a vtreat cross frame experiment
# and unpack the results into treatmentsM,
# dTrainMTreated, and score_frame
unpack[
  treatmentsM = treat_m,
  dTrainMTreated = cross_frame,
  score_frame = score_frame
  ] <- mkCrossFrameMExperiment(
    dframe = dTrainM,
    varlist = setdiff(colnames(dTrainM), 'y'),
    outcomename = 'y',
    verbose = FALSE)

# the score_frame relates new
# derived variables to original columns
score_frame[, c('origName', 'varName', 'code', 'rsq', 'sig', 'outcome_level')] %.>%
  print(.)

# the treated frame is a "cross frame" which
# is a transform of the training data built 
# as if the treatment were learned on a different
# disjoint training set to avoid nested model
# bias and over-fit.
dTrainMTreated %.>%
  head(.) %.>%
  print(.)

# Any future application data is prepared with
# the prepare method.
dTestMTreated <- prepare(treatmentsM, dTestM, pruneSig=NULL)

dTestMTreated %.>%
  head(.) %.>%
  print(.)

# }

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples