vtreat (version 0.5.16)

designTreatmentsC: designTreatmentsC

Description

Function to design variable treatments for binary prediction of a categorical outcome. Data frame is assumed to have only atomic columns except for dates (which are converted to numeric).

Usage

designTreatmentsC(dframe, varlist, outcomename, outcometarget, weights = c(),
  minFraction = 0.02, smFactor = 0, rareCount = 2, rareSig = 0.3,
  maxMissing = 0.04, collarProb = 0, returnXFrame = FALSE,
  scale = FALSE, doCollar = TRUE, verbose = TRUE,
  parallelCluster = NULL)

Arguments

dframe
Data frame to learn treatments from (training data).
varlist
Names of columns to treat (effective variables).
outcomename
Name of column holding outcome variable.
outcometarget
Value/level of outcome to be considered "success"
weights
optional training weights for each row
minFraction
optional minimum frequency a categorical level must have to be converted to an indicator column.
smFactor
optional smoothing factor for impact coding models.
rareCount
optional integer, suppress direct effects of levels with this count or less.
rareSig
optional numeric, suppress direct effects of levels with this significance or less.
maxMissing
optional maximum fraction (by data weight) of a categorical variable that is allowed to be missing before switching from indicators to impact coding.
collarProb
what fraction of the data (pseudo-probability) to collar data at.
returnXFrame
optional if TRUE return out of sample transformed frame.
scale
optional logical, controls scaling for scoring and returnXFrame.
doCollar
optional logical, controls collaring for scoring and returnXFrame.
verbose
if TRUE print progress.
parallelCluster
(optional) a cluster object created by package parallel or package snow
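
To make two of the arguments above more concrete, here is a minimal base R sketch of the ideas behind minFraction (a frequency threshold for indicator columns) and collarProb (clipping numeric values at quantiles). It is an illustration under assumptions, not vtreat's internal code: the toy vectors, the `>=` comparison, and the default quantile method are all choices made for this example.

```r
# Illustration only: base R, not vtreat internals.

# minFraction idea: only sufficiently frequent levels get indicator columns.
x <- c('a', 'a', 'a', 'b', 'b', 'b', 'c')   # hypothetical categorical column
minFraction <- 0.2                          # hypothetical threshold
freq <- table(x) / length(x)                # fraction of rows per level
keep <- names(freq)[as.vector(freq) >= minFraction]
# One 0/1 column per retained level; the rare level 'c' gets none.
indicators <- sapply(keep, function(lv) as.numeric(x == lv))

# collarProb idea: clip a numeric column to training-data quantiles.
z <- c(1, 2, 3, 4, 5, 100)                  # hypothetical numeric column
collarProb <- 0.1                           # hypothetical collar fraction
cuts <- quantile(z, probs = c(collarProb, 1 - collarProb), names = FALSE)
zCollared <- pmin(pmax(z, cuts[1]), cuts[2])

print(keep)
print(zCollared)
```

In this sketch the outlier 100 is pulled down to the upper cut point, while values already inside the collar are unchanged.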

Value

  • treatment plan (for use with prepare)

Details

The main fields are mostly vectors with names (all with the same names in the same order):

- vars : (character array without names) names of variables (in same order as names on the other diagnostic vectors)
- varMoves : logical, TRUE if the variable varied during hold-out scoring; only variables that move will be in the treated frame
- sig : an estimate of the significance of the effect

See the vtreat vignette for a bit more detail and a worked example.

See Also

prepare, designTreatmentsN

Examples

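library(vtreat)
# Training data: categorical x, numeric z, logical outcome y
dTrainC <- data.frame(x=c('a','a','a','b','b','b'),
   z=c(1,2,3,4,5,6),
   y=c(FALSE,FALSE,TRUE,FALSE,TRUE,TRUE))
# Test data with a level not seen in training ('c') and missing values
dTestC <- data.frame(x=c('a','b','c',NA),
   z=c(10,20,30,NA))
# Design the treatment plan, treating y == TRUE as the target outcome
treatmentsC <- designTreatmentsC(dTrainC,colnames(dTrainC),'y',TRUE)
# Apply the plan to the training and test frames
dTrainCTreated <- prepare(treatmentsC,dTrainC,pruneSig=0.99)
dTestCTreated <- prepare(treatmentsC,dTestC,pruneSig=0.99)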
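
As a follow-up, the sketch below looks at what the treated test frame contains, since the test data deliberately includes a novel level and missing values. It assumes the vtreat package is installed, passes only the input columns ('x' and 'z') as varlist, and sets verbose = FALSE to keep the output short. The derived column names depend on the vtreat version, so they are printed rather than assumed.

```r
library(vtreat)  # assumption: vtreat is installed

dTrainC <- data.frame(x = c('a', 'a', 'a', 'b', 'b', 'b'),
                      z = c(1, 2, 3, 4, 5, 6),
                      y = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE))
dTestC <- data.frame(x = c('a', 'b', 'c', NA),
                     z = c(10, 20, 30, NA))

# Design on the input columns only; y == TRUE is the target outcome.
treatmentsC <- designTreatmentsC(dTrainC, c('x', 'z'), 'y', TRUE,
                                 verbose = FALSE)
dTestCTreated <- prepare(treatmentsC, dTestC, pruneSig = 0.99)

# The treated frame should have one row per input row.
print(nrow(dTestCTreated) == nrow(dTestC))
# Inspect the derived columns and check whether any NA values remain.
print(colnames(dTestCTreated))
print(anyNA(dTestCTreated))
```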