
vtreat (version 0.5.16)

prepare: prepare

Description

Use a treatment plan to prepare a data frame for analysis. The resulting frame has new effective variables that are numeric and free of NaN/NA. If the outcome column is present, it is copied over. The intent is that these frames are compatible with more machine learning techniques and avoid many corner cases (NA, NaN, novel levels, too many levels). Note: each column is processed independently of all others.
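To make the "novel levels" and NA corner cases concrete, here is a minimal base R sketch of one way a categorical column can be turned into numeric, NA-free indicator columns. It is an illustration only, not vtreat's implementation: the helper encode_levels and the column names it produces are hypothetical.

```r
# Hypothetical helper (not part of vtreat): one 0/1 indicator per level seen
# in training, plus an indicator for missing values. A level never seen in
# training simply gets 0 in every level indicator.
encode_levels <- function(x, train_levels) {
  x <- as.character(x)
  out <- lapply(train_levels, function(lv) as.numeric(!is.na(x) & x == lv))
  names(out) <- paste0("x_lev_", train_levels)
  out[["x_lev_NA"]] <- as.numeric(is.na(x))
  as.data.frame(out)
}

train_levels <- c("a", "b")                       # levels observed in training
encoded <- encode_levels(c("a", "b", "c", NA), train_levels)
print(encoded)
```

The novel level 'c' and the NA both end up as ordinary numeric rows, which is the property the description asks of a treated frame.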

Usage

prepare(treatmentplan, dframe, pruneSig, scale = FALSE, doCollar = TRUE,
  varRestriction = c(), parallelCluster = NULL)

Arguments

treatmentplan
Plan built by designTreatmentsC() or designTreatmentsN()
dframe
Data frame to be treated
pruneSig
Suppress variables with significance above this level.
scale
Optional. If TRUE, replace numeric variables with regression-based versions ("move to outcome-scale").
doCollar
Optional. If TRUE, collar numeric variables by cutting them off beyond a tail probability specified by collarProb during treatment design.
varRestriction
Optional. List of treated variable names to restrict the result to.
parallelCluster
Optional. A cluster object created by package parallel or package snow.
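As a rough illustration of what collaring means for doCollar, the base R sketch below clamps new values to a quantile range estimated from training data. It is written under stated assumptions: the helper collar and the tail probability 0.1 are hypothetical, and vtreat's own computation of the bounds from collarProb may differ.

```r
# Hypothetical helper (not part of vtreat): clamp x to the [p, 1 - p]
# quantile range of the training values.
collar <- function(x, train, p) {
  bounds <- quantile(train, probs = c(p, 1 - p), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, bounds[1]), bounds[2])
}

train <- c(1, 2, 3, 4, 5, 6, 7)
new_values <- c(-100, 4, 10, 20)

# Extreme values are pulled back to the bounds; in-range values are unchanged.
collared <- collar(new_values, train, p = 0.1)
print(collared)
```

Bounds are taken from the training data only, so values seen later at prediction time cannot widen the range.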

Value

  • treated data frame (all columns numeric, without NA or NaN)

See Also

designTreatmentsC designTreatmentsN

Examples

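# Numeric outcome: design treatments on the training frame, then prepare both frames.
dTrainN <- data.frame(x=c('a','a','a','a','b','b','b'),
    z=c(1,2,3,4,5,6,7),y=c(0,0,0,1,0,1,1))
# The test frame has a novel level ('c'), NA values, and no outcome column.
dTestN <- data.frame(x=c('a','b','c',NA),z=c(10,20,30,NA))
treatmentsN <- designTreatmentsN(dTrainN,colnames(dTrainN),'y')
dTrainNTreated <- prepare(treatmentsN,dTrainN,1.0)
dTestNTreated <- prepare(treatmentsN,dTestN,1.0)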

# Categorical (logical) outcome: TRUE is the outcome target.
dTrainC <- data.frame(x=c('a','a','a','b','b','b'),
    z=c(1,2,3,4,5,6),y=c(FALSE,FALSE,TRUE,FALSE,TRUE,TRUE))
dTestC <- data.frame(x=c('a','b','c',NA),z=c(10,20,30,NA))
treatmentsC <- designTreatmentsC(dTrainC,colnames(dTrainC),'y',TRUE)
dTrainCTreated <- prepare(treatmentsC,dTrainC,1.0)
dTestCTreated <- prepare(treatmentsC,dTestC,1.0)
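
The guarantee stated under Value can be checked directly on a treated frame. The sketch below assumes the vtreat package is installed. It passes pruneSig by name, which is a choice made here so the call does not depend on argument position; the variable names plan, allNumeric, and noMissing are illustrative.

```r
library(vtreat)  # assumes the vtreat package is installed

dTrain <- data.frame(x = c('a','a','a','a','b','b','b'),
                     z = c(1,2,3,4,5,6,7),
                     y = c(0,0,0,1,0,1,1))
# Test frame with a novel level ('c') and missing values.
dTest <- data.frame(x = c('a','b','c',NA), z = c(10,20,30,NA))

plan <- designTreatmentsN(dTrain, c('x','z'), 'y')
dTestTreated <- prepare(plan, dTest, pruneSig = 1.0)

# Every treated column should be numeric, and no NA/NaN should remain.
allNumeric <- all(vapply(dTestTreated, is.numeric, logical(1)))
noMissing <- !anyNA(dTestTreated)
print(c(allNumeric = allNumeric, noMissing = noMissing))
```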
