prepare

Plan built by designTreatmentsC() or designTreatmentsN()

treatmentplan

dframe

no additional arguments; declared to force named binding of later arguments

suppress variables with significance above this level

pruneSig

optional; if TRUE, replace numeric variables with single-variable model regressions ("move to outcome scale"). These have mean zero and (for variables with significance less than 1) slope 1 when regressed (lm for regression problems, glm for classification problems) against the outcome.

scale
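
The "mean zero, slope 1" property of the scale option can be checked with a small base-R sketch. This is illustrative only and is not vtreat's internal code: the simulated data and the names x_scaled, mean_scaled, and slope_scaled are assumptions made for the example.

```r
# Illustrative base-R sketch of "move to outcome scale"; not vtreat's implementation.
set.seed(2024)
x <- rnorm(100)
y <- 3 * x + 5 + rnorm(100)

# Single-variable model of the outcome on the original variable.
fit <- lm(y ~ x)

# Rescaled variable: the model's prediction, centered to mean zero.
x_scaled <- predict(fit) - mean(predict(fit))

# Regressing the outcome on the rescaled variable recovers slope 1.
refit <- lm(y ~ x_scaled)
mean_scaled <- mean(x_scaled)
slope_scaled <- unname(coef(refit)["x_scaled"])
print(c(mean = mean_scaled, slope = slope_scaled))
```

After this transformation every numeric variable is expressed in units of the outcome, which is what makes the rescaled columns directly comparable to one another.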

optional; if TRUE, collar numeric variables by clipping values that fall beyond the tail probability specified by collarProb during treatment design.

doCollar
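
Collaring can be pictured as clipping new values to quantile cut points learned from the training data. The sketch below is a hedged base-R illustration, not vtreat's code: the helper collar(), the value of collar_prob, and the use of quantile() to pick the cut points are all assumptions made for the example.

```r
# Illustrative base-R sketch of collaring; not vtreat's implementation.
set.seed(2024)
collar_prob <- 0.01                   # hypothetical tail probability
train <- c(rnorm(1000), 50)           # training values with one extreme outlier

# Cut points taken from the training distribution.
cuts <- quantile(train, probs = c(collar_prob, 1 - collar_prob), na.rm = TRUE)

# Hypothetical helper: clip values to the interval [lo, hi].
collar <- function(v, lo, hi) pmin(pmax(v, lo), hi)

# Values seen at application time, including extremes outside the training range.
new_vals <- c(-100, 0, 100)
collared <- collar(new_vals, cuts[[1]], cuts[[2]])
print(collared)
```

The extreme values are pulled in to the cut points, while values inside the interval pass through unchanged.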

optional list of treated variable names to restrict to

varRestriction

optional list of treated variable codes to restrict to

codeRestriction

(optional) a cluster object created by package parallel or package snow

parallelCluster

Use a treatment plan to prepare a data frame for analysis. The
resulting frame will have new effective variables that are numeric
and free of NaN/NA. If the outcome column is present it will be copied over.
The intent is that these frames are compatible with more machine learning
techniques, and avoid a lot of corner cases (NA, NaN, novel levels, too many levels).
Note: each column is processed independently of all others.

A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner.
'vtreat' prepares variables so that data has fewer exceptional cases, making
it easier to safely use models in production. Common problems 'vtreat' defends
against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new
categorical levels (levels seen during application, but not during training).
'vtreat::prepare' should be used as you would use 'model.matrix'.

prepare: Apply treatments and restrict to useful variables.

Description

Usage

Arguments

Value

See Also

Examples
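
A minimal usage sketch, assuming the vtreat package is installed: a treatment plan is designed on a small training frame with designTreatmentsN() and then applied with prepare() to a test frame containing a novel level and missing values. The data frames and variable names are made up for illustration, pruneSig = NULL is used here to keep all variables, and the requireNamespace() guard simply skips the example when vtreat is not available.

```r
# Sketch only: toy data and names are illustrative assumptions.
if (requireNamespace("vtreat", quietly = TRUE)) {
  dTrainN <- data.frame(
    x = c("a", "a", "a", "a", "b", "b", "b"),
    z = c(1, 2, 3, 4, 5, 6, 7),
    y = c(0, 0, 0, 1, 0, 1, 1),
    stringsAsFactors = FALSE
  )
  # Test frame with a novel level ("c") and missing values.
  dTestN <- data.frame(
    x = c("a", "b", "c", NA),
    z = c(10, 20, 30, NA),
    stringsAsFactors = FALSE
  )

  # Design the plan on training data for a numeric outcome.
  treatmentsN <- vtreat::designTreatmentsN(dTrainN, c("x", "z"), "y",
                                           verbose = FALSE)

  # Apply the plan to new data; later arguments must be passed by name.
  dTestNTreated <- vtreat::prepare(treatmentsN, dTestN, pruneSig = NULL)
  print(dTestNTreated)
}
```

The treated frame should contain only numeric columns with no NA or NaN values, even though the input had a level never seen in training.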