mkCrossFrameCExperiment

Data frame to learn treatments from (training data), must have at least 1 row.

dframe

Names of columns to treat (effective variables).

varlist

Name of column holding outcome variable. dframe[[outcomename]] must be only finite non-missing values.

outcomename

Value/level of outcome to be considered "success". dframe[[outcomename]]==outcometarget must hold for at least two rows and dframe[[outcomename]]!=outcometarget must hold for at least two rows.

outcometarget

no additional arguments, declared to force named binding of later arguments

optional training weights for each row

weights

optional minimum frequency a categorical level must have to be converted to an indicator column.

minFraction

optional smoothing factor for impact coding models.

smFactor

optional integer, allow levels with this count or below to be pooled into a shared rare-level. Defaults to 0 or off.

rareCount

optional numeric, suppress levels from pooling at this significance value or greater. Defaults to NULL or off.

rareSig

what fraction of the data (pseudo-probability) to collar data at if doCollar is set during <code><a rd-options="" href="/link/prepare.treatmentplan?package=vtreat&version=1.4.5" data-mini-rdoc="vtreat::prepare.treatmentplan">prepare.treatmentplan</a></code>.

collarProb

what types of variables to produce (character array of level codes, NULL means no restriction).

codeRestriction

map from code names to custom categorical variable encoding functions (please see <a href="https://github.com/WinVector/vtreat/blob/master/extras/CustomLevelCoders.md">https://github.com/WinVector/vtreat/blob/master/extras/CustomLevelCoders.md</a>).

customCoders

optional if TRUE replace numeric variables with regression ("move to outcome-scale").

scale

optional if TRUE collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design.

doCollar

(optional) see vtreat::buildEvalSets.

splitFunction

optional scalar >= 2, number of cross-validation rounds to design.

ncross

logical, if TRUE force cross-validated significance calculations on all variables.

forceSplit

optional, if TRUE use glm() linkspace, if FALSE use lm() for scaling.

catScaling

verbose

(optional) a cluster object created by package parallel or package snow.

parallelCluster

logical, if TRUE use parallel methods.

use_parallel
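
The collaring controlled by collarProb and doCollar can be pictured with a small base-R sketch. This is not vtreat's internal code: collar_sketch is a hypothetical helper, and it assumes that collaring means clamping a numeric column to its empirical collarProb and 1 - collarProb quantiles.

```r
# Hypothetical helper (not part of vtreat): clamp a numeric vector to the
# empirical quantiles at collar_prob and 1 - collar_prob.
collar_sketch <- function(x, collar_prob = 0.01) {
  stopifnot(is.numeric(x), collar_prob >= 0, collar_prob < 0.5)
  cuts <- quantile(x, probs = c(collar_prob, 1 - collar_prob),
                   na.rm = TRUE, names = FALSE)
  # Values below the lower cut or above the upper cut are moved to the cut.
  pmin(pmax(x, cuts[1]), cuts[2])
}

x <- c(-1000, 1:98, 1000)
collared <- collar_sketch(x, collar_prob = 0.05)
range(collared)  # the two extreme values are pulled in to the tail quantiles
```

The point of the sketch is only that extreme values are limited rather than dropped, so the treated column keeps one value per input row.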

Builds a <code><a rd-options="" href="/link/designTreatmentsC?package=vtreat&version=1.4.5" data-mini-rdoc="vtreat::designTreatmentsC">designTreatmentsC</a></code> treatment plan and a data frame prepared 
from <code>dframe</code> that is "cross" in the sense each row is treated using a treatment
plan built from a subset of dframe disjoint from the given row.
The goal is to supply a method of breaking nested model bias other than splitting
the data into calibration, training, and test sets.
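
A minimal usage sketch follows, assuming the vtreat package is installed. The toy data frame and its column names (x_cat, x_num, y) are made up for illustration; the treatments and crossFrame components of the result are the treatment plan and the cross-treated training frame described above.

```r
# Assumes the vtreat package is installed.
library(vtreat)

set.seed(2020)
# Toy data (hypothetical): one categorical input, one numeric input,
# and a logical outcome.
d <- data.frame(
  x_cat = sample(c("a", "b", "c"), 100, replace = TRUE),
  x_num = rnorm(100),
  stringsAsFactors = FALSE
)
d$y <- (d$x_num + ifelse(d$x_cat == "a", 1, 0) + rnorm(100)) > 0.5

cfe <- mkCrossFrameCExperiment(
  dframe = d,
  varlist = c("x_cat", "x_num"),
  outcomename = "y",
  outcometarget = TRUE,
  ncross = 3,
  verbose = FALSE
)

plan <- cfe$treatments           # treatment plan, reused on new data
train_treated <- cfe$crossFrame  # cross-treated copy of d, for model fitting

# New data is treated with the plan via prepare(), not with another cross-frame.
d_new <- data.frame(x_cat = c("a", "zzz"), x_num = c(0.1, NA),
                    stringsAsFactors = FALSE)
new_treated <- prepare(plan, d_new)
```

In this pattern the downstream model is fit on train_treated, while later application data goes through prepare() with the saved plan.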

A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner.
'vtreat' prepares variables so that data has fewer exceptional cases, making
it easier to safely use models in production. Common problems 'vtreat' defends
against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new
categorical levels (levels seen during application, but not during training). Reference:
"'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.

John Mount

vtreat

A Statistically Sound 'data.frame' Processor/Conditioner

Nina Zumel

 Win-Vector LLC

mkCrossFrameCExperiment: Run categorical cross-frame experiment.