datarobot (version 2.4.0)

CreateStratifiedPartition: Create a stratified sampling-based S3 object of class partition for the SetTarget function

Description

Stratified partitioning is supported for binary classification problems. It randomly partitions the modeling data while keeping the percentage of positive-class observations in each partition the same as in the original dataset. It is available for either Training/Validation/Holdout ('TVH') or cross-validation ('CV') splits. In either case, the holdout percentage (holdoutPct) must be specified. The 'CV' method also requires the number of cross-validation folds (reps), while the 'TVH' method requires the validation subset percentage (validationPct).
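
To make the idea of stratification concrete, here is a minimal base-R sketch of a stratified TVH split. It is an illustration only, not the code DataRobot runs: the function name StratifiedTvhSketch, its seed argument, and the toy data are all hypothetical. Rows are shuffled and split within each class, so every subset keeps the class balance of the full dataset.

```r
# Illustrative sketch only (hypothetical helper, not part of the datarobot package).
# Assigns each row to "holdout", "validation", or "training" separately within
# each class, so the class proportions are preserved in every subset.
StratifiedTvhSketch <- function(y, holdoutPct, validationPct, seed = 1) {
  stopifnot(holdoutPct + validationPct < 100)
  set.seed(seed)
  subset <- character(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    idx <- idx[sample.int(length(idx))]            # shuffle rows of this class
    nHoldout <- floor(length(idx) * holdoutPct / 100)
    nValidation <- floor(length(idx) * validationPct / 100)
    labels <- rep("training", length(idx))
    labels[seq_len(nHoldout)] <- "holdout"
    labels[nHoldout + seq_len(nValidation)] <- "validation"
    subset[idx] <- labels
  }
  subset
}

# Toy binary target: 200 positives and 800 negatives (20% positive overall).
y <- rep(c(1, 0), times = c(200, 800))
parts <- StratifiedTvhSketch(y, holdoutPct = 20, validationPct = 16)

# Share of positives in each subset; each matches the overall 20%.
positiveRate <- tapply(y, parts, mean)
print(positiveRate)
```

A purely random split would only match the 20% rate on average; splitting within each class is what keeps it the same in every subset.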

Usage

CreateStratifiedPartition(validationType, holdoutPct, reps = NULL, validationPct = NULL)

Arguments

validationType
Character string specifying the type of partition generated, either 'TVH' or 'CV'.
holdoutPct
Integer, giving the percentage of data to be used as the holdout subset.
reps
Integer, specifying the number of cross-validation folds to generate; only applicable when validationType = 'CV'.
validationPct
Integer, giving the percentage of data to be used as the validation subset; required when validationType = 'TVH'.

Value

An S3 object of class 'partition' including the parameters required by the SetTarget function to generate a stratified partitioning of the modeling dataset.
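
The two valid argument combinations described above can be sketched as follows. This assumes the datarobot package is installed; the percentages and fold count are arbitrary illustrative values, and the variable names cvPartition and tvhPartition are placeholders.

```r
library(datarobot)

# Stratified 5-fold cross-validation with a 20% holdout:
# 'CV' requires holdoutPct and reps.
cvPartition <- CreateStratifiedPartition(validationType = "CV",
                                         holdoutPct = 20,
                                         reps = 5)

# Stratified Training/Validation/Holdout split:
# 'TVH' requires holdoutPct and validationPct.
tvhPartition <- CreateStratifiedPartition(validationType = "TVH",
                                          holdoutPct = 20,
                                          validationPct = 16)

# Both results are S3 objects of class 'partition'.
class(cvPartition)
class(tvhPartition)
```

Either object can then be supplied to SetTarget when starting a project.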

Details

This function is one of several convenience functions provided to simplify the task of starting modeling projects with custom partitioning options. The other functions are CreateGroupPartition, CreateRandomPartition, and CreateUserPartition.