RemixAutoML (version 0.11.0)

AutoDataPartition: The AutoDataPartition function

Description

This function partitions your data into multiple data sets (for example training, validation, and test sets) using random sampling, stratified sampling, or time-based splits.

Usage

AutoDataPartition(data, NumDataSets = 3, Ratios = c(0.7, 0.2, 0.1),
  PartitionType = "random", StratifyColumnNames = NULL,
  StratifyNumericTarget = NULL, StratTargetPrecision = 3,
  TimeColumnName = NULL)

Arguments

data

Source data to do your partitioning on

NumDataSets

The total number of data sets to build

Ratios

A numeric vector giving the share of the data allocated to each data set, e.g. c(0.70, 0.20, 0.10)
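Before calling the function it can help to sanity-check the ratios yourself. The helper below is a hypothetical sketch, not part of RemixAutoML; it assumes the ratios should be positive, match NumDataSets in length, and sum to 1, which the default values suggest but this page does not state explicitly.

```r
# Hypothetical helper (not part of RemixAutoML): checks assumed constraints on Ratios.
check_ratios <- function(ratios, num_data_sets) {
  stopifnot(
    is.numeric(ratios),
    length(ratios) == num_data_sets,    # one ratio per data set (assumption)
    all(ratios > 0),
    isTRUE(all.equal(sum(ratios), 1))   # tolerant comparison avoids floating-point surprises
  )
  invisible(TRUE)
}

check_ratios(c(0.70, 0.20, 0.10), num_data_sets = 3)
```

Using all.equal rather than == keeps the check robust to floating-point rounding in the sum.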

PartitionType

Set to either "random", "timeseries", or "time". With "random", your data will be partitioned randomly (with stratified sampling if column names are supplied). With "timeseries", you can partition by time with a stratify option (so long as you have an equal number of records for each stratum). With "time", data sets are generated so that the training data contains the earliest records in time, the validation data the second earliest, the test data the third earliest, and so on.
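The difference between the "random" and "time" options can be illustrated in base R. This is a minimal sketch of the general idea, not the RemixAutoML implementation; the data frame, its column names, and the variable names are made up for illustration.

```r
# Illustrative sketch only; not how AutoDataPartition is implemented internally.
set.seed(42)
df <- data.frame(
  id    = 1:100,
  date  = as.Date("2020-01-01") + 0:99,
  value = rnorm(100)
)
ratios <- c(0.7, 0.2, 0.1)

# Map each row position to a partition using the cumulative ratios.
n <- nrow(df)
breaks <- c(0, round(cumsum(ratios) * n))
group_by_position <- cut(seq_len(n), breaks = breaks, labels = FALSE)

# "random"-style split: shuffle the rows, then cut by position.
random_sets <- split(df[sample(n), ], group_by_position)

# "time"-style split: sort by the time column first, so the earliest
# rows land in the first set, the next earliest in the second, and so on.
time_sets <- split(df[order(df$date), ], group_by_position)

sapply(random_sets, nrow)
sapply(time_sets, nrow)
```

In the time-ordered split every date in the first set precedes every date in the second, which is the property that matters when validating forecasting models.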

StratifyColumnNames

Supply the column names of categorical features to use in a stratified sampling procedure for partitioning the data. PartitionType must be "random" to use this option.
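Stratified sampling on categorical columns generally means sampling the same fraction within each combination of category levels. The base R sketch below shows that idea for a single 70% training split; it is an illustration under made-up column names (region, tier), not the package's own code.

```r
# Illustrative sketch only; column names and data are hypothetical.
set.seed(7)
df <- data.frame(
  region = sample(c("north", "south"), 200, replace = TRUE),
  tier   = sample(c("a", "b", "c"), 200, replace = TRUE),
  y      = rnorm(200)
)

# One stratum per observed combination of the stratification columns.
strata <- interaction(df$region, df$tier, drop = TRUE)

# Draw 70% of the row indices within each stratum.
train_idx <- unlist(lapply(split(seq_len(nrow(df)), strata), function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}), use.names = FALSE)

train <- df[train_idx, ]
rest  <- df[-train_idx, ]

# The stratum proportions in the training set should be close to the full data.
round(prop.table(table(df$region, df$tier)), 2)
round(prop.table(table(train$region, train$tier)), 2)
```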

StratifyNumericTarget

Supply the name of a numeric column. For the "random" PartitionType, you can stratify on this numeric variable: the data are split based on its percRank (percentile rank) to ensure a proper allocation of extreme values across the created data sets.

StratTargetPrecision

For the "random" PartitionType, and when StratifyNumericTarget is not NULL, precision is the number of decimals used in the percentile calculation. If you supply a value of 1, deciles will be used. For a value of 2, percentiles will be used. Larger values are supported.
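One way to picture the effect of the precision setting is to round a percentile rank to the given number of decimals and treat each rounded value as a stratum. The base R sketch below does this for a simulated target; it is an assumption about the general technique, not a copy of the package internals, and all variable names are hypothetical.

```r
# Illustrative sketch only; not the RemixAutoML implementation.
set.seed(1)
target <- rexp(1000)  # skewed numeric target with a long right tail

# Percentile rank of each value, in (0, 1].
perc_rank <- rank(target, ties.method = "average") / length(target)

# Rounding to 1 decimal gives roughly decile-sized buckets;
# rounding to 2 decimals gives roughly percentile-sized buckets.
bucket_1 <- round(perc_rank, 1)
bucket_2 <- round(perc_rank, 2)
length(unique(bucket_1))
length(unique(bucket_2))

# Sample 70% within each coarse bucket so that extreme values
# are represented in the training split as well as the remainder.
train_idx <- unlist(lapply(split(seq_along(target), bucket_1), function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}), use.names = FALSE)

summary(target[train_idx])
summary(target[-train_idx])
```

Higher precision produces finer buckets, which tracks the target distribution more closely but leaves fewer rows per bucket to sample from.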

TimeColumnName

Supply a date column name or a name of a column with an ID for sorting by time such that the smallest number is the earliest in time.

Value

Returns a list of data.tables

See Also

Other Feature Engineering: AutoTransformationCreate, AutoTransformationScore, AutoWord2VecModeler, CreateCalendarVariables, CreateHolidayVariables, DT_GDL_Feature_Engineering, DummifyDT, GDL_Feature_Engineering, ModelDataPrep, Partial_DT_GDL_Feature_Engineering, Scoring_GDL_Feature_Engineering, TimeSeriesFill

Examples

# NOT RUN {
# `data` is your source data, loaded beforehand
dataSets <- AutoDataPartition(data,
                              NumDataSets = 3,
                              Ratios = c(0.70, 0.20, 0.10),
                              PartitionType = "random",
                              StratifyColumnNames = NULL,
                              StratifyNumericTarget = NULL,
                              StratTargetPrecision = 1,
                              TimeColumnName = NULL)
# }