
This function partitions your data into the number of data sets you specify (for example training, validation, and test sets), using a random, stratified, or time-based split.
AutoDataPartition(data, NumDataSets = 3, Ratios = c(0.7, 0.2, 0.1),
PartitionType = "random", StratifyColumnNames = NULL,
StratifyNumericTarget = NULL, StratTargetPrecision = 3,
TimeColumnName = NULL)
data: Source data to partition.
NumDataSets: The total number of data sets to build.
Ratios: A vector giving the share of the data each data set receives, e.g. c(0.70, 0.20, 0.10).
PartitionType: Set to "random", "timeseries", or "time". With "random", the data is partitioned randomly (with stratified sampling if column names are supplied). With "timeseries", you can partition by time with a stratify option (so long as you have an equal number of records for each stratum). With "time", the data sets are generated so that the training data contains the earliest records in time, the validation data the second earliest, the test data the third earliest, and so on.
StratifyColumnNames: Column names of categorical features to use in a stratified sampling procedure for partitioning the data. PartitionType must be "random" to use this option.
StratifyNumericTarget: The name of a numeric column. With the "random" PartitionType, the numeric variable is stratified by its percent rank (percRank) so that extreme values are properly allocated across the created data sets.
StratTargetPrecision: For the "random" PartitionType when StratifyNumericTarget is not NULL, the number of decimals used in the percentile calculation. A value of 1 uses deciles; a value of 2 uses percentiles. Larger values are supported.
TimeColumnName: The name of a date column, or of an ID column used for sorting by time, such that the smallest value is the earliest in time.
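The numeric stratification described for StratifyNumericTarget and StratTargetPrecision can be illustrated with a minimal base R sketch. This is not the package's internal code: the data frame, the column names (target, bucket, partition), and the helper assign_partition are hypothetical, and the rounding rule is only an assumption about how a percent rank with a given number of decimals might be turned into strata.

```r
# Hypothetical illustration in base R; not the package's implementation.
set.seed(42)
df <- data.frame(id = 1:1000, target = rexp(1000))
ratios <- c(0.7, 0.2, 0.1)
precision <- 1  # 1 decimal gives roughly decile-sized buckets

# Percent rank of the numeric target, rounded to the chosen precision
df$bucket <- round(rank(df$target) / nrow(df), precision)

# Within a bucket of n rows, shuffle the rows and assign each one
# to a partition (1, 2, 3, ...) according to the cumulative ratios
assign_partition <- function(n, ratios) {
  cuts <- head(cumsum(ratios), -1)   # interior cut points, e.g. 0.7 and 0.9
  pos <- (sample.int(n) - 0.5) / n   # shuffled positions in (0, 1)
  findInterval(pos, cuts) + 1L
}

df$partition <- ave(seq_len(nrow(df)), df$bucket,
                    FUN = function(i) assign_partition(length(i), ratios))

# One data frame per partition
parts <- split(df, df$partition)
```

Because every bucket is split with the same ratios, each partition receives a similar share of the smallest and largest target values.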
Returns a list of data.tables
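As a rough picture of the "time" PartitionType and of a list return value, the following base R sketch sorts by a time column and slices the rows by cumulative ratio. The function time_partition and the element names part1, part2, part3 are hypothetical, and the sketch uses plain data frames rather than data.tables; the names of the elements actually returned by AutoDataPartition are not stated here.

```r
# Hypothetical helper for illustration; not part of the package.
time_partition <- function(df, time_col, ratios) {
  stopifnot(abs(sum(ratios) - 1) < 1e-8)
  df <- df[order(df[[time_col]]), , drop = FALSE]  # earliest records first
  n <- nrow(df)
  # Row index where each partition ends; the last one absorbs any rounding
  ends <- round(cumsum(ratios) * n)
  ends[length(ends)] <- n
  starts <- c(1, head(ends, -1) + 1)
  out <- Map(function(s, e) df[seq.int(s, length.out = e - s + 1), , drop = FALSE],
             starts, ends)
  names(out) <- paste0("part", seq_along(out))  # assumed element names
  out
}

set.seed(1)
df <- data.frame(date = as.Date("2020-01-01") + sample(0:99),
                 value = rnorm(100))
parts <- time_partition(df, "date", c(0.7, 0.2, 0.1))
sapply(parts, nrow)
```

Every record in an earlier element precedes every record in the next one, which matches the ordering described for "time": training data first, then validation, then test.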
Other Feature Engineering: AutoTransformationCreate, AutoTransformationScore, AutoWord2VecModeler, CreateCalendarVariables, DT_GDL_Feature_Engineering, DummifyDT, GDL_Feature_Engineering, ModelDataPrep, Partial_DT_GDL_Feature_Engineering2, Partial_DT_GDL_Feature_Engineering, Scoring_GDL_Feature_Engineering
# NOT RUN {
# `data` is assumed to be an existing data set loaded beforehand
dataSets <- AutoDataPartition(data,
                              NumDataSets = 3,
                              Ratios = c(0.70, 0.20, 0.10),
                              PartitionType = "random",
                              StratifyColumnNames = NULL,
                              StratifyNumericTarget = NULL,
                              StratTargetPrecision = 1,
                              TimeColumnName = NULL)
# }