
RemixAutoML (version 0.4.2)

ContinuousTimeDataGenerator: Create continuous time data sets for on demand modeling

Description

ContinuousTimeDataGenerator creates continuous time data sets for on demand modeling of transactional panel data.

Usage

ContinuousTimeDataGenerator(
  data,
  RestrictDateRange = TRUE,
  Case = 2L,
  FC_Periods = 52L,
  SaveData = FALSE,
  FilePath = NULL,
  TargetVariableName = "qty",
  DateVariableName = "date",
  GDL_Targets = NULL,
  TimeUnit = "raw",
  TimeGroups = c("raw", "day", "week"),
  GroupingVariables = "sku",
  HierarchyGroupVars = NULL,
  MinTimeWindow = 1L,
  MinTxnRecords = 2L,
  Lags = 1L:7L,
  MA_Periods = 10L,
  SD_Periods = 10L,
  Skew_Periods = 10L,
  Kurt_Periods = 10L,
  Quantile_Periods = 10L,
  Quantiles_Selected = c("q5"),
  HolidayLags = c(1L:7L),
  HolidayMovingAverages = c(2L:14L),
  TimeBetween = NULL,
  TimeTrendVariable = TRUE,
  CalendarVariables = c("wday", "mday", "yday", "week", "isoweek", "month", "quarter",
    "year"),
  HolidayGroups = "USPublicHolidays",
  PowerRate = 0.5,
  SampleRate = 5,
  TargetWindowSamples = 5,
  PrintSteps = TRUE
)

Arguments

data

Your transaction-level data set.

RestrictDateRange

Set to TRUE to draw samples for each entity only within that entity's life span (not beyond it).

Case

Currently 1 for forecasting and 2 for other use cases.

FC_Periods

The number of future periods to collect data on

SaveData

Set to TRUE to save the MetaData and final modeling data sets to file

FilePath

The directory where the data sets are saved when SaveData = TRUE.

TargetVariableName

The name of your target variable that represents demand

DateVariableName

The name of the date variable for the demand instances.

GDL_Targets

The variable names to run through AutoLagRollStats()

TimeUnit

The time unit your data is aggregated by, e.g. "day", "week", "month", "quarter", or "year".

TimeGroups

The time groupings to use. Default is c("raw", "day", "week").

GroupingVariables

The combination of one or more categorical variables that uniquely identifies each individual series to forecast, e.g. "sku" or c("Store", "Department"). A sku is typically unique on its own. Store and Department together define each unique department, because the same department may be repeated across stores.
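To see why a combination of variables can be needed, the base R sketch below counts series in a small made-up store/department table. The data frame and its values are hypothetical; the point is only that Department alone is not unique, while Store and Department together are.

```r
# Hypothetical transactional data: Department "A" appears in two stores
txns <- data.frame(
  Store      = c("S1", "S1", "S2", "S2", "S2"),
  Department = c("A",  "B",  "A",  "A",  "C"),
  qty        = c(3, 1, 4, 2, 5),
  stringsAsFactors = FALSE
)

# Department alone does not identify a series across stores
n_departments <- length(unique(txns$Department))

# Store and Department together identify each individual series
series_id <- interaction(txns$Store, txns$Department, drop = TRUE)
n_series  <- nlevels(series_id)

n_departments  # 3
n_series       # 4
```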

HierarchyGroupVars

The hierarchy grouping variables, or NULL.

MinTimeWindow

The number of time periods to omit for training. The default is 1, so that there is at least one period of values to forecast. Set it to a larger value to exclude the smaller target windows.

MinTxnRecords

The minimum number of transaction records. I typically set this to 2 so that there is at least one other instance of demand, which keeps the forecasted values from being complete nonsense.
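Assuming MinTxnRecords acts as a per-group minimum on the number of transaction records, the filter can be sketched in base R as follows. The helper filter_min_records() and the data are hypothetical; this is not the package's implementation.

```r
# Hypothetical transactions for three skus
txns <- data.frame(
  sku = c("a", "a", "a", "b", "c", "c"),
  qty = c(2, 1, 4, 7, 3, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only groups with at least `min_records` rows
filter_min_records <- function(df, group_col, min_records = 2L) {
  counts <- table(df[[group_col]])
  keep   <- names(counts)[counts >= min_records]
  df[df[[group_col]] %in% keep, , drop = FALSE]
}

kept <- filter_min_records(txns, "sku", min_records = 2L)
unique(kept$sku)  # "a" "c"  (sku "b" has a single record and is dropped)
```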

Lags

Select the periods for all lag variables you want to create. E.g. c(1:5,52)
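A lag of k periods shifts each group's series down by k rows. The base R sketch below builds grouped lags for Lags = 1:2, assuming the rows are already sorted by date within each group. The helper lag_by_group() and the qty_lag column names are hypothetical and need not match the columns the package creates.

```r
# Hypothetical helper: lag `x` by `k` positions within each level of `g`.
# Assumes rows are already sorted by date within each group.
lag_by_group <- function(x, g, k) {
  ave(x, g, FUN = function(v) c(rep(NA, k), v)[seq_along(v)])
}

df <- data.frame(
  sku  = c("a", "a", "a", "b", "b"),
  date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03",
                   "2024-01-01", "2024-01-02")),
  qty  = c(5, 3, 8, 2, 6),
  stringsAsFactors = FALSE
)

# One column per requested lag
for (k in 1:2) {
  df[[paste0("qty_lag", k)]] <- lag_by_group(df$qty, df$sku, k)
}
df$qty_lag1  # NA 5 3 NA 2
df$qty_lag2  # NA NA 5 NA NA
```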

MA_Periods

Select the periods for all moving average variables you want to create. E.g. c(1:5,52)

SD_Periods

Select the periods for all standard deviation variables you want to create. E.g. c(1:5,52)

Skew_Periods

Select the periods for all skew variables you want to create. E.g. c(1:5,52)

Kurt_Periods

Select the periods for all kurtosis variables you want to create. E.g. c(1:5,52)

Quantile_Periods

Select the periods for all quantile variables you want to create. E.g. c(1:5,52)

Quantiles_Selected

Select the quantiles you want from q5, q10, ..., q95.
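The rolling statistics above summarize a trailing window of the given number of periods. The base R sketch below computes a trailing mean, standard deviation and quantile for a single series, and converts a Quantiles_Selected label such as "q5" to a probability. The helpers roll_stat() and quantile_prob(), returning NA for incomplete windows, and the "q5 means 0.05" reading are all assumptions, not the package's code.

```r
# Hypothetical helper: apply `stat` over a trailing window of `n` values.
# Windows with fewer than `n` values are returned as NA.
roll_stat <- function(x, n, stat) {
  vapply(seq_along(x), function(i) {
    if (i < n) return(NA_real_)
    stat(x[(i - n + 1):i])
  }, numeric(1))
}

# Assumed label convention: "q5" is the 5th percentile (probability 0.05)
quantile_prob <- function(label) as.numeric(sub("^q", "", label)) / 100

qty <- c(4, 8, 6, 2, 10)
ma3  <- roll_stat(qty, 3, mean)  # NA NA 6 5.33 6
sd3  <- roll_stat(qty, 3, sd)    # NA NA 2 3.06 4
q5_3 <- roll_stat(qty, 3, function(w) unname(quantile(w, quantile_prob("q5"))))
```

Skew and kurtosis windows follow the same pattern with a different `stat` function.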

HolidayLags

Select the holiday variable lags you want generated.

HolidayMovingAverages

Select the holiday variable moving averages you want generated.

TimeBetween

Supply a name or NULL

TimeTrendVariable

Set to TRUE to have a time trend variable added to the model. The time trend is a numeric variable giving the position of each record in the time series (by group). It starts at 1 for the earliest point in time and increases by one for each successive time point.
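The time trend described above can be sketched in base R by sorting on group and date and then numbering records within each group. The TimeTrend column name and the data are hypothetical.

```r
df <- data.frame(
  sku  = c("b", "a", "a", "b", "a"),
  date = as.Date(c("2024-01-02", "2024-01-03", "2024-01-01",
                   "2024-01-01", "2024-01-02")),
  stringsAsFactors = FALSE
)

# Sort by group, then by date, so the earliest record in each group comes first
df <- df[order(df$sku, df$date), ]

# Number the records 1, 2, 3, ... within each group
df$TimeTrend <- ave(seq_len(nrow(df)), df$sku, FUN = seq_along)
df$TimeTrend  # 1 2 3 1 2
```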

CalendarVariables

Supply the calendar variables you want created, e.g. c("wday", "month"). The calendar variables are numeric representations of second, minute, hour, weekday, month day, year day, week, isoweek, month, quarter, and year.
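As a rough illustration of these numeric calendar parts, the base R sketch below derives the default set from a Date vector. The helper calendar_parts() is hypothetical, and its conventions (Sunday = 1 for wday, week counted in 7-day blocks from January 1) are assumptions that may differ from what the package produces.

```r
# Hypothetical helper: numeric calendar parts of a Date vector in base R
calendar_parts <- function(d) {
  lt <- as.POSIXlt(d)
  data.frame(
    wday    = lt$wday + 1L,                 # 1 = Sunday, ..., 7 = Saturday
    mday    = lt$mday,                      # day of the month
    yday    = lt$yday + 1L,                 # day of the year, starting at 1
    week    = lt$yday %/% 7L + 1L,          # 7-day blocks counted from Jan 1
    isoweek = as.integer(format(d, "%V")),  # ISO 8601 week number
    month   = lt$mon + 1L,
    quarter = lt$mon %/% 3L + 1L,
    year    = lt$year + 1900L
  )
}

# 2024-01-01 is a Monday; 2021-01-01 is a Friday in ISO week 53 of 2020
cal <- calendar_parts(as.Date(c("2024-01-01", "2021-01-01")))
cal
```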

HolidayGroups

Input the holiday groups of your choice from the CreateHolidayVariables() function in this package.

PowerRate

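Sampling parameter: the exponent applied to the number of records per group level when computing the sample size (see SampleRate).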

SampleRate

Set this to a value greater than 0. The number of records per group level is raised to the power of PowerRate, and that value is then multiplied by SampleRate.
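The calculation above is SampleRate * (records per group level)^PowerRate. A minimal sketch with the defaults PowerRate = 0.5 and SampleRate = 5 follows; the helper name and the rounding to the nearest whole number are assumptions.

```r
# Hypothetical helper: samples per group = SampleRate * records^PowerRate.
# Rounding to the nearest whole number is an assumption.
samples_per_group <- function(n_records, PowerRate = 0.5, SampleRate = 5) {
  round(SampleRate * n_records^PowerRate)
}

samples_per_group(c(4, 100, 10))  # 10 50 16
```

With PowerRate below 1, the sample count grows more slowly than the group size, so large groups do not dominate the modeling data.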

TargetWindowSamples

The number of target window samples. Default is 5.

PrintSteps

Set to TRUE to have operation steps printed to the console

Value

Returns two data.table data sets: the first (CountModelData) is the modeling data set for the count distribution, and the second (SizeModelData) is the modeling data set for the size model.

See Also

Other Feature Engineering: AutoDataPartition(), AutoHierarchicalFourier(), AutoInteraction(), AutoLagRollStatsScoring(), AutoLagRollStats(), AutoTransformationCreate(), AutoTransformationScore(), AutoWord2VecModeler(), AutoWord2VecScoring(), CreateCalendarVariables(), CreateHolidayVariables(), DT_GDL_Feature_Engineering(), DifferenceDataReverse(), DifferenceData(), DummifyDT(), H2oAutoencoder(), ModelDataPrep(), Partial_DT_GDL_Feature_Engineering(), TimeSeriesFill()

Examples

# NOT RUN {
DataSets <- ContinuousTimeDataGenerator(
  data,
  RestrictDateRange = TRUE,
  FC_Periods = 52,
  SaveData = FALSE,
  FilePath = normalizePath("./"),
  TargetVariableName = "qty",
  DateVariableName = "date",
  GDL_Targets = NULL,
  GroupingVariables = "sku",
  HierarchyGroupVars = NULL,
  TimeGroups = c("raw","day","week"),
  MinTimeWindow = 1,
  MinTxnRecords = 2,
  Lags = 1:7,
  MA_Periods = 10L,
  SD_Periods = 10L,
  Skew_Periods = 10L,
  Kurt_Periods = 10L,
  Quantile_Periods = 10L,
  Quantiles_Selected = c("q5"),
  HolidayLags = c(1L:7L),
  HolidayMovingAverages = c(2L:14L),
  TimeBetween = NULL,
  TimeTrendVariable = TRUE,
  TimeUnit = "day",
  CalendarVariables = c("wday",
    "mday",
    "yday",
    "week",
    "isoweek",
    "month",
    "quarter",
    "year"),
  HolidayGroups = "USPublicHolidays",
  PowerRate = 0.5,
  SampleRate = 5,
  TargetWindowSamples = 5,
  PrintSteps = TRUE)
CountModelData <- DataSets$CountModelData
SizeModelData <- DataSets$SizeModelData
rm(DataSets)
# }
