RemixAutoML (version 0.11.0)

Scoring_GDL_Feature_Engineering: An Automated Scoring Feature Engineering Function

Description

Builds the lag, rolling-stat, and time-between-events features needed to score a single record, for cases where zero or one grouping variables were used in DT_GDL_Feature_Engineering(). This function runs internally inside the CARMA functions but may also be useful on its own.

Usage

Scoring_GDL_Feature_Engineering(data, lags = c(seq(1, 5, 1)),
  periods = c(3, 5, 10, 15, 20, 25), statsNames = c("MA"),
  targets = c("Target"), groupingVars = NULL,
  sortDateName = c("DateTime"), timeDiffTarget = c("Time_Gap"),
  timeAgg = "days", WindowingLag = 1, Type = "Lag", Timer = TRUE,
  SimpleImpute = TRUE, AscRowByGroup = "temp", RecordsKeep = 1)

Arguments

data

A data.table you want to run the function on

lags

A numeric vector of the specific lags you want to have generated. You must include 1 if WindowingLag = 1.

periods

A numeric vector of the specific rolling statistics window sizes you want to utilize in the calculations.

statsNames

A character vector of the corresponding names to create for the rolling stats variables.

targets

A character vector of the names of the reference columns on which the lags and rolling stats are built

groupingVars

A character vector of categorical variable names you will build your lags and rolling stats by

sortDateName

The column name of your date column used to sort events over time

timeDiffTarget

Specify a desired name for features created for time between events. Set to NULL if you don't want time between events features created.

timeAgg

List the time aggregation level for the time between events features, such as "hour", "day", "week", "month", "quarter", or "year"

WindowingLag

Set to 0 to build rolling stats off of target columns directly or set to 1 to build the rolling stats off of the lag-1 target

Type

List either "Lag" if you want features built on historical values or "Lead" if you want features built on future values

Timer

Set to TRUE if you want a percentage-complete tracker printed

SimpleImpute

Set to TRUE for factor level imputation of "0" and numeric imputation of -1

AscRowByGroup

Required. The name of a column holding the row number by group (if grouping), with 1 marking the record to score (typically the most recent in time)

RecordsKeep

Keep set to 1. Any larger value and the results will not be what you intend. I'll remove it eventually. If you want more than 1, look up Partial_DT_GDL_Feature_Engineering().
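The difference between WindowingLag = 0 and WindowingLag = 1 can be illustrated with a small sketch. This is not the package's internal code; it only mimics the idea with data.table's shift() and frollmean() on a made-up five-row series, and the column names Lag1, MA3_direct, and MA3_lagged are hypothetical.

```r
library(data.table)

# Toy series, oldest row first (hypothetical data, for illustration only)
dt <- data.table(Target = c(10, 20, 30, 40, 50))

# Lag-1 feature: the previous row's value (NA for the first row)
dt[, Lag1 := shift(Target, n = 1L, type = "lag")]

# WindowingLag = 0 analogue: 3-period moving average on the target itself
dt[, MA3_direct := frollmean(Target, n = 3L)]

# WindowingLag = 1 analogue: 3-period moving average on the lag-1 target,
# so the current row's own value never enters its window
dt[, MA3_lagged := frollmean(Lag1, n = 3L)]

print(dt)
```

Building the rolling stats on the lag-1 column is also why lags must include 1 when WindowingLag = 1: the lagged column has to exist before a window can be taken over it.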

Value

A data.table containing the original data plus the created lags, rolling stats, and time-between-events lags and rolling stats
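As a rough picture of what a time-between-events feature is, the sketch below computes the gap in days between consecutive dates. It is an illustration under assumptions: the dates are made up, and the way the function actually names and derives its Time_Gap columns may differ.

```r
library(data.table)

# Hypothetical irregular event dates, oldest first
dt <- data.table(
  DateTime = as.Date(c("2020-01-01", "2020-01-03", "2020-01-10"))
)

# Gap to the previous event in days; the first row has no predecessor, so NA
dt[, Time_Gap := as.numeric(
  difftime(DateTime, shift(DateTime, n = 1L), units = "days")
)]

print(dt)
```

Lags and rolling stats can then be taken over a gap column of this kind in the same way as over a target column.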

See Also

Other Feature Engineering: AutoDataPartition, AutoTransformationCreate, AutoTransformationScore, AutoWord2VecModeler, CreateCalendarVariables, CreateHolidayVariables, DT_GDL_Feature_Engineering, DummifyDT, GDL_Feature_Engineering, ModelDataPrep, Partial_DT_GDL_Feature_Engineering, TimeSeriesFill

Examples

# NOT RUN {
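# Simulate a daily series of N smoothed random draws
N <- 25116
data1 <- data.table::data.table(DateTime = as.Date(Sys.time()),
                                Target = as.numeric(stats::filter(rnorm(N,
                                                             mean = 50,
                                                             sd = 20),
                                                       filter=rep(1,10),
                                                       circular=TRUE)))
# temp is the row number: temp = 1 gets the most recent date (the record to score)
data1[, temp := seq_len(N)][, DateTime := DateTime - temp]
# Sort oldest to newest before building the features
data1 <- data1[order(DateTime)]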
data1 <- Scoring_GDL_Feature_Engineering(data1,
                                         lags           = c(seq(1,5,1)),
                                         periods        = c(3,5,10,15,20,25),
                                         statsNames     = c("MA"),
                                         targets        = c("Target"),
                                         groupingVars   = NULL,
                                         sortDateName   = c("DateTime"),
                                         timeDiffTarget = c("Time_Gap"),
                                         timeAgg        = "days",
                                         WindowingLag   = 1,
                                         Type           = "Lag",
                                         Timer          = TRUE,
                                         SimpleImpute   = TRUE,
                                         AscRowByGroup  = "temp",
                                         RecordsKeep    = 1)
# }
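When a grouping variable is used, the AscRowByGroup column has to be numbered within each group. One way this could be built is sketched below; the data and the column names Group and temp are hypothetical, and this is only a suggestion for preparing the input, not a step taken from the package.

```r
library(data.table)

# Hypothetical two-group data; column names are illustrative only
dt <- data.table(
  Group    = c("A", "A", "A", "B", "B"),
  DateTime = as.Date("2020-01-01") + c(0, 1, 2, 0, 1),
  Target   = c(5, 6, 7, 1, 2)
)

# Sort newest-first within each group, then number the rows per group
setorder(dt, Group, -DateTime)
dt[, temp := seq_len(.N), by = Group]

# Rows with temp == 1 are the most recent record of each group
latest <- dt[temp == 1]
print(latest)
```

The name of the resulting column ("temp" here) is what would be passed as AscRowByGroup.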