Builds autoregressive (lag) and moving average (rolling statistic) features from target columns, and distributed lags and distributed moving averages for independent features distributed across time. It can also create time-between-events features along with their associated lags and moving averages. The function works on data with or without grouping variables.
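To make the terms above concrete, here is a minimal sketch of a grouped lag and a grouped moving average written directly with data.table's shift() and frollmean(). It is not the internals of DT_GDL_Feature_Engineering; the toy data and the column names (grp, date, value, value_lag1, value_ma3) are assumptions made only for illustration.

```r
library(data.table)

# Toy data (hypothetical): two groups, five ordered observations each
dt <- data.table(
  grp   = rep(c("a", "b"), each = 5),
  date  = rep(as.Date("2024-01-01") + 0:4, times = 2),
  value = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)
setorder(dt, grp, date)

# Lag-1 of the target within each group (an autoregressive-style feature)
dt[, value_lag1 := shift(value, n = 1, type = "lag"), by = grp]

# Trailing 3-period moving average within each group
dt[, value_ma3 := frollmean(value, n = 3, align = "right"), by = grp]

print(dt)
```

The first rows of each group are NA because no earlier observations exist there; the function's SimpleImpute argument exists to fill such gaps.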
DT_GDL_Feature_Engineering(
data,
lags = c(seq(1, 50, 1)),
periods = c(seq(5, 95, 5)),
SDperiods = c(seq(5, 95, 5)),
Skewperiods = c(seq(5, 95, 5)),
Kurtperiods = c(seq(5, 95, 5)),
Quantileperiods = c(seq(5, 95, 5)),
statsFUNs = c("mean"),
targets = NULL,
groupingVars = NULL,
sortDateName = NULL,
timeDiffTarget = NULL,
timeAgg = c("days"),
WindowingLag = 0,
Type = c("Lag"),
SimpleImpute = TRUE
)
data: A data.table you want to run the function on
lags: A numeric vector of the specific lags you want to have generated. You must include 1 if WindowingLag = 1.
periods: A numeric vector of the specific rolling statistics window sizes you want to utilize in the calculations.
SDperiods: A numeric vector of Standard Deviation rolling statistics window sizes you want to utilize in the calculations.
Skewperiods: A numeric vector of Skewness rolling statistics window sizes you want to utilize in the calculations.
Kurtperiods: A numeric vector of Kurtosis rolling statistics window sizes you want to utilize in the calculations.
Quantileperiods: A numeric vector of Quantile rolling statistics window sizes you want to utilize in the calculations.
statsFUNs: Select from the following c("mean","sd","skew","kurt","q5","q10","q15","q20","q25","q30","q35","q40","q45","q50","q55","q60","q65","q70","q75","q80","q85","q90","q95")
targets: A character vector of the column names for the reference column in which you will build your lags and rolling stats
groupingVars: A character vector of categorical variable names you will build your lags and rolling stats by
sortDateName: The column name of your date column used to sort events over time
timeDiffTarget: Specify a desired name for features created for time between events. Set to NULL if you don't want time between events features created.
timeAgg: List the time aggregation level for the time between events features, such as "hour", "day", "week", "month", "quarter", or "year"
WindowingLag: Set to 0 to build rolling stats off of target columns directly or set to 1 to build the rolling stats off of the lag-1 target
Type: List either "Lag" if you want features built on historical values or "Lead" if you want features built on future values
SimpleImpute: Set to TRUE for factor level imputation of "0" and numeric imputation of -1
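The WindowingLag choice described above can be illustrated with a small sketch. This is one reading of the argument description, not the package's own code: with 0 the rolling window ends at the current observation, while with 1 the series is lagged once first, so the window ends at the previous observation. The vector x and the names ma_direct and ma_lagged are hypothetical.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5, 6)

# WindowingLag = 0 analogue: the 3-period mean includes the current value
ma_direct <- frollmean(x, n = 3)

# WindowingLag = 1 analogue: lag the series first, so each window
# ends at the previous observation and excludes the current value
ma_lagged <- frollmean(shift(x, n = 1), n = 3)

print(data.table(x, ma_direct, ma_lagged))
```

Building the rolling statistics on the lag-1 series keeps the current target value out of its own features, which is presumably why the lags argument must include 1 in that mode.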
Returns a data.table containing the original data plus the created lags, rolling stats, and time-between-events lags and rolling stats
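For readers unfamiliar with time-between-events features, the following sketch shows the general idea using base R's difftime() and data.table's shift(). It is an illustration under assumptions, not the function's implementation; the table events and the columns when, gap_days, and gap_days_lag1 are made up for the example.

```r
library(data.table)

# Hypothetical event dates, irregularly spaced
events <- data.table(
  when = as.Date(c("2024-01-01", "2024-01-03", "2024-01-10", "2024-01-11"))
)
setorder(events, when)

# Days elapsed since the previous event; the first row has no predecessor
events[, gap_days := as.numeric(
  difftime(when, shift(when, n = 1), units = "days")
)]

# A lag of the gap itself, analogous to a lag on the time-between-events feature
events[, gap_days_lag1 := shift(gap_days, n = 1)]

print(events)
```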
Other Feature Engineering: AutoDataPartition(), AutoDiffLagN(), AutoHierarchicalFourier(), AutoInteraction(), AutoLagRollStatsScoring(), AutoLagRollStats(), AutoTransformationCreate(), AutoTransformationScore(), AutoWord2VecModeler(), AutoWord2VecScoring(), ContinuousTimeDataGenerator(), CreateCalendarVariables(), CreateHolidayVariables(), DifferenceDataReverse(), DifferenceData(), DummifyDT(), H2OAutoencoderScoring(), H2OAutoencoder(), ModelDataPrep(), Partial_DT_GDL_Feature_Engineering(), TimeSeriesFill()
# NOT RUN {
# Simulate a daily series: N observations with a smoothed random target
N <- 25116
data <- data.table::data.table(
  DateTime = as.Date(Sys.time()),
  Target = stats::filter(
    rnorm(N, mean = 50, sd = 20),
    filter = rep(1, 10),
    circular = TRUE))

# Give each row a distinct date by stepping back one day per row
data[, temp := seq_len(N)][, DateTime := DateTime - temp][
  , temp := NULL]

# Sort chronologically before building time-based features
data <- data[order(DateTime)]

# Build lags, rolling statistics, and time-between-events features
data <- DT_GDL_Feature_Engineering(
  data,
  lags = c(seq(1, 5, 1)),
  periods = c(3, 5, 10, 15, 20, 25),
  SDperiods = c(seq(5, 95, 5)),
  Skewperiods = c(seq(5, 95, 5)),
  Kurtperiods = c(seq(5, 95, 5)),
  Quantileperiods = c(seq(5, 95, 5)),
  statsFUNs = c("mean",
    "sd", "skew", "kurt", "q05", "q95"),
  targets = c("Target"),
  groupingVars = NULL,
  sortDateName = "DateTime",
  timeDiffTarget = c("Time_Gap"),
  timeAgg = c("days"),
  WindowingLag = 1,
  Type = "Lag",
  SimpleImpute = TRUE)
# }