RemixAutoML (version 0.11.0)

DummifyDT: DummifyDT creates dummy variables for the selected columns.

Description

DummifyDT creates dummy variables for the selected columns, using either one-hot encoding (N+1 columns for N levels) or N columns for N levels.
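
To make the "N columns for N levels" layout concrete, here is a minimal base-R sketch. It is not the package's implementation: `make_dummies()` and the `toy` data frame are hypothetical names introduced only for illustration, and the `column_level` naming is assumed from the `FactorCol_gg` column used in the example on this page.

```r
# Illustrative sketch only; make_dummies() is a hypothetical helper,
# not part of RemixAutoML.
make_dummies <- function(df, col) {
  # One 0/1 indicator column per distinct level of `col`
  lv <- sort(unique(as.character(df[[col]])))
  for (l in lv) {
    df[[paste0(col, "_", l)]] <- as.integer(df[[col]] == l)
  }
  # Drop the original column (comparable to KeepFactorCols = FALSE)
  df[[col]] <- NULL
  df
}

toy <- data.frame(Value = c(0.1, 0.5, 0.9, 0.3),
                  FactorCol = c("a", "b", "a", "c"),
                  stringsAsFactors = FALSE)
out <- make_dummies(toy, "FactorCol")
names(out)
# "Value" "FactorCol_a" "FactorCol_b" "FactorCol_c"
```

Three levels produce three indicator columns, and each row has a 1 in exactly one of them.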

Usage

DummifyDT(data, cols, KeepFactorCols = FALSE, OneHot = FALSE,
  SaveFactorLevels = FALSE, SavePath = NULL,
  ImportFactorLevels = FALSE, FactorLevelsList = NULL,
  ClustScore = FALSE, ReturnFactorLevels = FALSE)

Arguments

data

The data set containing the columns you wish to dichotomize

cols

A vector with the names of the columns you wish to dichotomize

KeepFactorCols

Set to TRUE to keep the original columns used in the dichotomization process

OneHot

Set to TRUE to run one hot encoding, FALSE to generate N columns for N levels

SaveFactorLevels

Set to TRUE to save unique levels of each factor column to file as a csv

SavePath

Provide a file path to save your factor levels. Use this when you build models that require dummy variables.

ImportFactorLevels

Set to TRUE to import the factor levels csv instead of deriving the levels from the data you provide. This ensures you build out all of the columns you trained with in modeling.

FactorLevelsList

Supply a list of factor variable levels

ClustScore

This is for scoring AutoKMeans. Set to FALSE for all other applications.

ReturnFactorLevels

Set to TRUE to have a named list of all the factor levels returned. Doing so causes the function to return a list with the source data.table and the list of factor variables' levels.
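
The level-related arguments above (SaveFactorLevels, ImportFactorLevels, FactorLevelsList) exist so that scoring data ends up with the same dummy columns as training data. The base-R sketch below illustrates that idea only. `dummies_from_levels()` is a hypothetical helper, and the single-column csv layout is an assumption made for the illustration, not the file format the package writes.

```r
# Illustrative sketch only; dummies_from_levels() is a hypothetical helper,
# not part of RemixAutoML.
dummies_from_levels <- function(df, levels_list) {
  for (col in names(levels_list)) {
    # Build one column per stored level, whether or not it occurs in df
    for (l in levels_list[[col]]) {
      df[[paste0(col, "_", l)]] <- as.integer(df[[col]] %in% l)
    }
    df[[col]] <- NULL
  }
  df
}

# Levels observed at training time
train_levels <- list(FactorCol = c("a", "b", "c"))

# Round trip through a csv (assumed layout: one column of levels)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(FactorCol = train_levels$FactorCol), path, row.names = FALSE)
stored <- list(
  FactorCol = as.character(read.csv(path, stringsAsFactors = FALSE)$FactorCol)
)

# Scoring data lacks levels "b" and "c" and contains an unseen level "d"
score <- data.frame(Value = c(0.2, 0.7),
                    FactorCol = c("a", "d"),
                    stringsAsFactors = FALSE)
scored <- dummies_from_levels(score, stored)
names(scored)
# "Value" "FactorCol_a" "FactorCol_b" "FactorCol_c"
```

Because the columns come from the stored levels rather than from the scoring data, the unseen level "d" gets no column of its own and its row is all zeros.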

Value

If ReturnFactorLevels is FALSE, a data.table with the new dummy variable columns, with the original columns removed unless KeepFactorCols is TRUE. Otherwise, a list with the data.table and a list of the factor levels.

See Also

Other Feature Engineering: AutoDataPartition, AutoTransformationCreate, AutoTransformationScore, AutoWord2VecModeler, CreateCalendarVariables, CreateHolidayVariables, DT_GDL_Feature_Engineering, GDL_Feature_Engineering, ModelDataPrep, Partial_DT_GDL_Feature_Engineering, Scoring_GDL_Feature_Engineering, TimeSeriesFill

Examples

# NOT RUN {
test <- data.table::data.table(Value = runif(100000),
                   FactorCol = sample(x = c(letters,
                                            LETTERS,
                                            paste0(letters,letters),
                                            paste0(LETTERS,LETTERS),
                                            paste0(letters,LETTERS),
                                            paste0(LETTERS,letters)),
                                      size = 100000,
                                      replace = TRUE))
test <- DummifyDT(data = test,
                  cols = "FactorCol",
                  KeepFactorCols = FALSE,
                  OneHot = FALSE,
                  SaveFactorLevels = FALSE,
                  SavePath = NULL,
                  ImportFactorLevels = FALSE,
                  FactorLevelsList = NULL,
                  ClustScore = FALSE,
                  ReturnFactorLevels = FALSE)
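# Number of columns after the dummy variables are added
ncol(test)
# Number of rows whose FactorCol value was "gg"
test[, sum(FactorCol_gg)]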
# }