
Build a list of one-hot encodings, one for each column in cols.
build_encoding(dataSet, cols = "auto", verbose = TRUE, min_frequency = 0, ...)
dataSet: Matrix, data.frame or data.table
cols: List of numeric column(s) name(s) of dataSet to transform. To transform all character columns, set it to "auto". (character, default to "auto")
verbose: Should the algorithm talk? (logical, default to TRUE)
min_frequency: The minimal share of lines that a category should represent (numeric, between 0 and 1, default to 0)
...: Other arguments such as name_separator, used to separate words in new column names (character, default to ".")
A list in which each element is named after a column of dataSet and holds new_cols and values, describing the new columns that will be built during encoding.
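To make the shape of that list concrete, here is a minimal base R sketch. The list is built by hand rather than returned by build_encoding, the "education" column and its values are chosen purely for illustration, and the "column.value" naming of new_cols is an assumption based on the default name_separator.

```r
# Hand-built stand-in for one entry of the returned list (illustrative only;
# not produced by build_encoding, and the naming scheme is an assumption).
encoding <- list(
  education = list(
    new_cols = c("education.Bachelors", "education.Masters"),
    values   = c("Bachelors", "Masters")
  )
)

# Apply the stand-in encoding to a small data.frame using base R only.
df <- data.frame(education = c("Masters", "Bachelors", "Doctorate"),
                 stringsAsFactors = FALSE)
for (col in names(encoding)) {
  enc <- encoding[[col]]
  for (i in seq_along(enc$values)) {
    # 1 where the row holds this value, 0 otherwise
    df[[enc$new_cols[i]]] <- as.integer(df[[col]] == enc$values[i])
  }
}
print(df)
```

A value that has no entry in values ("Doctorate" here) simply gets 0 in every new column of this sketch.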
To avoid creating very large sparse matrices, use the min_frequency parameter to ensure that only the most representative values are used to create new columns (and not outliers or mistakes in the data).
Setting min_frequency to something greater than 0 may make the function slower (especially for a large dataSet).
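The filtering idea behind min_frequency can be sketched in base R. This is not the package's implementation: frequent_values is a hypothetical helper, and treating the threshold as inclusive is an assumption.

```r
# Hypothetical helper, not part of dataPreparation: list the values of a
# vector whose share of rows reaches min_frequency.
frequent_values <- function(x, min_frequency = 0) {
  shares <- table(x) / length(x)                  # share of rows per category
  keep <- as.vector(shares) >= min_frequency      # assumption: inclusive threshold
  sort(names(shares)[keep])
}

x <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "c")
frequent_values(x)                       # keeps all three categories
frequent_values(x, min_frequency = 0.2)  # drops "c", which covers only 10% of rows
```

Counting every category first is the extra pass that a non-zero threshold requires, which is consistent with the slowdown noted above.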
# NOT RUN {
# Get a data set
library(dataPreparation)  # provides build_encoding and the adult data set
data(adult)
encoding <- build_encoding(adult, cols = "auto", verbose = TRUE)
print(encoding)
# To limit the number of generated columns, use the min_frequency parameter:
build_encoding(adult, cols = "auto", verbose = TRUE, min_frequency = 0.1)
# With min_frequency = 0.1, columns are created only for values present in at least 10% of rows.
# }