
superml (version 0.1.0)

kFoldMean: kFoldMean Calculator

Description

Calculates out-of-fold mean features (also known as target encoding) for train and test data. Make sure to rbind both train and test into one data frame. This strategy is widely used to reduce overfitting and target leakage when creating features from the target variable.
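
To make the idea concrete, here is a minimal base R sketch of out-of-fold mean encoding. It is not superml's implementation: the helper name `oof_mean_sketch`, the output column `oof_mean`, and the fallback to the overall training mean for unseen categories are all assumptions made for illustration. Each training row is encoded with category means computed from the other folds only, so its own target value never contributes to its feature.

```r
# Hypothetical helper for illustration only; not the superml implementation.
oof_mean_sketch <- function(train_df, test_df, colname, target,
                            n_fold = 5, seed = 42) {
  set.seed(seed)
  n <- nrow(train_df)
  y <- train_df[[target]]
  keys <- as.character(train_df[[colname]])
  # Randomly assign each training row to one of n_fold folds.
  fold_id <- sample(rep(seq_len(n_fold), length.out = n))
  overall_mean <- mean(y)
  oof <- rep(NA_real_, n)
  for (k in seq_len(n_fold)) {
    held_out <- fold_id == k
    # Category means computed only from rows outside the held-out fold.
    means <- vapply(split(y[!held_out], keys[!held_out]), mean, numeric(1))
    oof[held_out] <- means[keys[held_out]]
  }
  # Assumption: categories absent from the other folds get the overall mean.
  oof[is.na(oof)] <- overall_mean
  # Test rows use category means from the full training set.
  full_means <- vapply(split(y, keys), mean, numeric(1))
  test_enc <- unname(full_means[as.character(test_df[[colname]])])
  test_enc[is.na(test_enc)] <- overall_mean
  list(train = cbind(train_df, oof_mean = oof),
       test = cbind(test_df, oof_mean = test_enc))
}

train <- data.frame(region = c('del','csk','rcb','del','csk','pune','guj','del'),
                    win = c(0,1,1,0,0,0,0,1))
test <- data.frame(region = c('rcb','csk','rcb','del','guj','pune','csk','kol'))
res <- oof_mean_sketch(train, test, colname = 'region', target = 'win', seed = 1220)
```

In this sketch a category that occurs only once in the training data (such as 'rcb') always receives the fallback value in `res$train`, because no other fold contains it. A production implementation might smooth or regularize such cases differently.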

Usage

kFoldMean(train_df, test_df, colname, target, n_fold = 5, seed = 42)

Arguments

train_df

train dataset

test_df

test dataset

colname

name of categorical column

target

name of the target (dependent) variable, given as a string

n_fold

the number of folds to use for the k-fold computation; default is 5

seed

the seed value, used to ensure reproducibility; it can be any positive value; default is 42

Value

train and test data tables (accessed as $train and $test in the example below), each containing the out-of-fold mean value of the target for the given categorical variable

Examples

# NOT RUN {
library(superml)

# Toy data: 'region' is the categorical column, 'win' is the binary target
train <- data.frame(region = c('del','csk','rcb','del','csk','pune','guj','del'),
                    win = c(0,1,1,0,0,0,0,1))
test <- data.frame(region = c('rcb','csk','rcb','del','guj','pune','csk','kol'))

# Call kFoldMean once and extract both encoded tables from the result
result <- kFoldMean(train_df = train,
                    test_df = test,
                    colname = 'region',
                    target = 'win',
                    seed = 1220)
train_result <- result$train
test_result <- result$test
# }
