lares (version 4.8.4)

balance_data: Balance Binary Data by Resampling: Under-Over Sampling

Description

This function balances a given data.frame by resampling its rows (under- or over-sampling) so that the classes of a binary variable reach a given ratio (rate).

Usage

balance_data(df, variable, rate = 1, seed = 0)

Arguments

df

Dataframe. The data to resample, with one variable per column

variable

Character. Name of the binary variable (a column of df) used to resample df

rate

Numeric. Target ratio between the two classes, that is, how many rows of one class to keep for every row of the other. Default: 1. If variable has more than 2 unique values, rate instead represents a percentage of the number of rows

seed

Numeric. Seed for the random number generator, so that the same resampled rows are obtained on every run
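
To make the roles of variable, rate and seed more concrete, here is a minimal base R sketch of rate-based under-sampling. It is not the lares implementation: the helper name undersample_binary, the toy data frame, and the reading of rate as "majority rows kept per minority row" are all assumptions made for illustration only.

```r
# Hypothetical helper, NOT part of lares: under-sample the majority class
# of a binary column so it keeps about `rate` rows per minority row.
undersample_binary <- function(df, variable, rate = 1, seed = 0) {
  stopifnot(is.data.frame(df), variable %in% names(df))
  values <- as.character(df[[variable]])
  stopifnot(!anyNA(values))
  counts <- table(values)
  stopifnot(length(counts) == 2) # this sketch only handles binary variables

  set.seed(seed) # same seed, same sampled rows

  minority <- names(counts)[which.min(counts)]
  minor_idx <- which(values == minority)
  major_idx <- which(values != minority)

  # Never ask for more majority rows than actually exist
  n_keep <- min(length(major_idx), round(rate * length(minor_idx)))
  keep_major <- major_idx[sample.int(length(major_idx), n_keep)]

  # Keep every minority row plus the sampled majority rows, in original order
  df[sort(c(minor_idx, keep_major)), , drop = FALSE]
}

# Toy data (assumed): 20 "yes" rows and 80 "no" rows
toy <- data.frame(id = 1:100, label = rep(c("yes", "no"), times = c(20, 80)))
balanced <- undersample_binary(toy, "label", rate = 1, seed = 123)
table(balanced$label) # each class now has 20 rows
```

With rate = 2 the same sketch would keep 40 "no" rows for the 20 "yes" rows. Over-sampling (drawing minority rows with replacement) is deliberately left out to keep the example short.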

See Also

Other Data Wrangling: categ_reducer(), cleanText(), date_cuts(), date_feats(), dateformat(), formatNum(), formatTime(), holidays(), impute(), left(), normalize(), numericalonly(), ohe_commas(), ohse(), rbind_full(), removenacols(), removenarows(), replaceall(), right(), textFeats(), textTokenizer(), vector2text(), year_month(), year_week()

Examples

# NOT RUN {
library(lares)
data(dft) # Titanic dataset
df <- balance_data(dft, "Survived", rate = 1, seed = 123)
freqs(df, Survived)
# }