alookr (version 0.5.0)

cleanse.split_df: Cleansing the dataset for classification modeling

Description

Diagnoses the similarity between the train set and the test set contained in a "split_df" object, then cleanses the object by removing the variables that the diagnosis flags.

Usage

# S3 method for split_df
cleanse(.data, add_character = FALSE, uniq_thres = 0.9, missing = FALSE, ...)

Value

An object of class "split_df".

Arguments

.data

an object of class "split_df", usually the result of a call to split_by().

add_character

logical. Whether to include character (text) variables when comparing categorical data. The default is FALSE, which excludes character variables.

uniq_thres

numeric. Threshold for removing variables: a variable is removed when its ratio of unique values (number of unique values / number of observations) is greater than this value. The default is 0.9.

missing

logical. Whether to remove variables that contain missing values. The default is FALSE.

...

further arguments passed to or from other methods.

Details

Variables flagged by the diagnosis from the compare_diag() function are removed from the "split_df" object.
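
To make the uniq_thres and missing criteria concrete, the following base R sketch applies the same two ideas to a toy data frame. It is only an illustration under stated assumptions, not the code alookr runs internally: the data frame, its column names, and the helper uniq_ratio() are hypothetical, and limiting the unique-ratio check to factor and character columns is an assumption.

```r
# Toy data frame; the column names are hypothetical.
df <- data.frame(
  id     = as.character(1:10),                  # 10 unique values in 10 rows
  grade  = factor(rep(c("a", "b"), each = 5)),  # 2 unique values in 10 rows
  income = c(1.5, NA, 3.2, 4.1, 2.2, 5.0, 1.1, 2.8, 3.9, 4.4),
  stringsAsFactors = FALSE
)

# Ratio of unique values: number of unique values / number of observations.
# uniq_ratio() is a hypothetical helper, not part of alookr.
uniq_ratio <- function(x) length(unique(x)) / length(x)

uniq_thres <- 0.9

# Assumption: only factor and character columns are checked against uniq_thres.
is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
ratios <- vapply(df[is_cat], uniq_ratio, numeric(1))
high_unique <- names(ratios)[ratios > uniq_thres]

# Columns that contain at least one missing value (the idea behind missing = TRUE).
has_na <- names(df)[vapply(df, anyNA, logical(1))]

# Drop every flagged column.
cleaned <- df[setdiff(names(df), c(high_unique, has_na))]
names(cleaned)
```

Here id has a unique ratio of 1.0, which exceeds the 0.9 threshold, and income contains an NA, so only grade is kept. Lowering uniq_thres makes the filter stricter, because more variables exceed the threshold.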

Examples

library(alookr)  # provides split_by() and cleanse()
library(dplyr)   # provides the %>% pipe

# Credit Card Default Data
head(ISLR::Default)

# Generate data for the example:
# split the data into a "split_df" object, using default as the target variable
sb <- ISLR::Default %>%
  split_by(default)

# Remove the variables flagged by the diagnosis, using the default settings
sb %>%
  cleanse()
