Learn R Programming

CASMI (version 2.0.0)

autoBin.binary: Automatically Dichotomize Quantitative Variables

Description

Automatically compute optimal cutting points (based on mutual information) to dichotomize quantitative variables. This function can be used as a pre-processing step before using the CASMI-based functions.

Usage

autoBin.binary(data, index)

Value

`autoBin.binary()` returns the entire data frame after automatically dichotomizing the selected quantitative variable(s).

Arguments

data

data frame with variables as columns and observations as rows. The outcome variable (Y) MUST be categorical or discrete. The outcome variable (Y) MUST be the last column.

index

index or a vector of indices of the quantitative features (a.k.a., predictors, factors, independent variables) that need to be automatically categorized.

Examples

Run this code
## Using the "iris" dataset embedded in R
data("iris")
head(iris) # The original data

# ---- Dichotomize One Single Feature ----
# Dichotomize the column with index 1.
newData1 <- autoBin.binary(iris, 1)
head(newData1)

# ---- Dichotomize Multiple Features at a Time ----
# Dichotomize the columns with indices 1, 2, 3, and 4.
newData2 <- autoBin.binary(iris, c(1,2,3,4))
head(newData2)

# ---- Dichotomize Features Using Column Names ----
# Dichotomize the columns with the names "Sepal.Length" and "Sepal.Width".
cols_of_interest <- c("Sepal.Length", "Sepal.Width")
col_indices <- which(names(iris) %in% cols_of_interest)
newData3 <- autoBin.binary(iris, col_indices)
head(newData3)

Run the code above in your browser using DataLab