lgspline (version 0.2.0)

create_onehot: Create One-Hot Encoded Matrix

Description

Converts a categorical vector into a one-hot encoded matrix where each unique value becomes a binary column.

Usage

create_onehot(x)

Value

A data frame of one-hot encoded binary columns, one per unique value of x, with cleaned column names.

Arguments

x

A vector containing categorical values (factors, character, etc.)

Details

The function creates dummy variables for each unique value in the input vector using model.matrix() with dummy-intercept coding. Column names are cleaned by removing the 'x' prefix added by model.matrix().
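The general idea can be sketched in base R. This is a hypothetical stand-in, not lgspline's source: the name onehot_sketch and the intercept-free formula `~ 0 + x` are assumptions chosen so that every level keeps its own indicator column.

```r
# Hypothetical stand-in for illustration; NOT the lgspline implementation.
onehot_sketch <- function(x) {
  d <- data.frame(x = factor(x))
  # "~ 0 + x" omits the intercept, so each level gets its own 0/1 column
  m <- model.matrix(~ 0 + x, data = d)
  # model.matrix() names the columns "x<level>"; strip that leading "x"
  colnames(m) <- sub("^x", "", colnames(m))
  as.data.frame(m)
}

res <- onehot_sketch(c("A", "B", "C", "A"))
print(res)
```

Each row of the result contains exactly one 1, in the column named after that row's category.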

Examples

library(lgspline)

## lgspline will not accept "catvar" in this format: all variables must be
# numeric, and passing character or factor columns can cause
# difficult-to-diagnose issues in formula parsing
df <- data.frame(numvar = rnorm(100),
                 catvar = rep(LETTERS[1:4],
                              25))
print(head(df))

## Instead, replace with dummy-intercept coding by
# 1) applying one-hot encoding
# 2) dropping the first column
# 3) appending to our data

dummy_intercept_coding <- create_onehot(df$catvar)[,-1]
df$catvar <- NULL
df <- cbind(df, dummy_intercept_coding)
print(head(df))
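Why dropping the first one-hot column gives dummy-intercept coding can be checked in base R alone. The sketch below assumes R's default contrasts (contr.treatment for unordered factors) and uses model.matrix() directly rather than create_onehot(); the variable names are illustrative only.

```r
# Base-R check, assuming default options("contrasts") (treatment contrasts).
catvar <- rep(LETTERS[1:4], 25)
d <- data.frame(x = factor(catvar))

# Full one-hot encoding: one indicator column per level (xA, xB, xC, xD)
onehot <- model.matrix(~ 0 + x, data = d)

# Intercept plus treatment-coded dummies: (Intercept), xB, xC, xD
treat <- model.matrix(~ x, data = d)

# Dropping the first one-hot column leaves the same dummies that accompany
# an intercept, with the first level ("A") acting as the reference category
same <- all(onehot[, -1] == treat[, -1])
print(same)
```

Rows belonging to the reference level are all zeros in the remaining columns, so the intercept absorbs that level's effect.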

