Fast creation of dummy variables

dummy_cols() quickly creates dummy (binary) columns from character and factor columns in the input data. This is useful in statistical analyses, such as regression or correlation, that need numeric 0/1 indicator columns rather than character columns.
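To make the mechanism concrete, here is a minimal base-R sketch of what building dummy columns involves. The helper `make_dummies()` is hypothetical and is not how fastDummies implements `dummy_cols()`. It assumes there are no missing values and names each new column `<column>_<value>`, similar to the columns `dummy_cols()` adds.

```r
# Hypothetical base-R helper illustrating the idea behind dummy_cols();
# it is NOT fastDummies' implementation and does not handle NA values.
make_dummies <- function(df, select_columns = NULL) {
  # Default: every character or factor column
  if (is.null(select_columns)) {
    is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
    select_columns <- names(df)[is_cat]
  }
  for (col in select_columns) {
    x <- as.character(df[[col]])
    # One 0/1 column per distinct value, named <column>_<value>
    for (val in sort(unique(x))) {
      df[[paste0(col, "_", val)]] <- as.integer(x == val)
    }
  }
  df
}

crime <- data.frame(city = c("SF", "SF", "NYC"),
                    year = c(1990, 2000, 1990),
                    crime = 1:3)
make_dummies(crime)                                   # city only
make_dummies(crime, select_columns = c("city", "year"))
```

Because the original columns are kept and only new columns are appended, the result has the same number of rows as the input.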

Usage

```r
dummy_cols(.data, select_columns = NULL, remove_first_dummy = FALSE,
  remove_most_frequent_dummy = FALSE)
```

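Arguments

.data
An object with the data set you want to make dummy columns from.

select_columns
Vector of column names that you want to create dummy variables from. If NULL (default), all character and factor columns are used. Columns of other types, such as the numeric year in the examples, are used only when named here.

remove_first_dummy
Removes the first dummy of every variable so that only n-1 dummies remain, where n is the number of distinct values. This avoids perfect multicollinearity (the "dummy variable trap") in models with an intercept, because a complete set of dummies always sums to 1.

remove_most_frequent_dummy
Removes the most frequently observed category so that only n-1 dummies remain. If several categories tie for most frequent, the first of them in alphabetical order is removed.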
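The two drop rules, and the reason n-1 dummies are needed, can be sketched in base R. `levels_to_keep()` and `dummy_matrix()` are hypothetical helpers, not part of fastDummies, and the `drop` argument is an illustrative stand-in for the two logical arguments. Alphabetical order here comes from `sort()`, which follows the current locale and may differ from the ordering fastDummies uses internally.

```r
# Hypothetical helpers (not fastDummies' implementation) mirroring the
# documented rules for which categories receive a dummy column.
levels_to_keep <- function(x, drop = c("none", "first", "most_frequent")) {
  drop <- match.arg(drop)
  x <- as.character(x)
  vals <- sort(unique(x))                 # alphabetical, locale-dependent
  if (drop == "first") {
    vals <- vals[-1]
  } else if (drop == "most_frequent") {
    counts <- table(x)[vals]              # counts in the same order as vals
    # which.max() returns the first maximum, so a tie drops the
    # alphabetically first of the tied categories
    vals <- vals[-which.max(counts)]
  }
  vals
}

# One 0/1 column per kept category
dummy_matrix <- function(x, vals) {
  vapply(vals, function(v) as.integer(x == v), integer(length(x)))
}

city <- c("SF", "SF", "NYC", "LA")

# With all n dummies, every row sums to 1, duplicating the intercept column
full <- dummy_matrix(city, levels_to_keep(city))
rowSums(full)                   # 1 1 1 1
qr(cbind(1, full))$rank         # 3, although the design has 4 columns

# Keeping n - 1 dummies restores full column rank
reduced <- dummy_matrix(city, levels_to_keep(city, "first"))
qr(cbind(1, reduced))$rank      # 3, equal to the number of columns

levels_to_keep(city, "most_frequent")                       # "LA" "NYC"
levels_to_keep(c("b", "a", "b", "a", "c"), "most_frequent") # "b" "c"
```

The rank check is what "multicollinearity" means in practice: with an intercept and all n dummies, one column is an exact linear combination of the others, so a least-squares fit cannot identify every coefficient.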


Value

A data.frame (or tibble or data.table, matching the input type) with the same number of rows as the input data, containing the original columns plus the newly created dummy columns.

See Also

dummy_rows, for creating dummy rows

Other dummy functions: dummy_columns, dummy_rows

Examples

```r
library(fastDummies)

crime <- data.frame(city = c("SF", "SF", "NYC"),
    year = c(1990, 2000, 1990),
    crime = 1:3)
# Include the numeric year column as well as city
dummy_cols(crime, select_columns = c("city", "year"))
# Remove the first dummy of each set of dummy columns made
dummy_cols(crime, select_columns = c("city", "year"),
    remove_first_dummy = TRUE)
```
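The examples above do not exercise `remove_most_frequent_dummy`. A hedged usage sketch, assuming fastDummies 1.2.0 or later (the version documented here) is installed: in the three-row `crime` data, "SF" and 1990 each occur twice, so those are the categories one would expect to be dropped.

```r
library(fastDummies)

crime <- data.frame(city = c("SF", "SF", "NYC"),
    year = c(1990, 2000, 1990),
    crime = 1:3)

# Drop the most frequent category of each selected column:
# "SF" (2 of 3 rows) for city and 1990 (2 of 3 rows) for year
out <- dummy_cols(crime, select_columns = c("city", "year"),
    remove_most_frequent_dummy = TRUE)

names(out)
# Same number of rows as the input, as described under Value
nrow(out) == nrow(crime)
```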
Documentation reproduced from package fastDummies, version 1.2.0, License: GPL

Community examples

akaEmma@gmail.com at Sep 1, 2018 fastDummies v0.1.2

## Using Centers for Disease Control and Prevention. National Immunization Surveys, 2016. Public-use data file and documentation.
## https://www.cdc.gov/vaccines/imz-managers/nis/datasets.html. August 2018.
## It has a LOT of categorical variables.

```r
library(dplyr)
library(fastDummies)

# Assumes vaccine_data already holds the NIS 2016 public-use file
vaccine_data <- vaccine_data %>% select(-c(seqnumc, seqnumhh)) # Take out IDs for correlations
head(vaccine_data)
vaccine_data <- vaccine_data %>% dummy_cols()
names(vaccine_data) # lots more variables! and they are beautifully binary for the correlations I want to do.
```