messy.cats

The goal of messy.cats is to make cleaning messy categorical data easier. User-entered character data often suffers from messiness that can be complicated and time-consuming to clean up. When entering data, users often make typos or formatting errors that cause stubborn, hard-to-detect issues in their data. By leveraging string distance measurement tools, messy.cats allows users to automate many of the steps involved in cleaning categorical data, so they can spend less time and effort fixing inconsistent categories.
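
To make the idea of string distance matching concrete, here is a minimal sketch in base R. It is not the package's implementation: `closest_category()` is a hypothetical helper written for illustration, and it uses Levenshtein distance from `utils::adist()` as a stand-in for whichever metric messy.cats applies internally.

```r
# Hypothetical helper, not part of messy.cats: assign each messy value to
# the clean category with the smallest Levenshtein (edit) distance.
closest_category <- function(messy, clean) {
  # adist() returns a matrix with one row per messy value and one column
  # per clean category
  d <- utils::adist(messy, clean)

  # Column index of the closest clean category for each messy value
  # (ties go to the first category listed)
  best <- apply(d, 1, which.min)

  data.frame(
    messy = messy,
    match = clean[best],
    distance = d[cbind(seq_along(messy), best)],
    stringsAsFactors = FALSE
  )
}

clean <- c("tree", "bush", "herb", "grass")
messy <- c("tre", "bussh", "hrb", "grasss")
closest_category(messy, clean)
```

Each typo here is one edit away from its intended category, so the closest match recovers the clean value. Real data usually also needs a distance threshold so that values with no good match are flagged rather than forced into a category.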

Installation

You can install the released version of messy.cats from GitHub with:

# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("hkarp1/messy.cats")

Example

This basic example shows how to solve a common problem, matching messy values against a set of clean categories:

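library(messy.cats)

# Clean reference categories
plant_categories = c("tree", "bush", "herb", "grass")

# Messy, user-entered values that should map onto the categories above
messy_plant_categories = c("green tree", "red bush", "new herb", "old grass",
                           "young tree", "small bush", "20 herbs", "the grass",
                           "a tree", "bushes", "herbs", "tall grass")

# Match entries across the two vectors by string distance
cat_match(plant_categories, messy_plant_categories)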

Install

install.packages('messy.cats')

Monthly Downloads: 173
Version: 1.0
License: MIT + file LICENSE
Maintainer: Harrison Karp
Last Published: November 30th, 2022

Functions in messy.cats (1.0)

cat_match
cat_join
typos
country.names
clean_names.df
select_metric
country_match
state.name
messy.cats-package (messy.cats: Employs String Distance Tools to Help Clean Categorical Data)
country_replace
state_match
fix_typos
state_replace
messy_caterpillars
fuzzy_rbind
messy_states1
messy_names.df
picked_list
messy_states2
cat_replace
clean_caterpillars