An automated cleaning function for spell-checking, de-pluralizing, removing duplicates, and binarizing text data
textcleaner(data, miss = 99, partBY = c("row", "col"), ID = NULL,
database = NULL)
data: A dataset of linguistic data
miss: Value used to code missing data. Defaults to 99
partBY: Are participants arranged by row or by column? Set to "row" if each row is a participant. Set to "col" if each column is a participant
ID: If subject IDs are included in the data file, the index of the row or column that contains them (e.g., if partBY = "row" and IDs are in the first column, enter 1). Defaults to NULL
database: Database for more efficient text cleaning. Defaults to NULL. Can be a vector of terms from a corpus or any text used for comparison. Currently, the only named option is "animals"
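The way miss, partBY, and ID relate to the layout of the input can be illustrated with a small base-R sketch. This is not the package's internal code: the toy data frame raw and the helper objects dat, ids, resp, and all_missing are hypothetical names, and the snippet only shows one plausible reading of the arguments described above.

```r
# Hypothetical toy data: participants in rows, IDs in column 1, 99 = missing
raw <- data.frame(
  id = c("p1", "p2", "p3"),
  r1 = c("cat", "dog", "99"),
  r2 = c("dogs", "99", "99"),
  stringsAsFactors = FALSE
)

miss   <- 99     # missing-data code (the documented default)
partBY <- "row"  # participants are arranged by row
ID     <- 1      # IDs sit in the first column

# Orient the data so that participants are always in rows
dat <- if (partBY == "col") {
  as.data.frame(t(raw), stringsAsFactors = FALSE)
} else {
  raw
}

# Split IDs from responses and recode the missing value as NA
ids  <- dat[[ID]]
resp <- as.matrix(dat[, -ID, drop = FALSE])
resp[resp == as.character(miss)] <- NA
rownames(resp) <- ids

# Participants with no valid responses (candidates for removal)
all_missing <- rowSums(!is.na(resp)) == 0
```

In this toy case only p3 has no valid responses, which corresponds to the kind of participant the function reports as removed.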
This function returns a list containing the following objects:
A matrix of responses where each row represents a participant and each column represents a unique response. A response that a participant has provided is a '1' and a response that a participant has not provided is a '0'
A response matrix that has been spell checked and de-pluralized with duplicates removed
A list containing two objects: full and unique. full contains all responses regardless of spellcheck changes and unique contains only responses that were changed during the spell check
A list containing two objects: rows and ids. rows identifies removed participants by their row location in the original data file and ids identifies removed participants by their ID
A list with one element per participant, containing the responses that were changed for that participant. Participants are identified by their ID. This can be used to replicate the cleaning process and to keep track of changes more generally. Participants with NA had no changes from the original data, and participants with NULL were removed due to missing data (see removed$ids)
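The binarized matrix described above (one row per participant, one column per unique response) can be sketched in base R as follows. This is a minimal illustration under assumed inputs, not the function's own implementation: the object names responses, vals, uniq, and binary are hypothetical, and the responses are assumed to be already spell-checked and de-pluralized.

```r
# Hypothetical cleaned responses: one row per participant, NA = no response
responses <- rbind(
  p1 = c("cat", "dog", NA),
  p2 = c("dog", "fish", "cat"),
  p3 = c("bear", NA, NA)
)

# Columns of the binary matrix: every unique non-missing response, sorted
vals <- as.vector(responses)
uniq <- sort(unique(vals[!is.na(vals)]))

# 1 if the participant gave the response, 0 otherwise
binary <- t(apply(responses, 1, function(r) as.integer(uniq %in% r)))
colnames(binary) <- uniq

binary
```

Each row of binary keeps the participant label from responses, so a row can be traced back to its participant.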
Hornik, K., & Murdoch, D. (2011). Watch Your Spelling! The R Journal, 3(2), 22-28.
# NOT RUN {
rmat <- semnetcleaner(trial, partBY = "col")
# }
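The trial object used in the example is not defined on this page. The sketch below builds a hypothetical stand-in with participants in columns and shows a call that follows the usage signature given at the top. It assumes an installed version of SemNetCleaner whose textcleaner() matches that signature, and the call is guarded so it only runs in an interactive session.

```r
# Hypothetical stand-in for `trial`: each column is one participant,
# each row is a response slot, and 99 marks a missing response
trial <- cbind(
  p1 = c("cat", "dogs", "99"),
  p2 = c("dog", "fish", "cat"),
  p3 = c("bears", "99", "99")
)

# Guarded call: needs the SemNetCleaner package and an interactive session.
# Argument names are taken from the usage section of this page.
if (interactive() && requireNamespace("SemNetCleaner", quietly = TRUE)) {
  cleaned <- SemNetCleaner::textcleaner(
    trial,
    miss = 99,
    partBY = "col",
    ID = NULL,
    database = "animals"
  )
  str(cleaned, max.level = 1)  # inspect the top-level list objects
}
```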