textcleaner: Text Cleaner

Description

An automated cleaning function for spell-checking, de-pluralizing, removing duplicates from, and binarizing text data.
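The first three steps can be pictured on a single participant's responses. The sketch below is a minimal base-R illustration, not textcleaner's implementation. The helper name clean_responses, the max_dist threshold, the made-up dictionary, and the rule of dropping a trailing "s" when the singular is in the dictionary are all assumptions for illustration. Spell-checking here uses the Levenshtein distance from utils::adist as a stand-in for whatever comparison textcleaner actually performs.

```r
# Hypothetical helper (not textcleaner's implementation): a minimal sketch of
# spell-checking, de-pluralizing and de-duplicating one participant's responses.
clean_responses <- function(responses, dictionary, max_dist = 2) {
  # Drop missing values, then normalize case and surrounding whitespace
  x <- tolower(trimws(responses[!is.na(responses)]))

  # Spell-check: replace each word not in the dictionary with its closest
  # dictionary entry (Levenshtein distance via utils::adist), if close enough
  x <- vapply(x, function(word) {
    if (word %in% dictionary) return(word)
    d <- adist(word, dictionary)[1, ]
    if (min(d) <= max_dist) dictionary[which.min(d)] else word
  }, character(1), USE.NAMES = FALSE)

  # De-pluralize (naive assumption): drop a trailing "s" when the
  # resulting singular form is in the dictionary
  singular <- sub("s$", "", x)
  is_plural <- singular != x & singular %in% dictionary
  x[is_plural] <- singular[is_plural]

  # Remove duplicates, keeping the first occurrence of each response
  unique(x)
}

# Example with a small, made-up dictionary of animal names
animals <- c("cat", "dog", "horse", "zebra")
clean_responses(c("Cat", "dgo", "dogs", "horses", "cat", NA), animals)
# returns c("cat", "dog", "horse")
```

In this example "dogs" is already mapped to "dog" by the spell-check step because it is one edit away, so the order of the steps matters. A fixed edit-distance threshold is also crude: with max_dist = 2, an unlisted animal such as "rat" would be rewritten to "cat".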

Usage

textcleaner(data, miss = 99, partBY = c("row", "col"), ID = NULL,
  database = NULL)

Arguments

data

A dataset of linguistic responses, with participants arranged by row or by column (see partBY)

miss

Value used to code missing data. Defaults to 99.

partBY

Are participants arranged by row or by column? Set to "row" if each row is a participant; set to "col" if each column is a participant.

ID

If subject IDs are included in the data file, the row or column that contains them must be specified (e.g., if partBY = "row" and the IDs are in the first column, enter 1). Defaults to NULL.

database

A database used to make text cleaning more efficient. Defaults to NULL. Can be a vector containing a corpus or any other text used for comparison. Currently, "animals" is the only available database.
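As a rough picture of what miss, partBY, and ID imply for the input, the hypothetical helper below orients the data so that each row is a participant, splits off the ID row or column, and recodes the missing-data value to NA. It also flags participants who have no non-missing responses, on the assumption that these are the participants reported as removed. The helper name prepare_input and that removal rule are assumptions, not textcleaner's code.

```r
# Hypothetical helper (not textcleaner's code) illustrating what the miss,
# partBY and ID arguments describe.
prepare_input <- function(data, miss = 99, partBY = c("row", "col"), ID = NULL) {
  partBY <- match.arg(partBY)
  data <- as.matrix(data)

  # Orient the data so that every row is one participant
  if (partBY == "col") data <- t(data)

  # Split off the ID column (a row of the original data when partBY = "col")
  if (!is.null(ID)) {
    ids <- data[, ID]
    data <- data[, -ID, drop = FALSE]
  } else {
    ids <- seq_len(nrow(data))
  }

  # Recode the missing-data value to NA (which() skips existing NAs)
  data[which(data == miss)] <- NA

  # Assumed rule: participants with no non-missing responses are removed.
  # rows gives each removed participant's position after orientation.
  empty <- rowSums(!is.na(data)) == 0
  list(data = data[!empty, , drop = FALSE],
       removed = list(rows = which(empty), ids = ids[empty]))
}

raw <- matrix(c("P1", "cat",   "dog",
                "P2", "99",    "99",
                "P3", "horse", "99"), nrow = 3, byrow = TRUE)
prepare_input(raw, partBY = "row", ID = 1)$removed
```

Transposing first means the same code path handles both layouts: passing t(raw) with partBY = "col" gives the same result as raw with partBY = "row".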

Value

This function returns a list containing the following objects:

binary

A matrix of responses in which each row represents a participant and each column represents a unique response. A response that a participant provided is coded as 1, and a response that the participant did not provide is coded as 0.

responses

A response matrix that has been spell-checked and de-pluralized, with duplicates removed.

spellcheck

A list containing two objects, full and unique: full contains all responses, whether or not they were changed by the spell check, and unique contains only the responses that were changed during the spell check.

removed

A list containing two objects, rows and ids: rows identifies removed participants by their row location in the original data file, and ids identifies removed participants by their ID.

partChanges

A list in which each participant is an object containing their responses that were changed. Participants are identified by their ID. This can be used to replicate the cleaning process and, more generally, to keep track of changes. Participants with NA had no changes from the original data, and participants with NULL were removed because of missing data (see removed$ids).
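The binary object can be thought of as a participant-by-response incidence matrix built from each participant's cleaned responses. A minimal base-R sketch, assuming the cleaned responses are held in a named list with one character vector per participant and no NA values (the helper name binarize and that input shape are assumptions, not textcleaner's code):

```r
# Hypothetical helper (not textcleaner's code): build a participant-by-response
# matrix of 0s and 1s from each participant's cleaned responses.
binarize <- function(responses) {
  # Every unique response across participants becomes one column
  vocab <- sort(unique(unlist(responses)))
  out <- matrix(0L, nrow = length(responses), ncol = length(vocab),
                dimnames = list(names(responses), vocab))
  # Mark a 1 wherever a participant provided that response
  for (i in seq_along(responses)) {
    out[i, vocab %in% responses[[i]]] <- 1L
  }
  out
}

binarize(list(P1 = c("cat", "dog"), P2 = c("dog", "horse")))
```

Here P1 gets 1 for cat and dog and 0 for horse, while P2 gets 0 for cat and 1 for dog and horse.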

References

Hornik, K., & Murdoch, D. (2011). Watch Your Spelling! The R Journal, 3(2), 22-28.

Examples

# NOT RUN {
rmat <- textcleaner(trial, partBY = "col")
# }
