
textclean (version 0.3.0)

check_text: Check Text For Potential Problems

Description

Uncleaned text may result in errors, warnings, and incorrect results in subsequent analysis. check_text checks text for potential problems and suggests possible fixes. Potential text anomalies that are detected include: factors, missing ending punctuation, empty cells, double punctuation, non-space after comma, no alphabetic characters, non-ASCII characters, missing values, and potentially misspelled words.

Usage

check_text(x, file = NULL)

Arguments

x
The text variable.
file
A connection, or a character string naming the file to print to. If NULL, the report prints to the console. Note that this is assigned as an attribute and passed to print.

Value

Returns a list with the following reports of potential text faults:
  • non_character- Text that is a factor.
  • missing_ending_punctuation- Text with no endmark at the end of the string.
  • empty- Text that contains an empty element (i.e., "").
  • double_punctuation- Text that contains two punctuation marks in the same string.
  • non_space_after_comma- Text that contains commas with no space after them.
  • no_alpha- Text that contains string elements with no alphabetic characters.
  • non_ascii- Text that contains non-ASCII characters.
  • missing_value- Text that contains missing values (i.e., NA).
  • containing_escaped- Text that contains escaped characters (see ?Quotes).
  • containing_digits- Text that contains digits.
  • indicating_incomplete- Text that contains endmarks that are indicative of incomplete/trailing sentences (e.g., ...).
  • potentially_misspelled- Text that contains potentially misspelled words.
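
To make a few of the reports above concrete, here is a minimal base-R sketch of how some of these conditions could be detected. It is not textclean's implementation: the helper name simple_text_checks and the regular expressions are assumptions chosen for illustration, and check_text may define each fault differently.

```r
# Illustrative base-R checks only; NOT textclean's actual implementation.
x <- c("i like", "they,were there", "", NA, "3;", "A valid sentence.")

# Hypothetical helper: returns, per check, the indices of elements that trigger it.
simple_text_checks <- function(x) {
  x <- as.character(x)      # factors are coerced so the regex checks can run
  present <- !is.na(x)      # NA elements are reported only under missing_value
  list(
    missing_value              = which(!present),
    empty                      = which(present & x == ""),
    # assumed definition of an endmark: ., ? or ! (optionally followed by spaces)
    missing_ending_punctuation = which(present & !grepl("[.?!]\\s*$", x)),
    non_space_after_comma      = which(present & grepl(",[^ ]", x)),
    no_alpha                   = which(present & !grepl("[[:alpha:]]", x)),
    containing_digits          = which(present & grepl("[[:digit:]]", x))
  )
}

report <- simple_text_checks(x)
str(report)
```

Each element of report is an integer vector of positions in x, so a single string can appear under several checks at once (for example, "3;" has no alphabetic characters and also contains a digit).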

Examples

## Not run: 
# x <- c("i like", "i want. thet them ther .", "I am ! that|", "", NA, 
#     "they,were there", ".", "   ", "?", "3;", "I like goud eggs!", 
#     "i 4like...", "\\tgreat",  "She said \"yes\"")
# check_text(x)
# print(check_text(x), include.text=FALSE)
# 
# y <- c("A valid sentence.", "yet another!")
# check_text(y)
# ## End(Not run)
