qdap (version 2.4.1)

check_text: Check Text For Potential Problems

Description

Uncleaned text may result in errors, warnings, and incorrect results in subsequent analysis. check_text checks text for potential problems and suggests possible fixes. Potential text anomalies that are detected include: factors, missing ending punctuation, empty cells, double punctuation, non-space after comma, no alphabetic characters, non-ASCII characters, missing values, and potentially misspelled words.

Usage

check_text(text.var, file = NULL)

Arguments

text.var

The text variable.

file

A connection, or a character string naming the file to print to. If NULL, the report prints to the console. Note that this is assigned as an attribute and passed to print.
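As a minimal sketch of the file argument, assuming qdap is installed (the call is skipped otherwise), the report can be directed to a temporary file rather than the console. The name report_path is illustrative only and is not part of qdap.

```r
# Sketch only: send the check_text report to a file instead of the console.
# `report_path` is a hypothetical name chosen for this example.
x <- c("i like", "they,were there", "I like goud eggs!")
report_path <- tempfile(fileext = ".txt")

if (requireNamespace("qdap", quietly = TRUE)) {
    res <- qdap::check_text(x, file = report_path)
    print(res)  # per the argument description, `file` is passed on to print
}
```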

Value

Returns a list with the following potential text fault reports:

  • non_character- Text that is non-character.

  • missing_ending_punctuation- Text with no endmark at the end of the string.

  • empty- Text that contains an empty element (i.e., "").

  • double_punctuation- Text that contains two qdap punctuation marks in the same string.

  • non_space_after_comma- Text that contains commas with no space after them.

  • no_alpha- Text that contains string elements with no alphabetic characters.

  • non_ascii- Text that contains non-ASCII characters.

  • missing_value- Text that contains missing values (i.e., NA).

  • containing_escaped- Text that contains escaped characters (see ?Quotes).

  • containing_digits- Text that contains digits.

  • indicating_incomplete- Text that contains endmarks that are indicative of incomplete/trailing sentences (e.g., ...).

  • potentially_misspelled- Text that contains potentially misspelled words.
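
To make a few of the categories above concrete, here is a rough base R sketch of what such checks could look like. It is not qdap's implementation: the helper sketch_checks and its regular expressions are assumptions made for illustration, and qdap's own rules may flag different elements.

```r
# Illustrative base R only; NOT how qdap implements check_text.
x <- c("i like", "", NA, "they,were there", "3;", "caf\u00e9 time.")

# Hypothetical helper: return the indices of elements flagged by each check.
sketch_checks <- function(txt) {
    present <- !is.na(txt)
    list(
        empty                 = which(present & txt == ""),
        missing_value         = which(!present),
        non_space_after_comma = which(grepl(",[^ ]", txt)),
        no_alpha              = which(present & !grepl("[[:alpha:]]", txt)),
        containing_digits     = which(grepl("[0-9]", txt)),
        # A code point above 127 means the element is not pure ASCII.
        non_ascii             = which(vapply(
            txt,
            function(s) !is.na(s) && any(utf8ToInt(s) > 127L),
            logical(1),
            USE.NAMES = FALSE
        ))
    )
}

sketch_checks(x)
```

Returning indices rather than the offending strings keeps the report compact and lets the caller subset the original vector, for example x[sketch_checks(x)$missing_value].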

See Also

check_spelling_interactive

Examples

# NOT RUN {
x <- c("i like", "i want. thet them .", "I am ! that|", "", NA, 
    "they,were there", ".", "   ", "?", "3;", "I like goud eggs!", 
    "i 4like...", "\\tgreat",  "She said \"yes\"")
check_text(x)
print(check_text(x), include.text=FALSE)

y <- c("A valid sentence.", "yet another!")
check_text(y)
# }