
stringdist (version 0.8.2)

printable_ascii: Detect the presence of non-printable or non-ASCII characters

Description

Detect the presence of non-printable or non-ASCII characters.

Usage

printable_ascii(x)

Arguments

x
a character vector

Value

A logical vector, of the same length as x, indicating which elements consist solely of printable ASCII characters.

Details

Printable ASCII characters consist of space, A-Z, a-z, 0-9 and the characters

! " # $ % & ' ( ) * + , . / : ; < = > ? @ [ ] \ ^ _ ` { | } ~ -

Note that this excludes tab (as it is a control character).
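The definition above amounts to accepting only the code points 32 (space) through 126 (~). The following is a rough base-R sketch of that rule. The helper is_printable_ascii() is hypothetical; it illustrates the definition and is not a claim about how stringdist implements printable_ascii().

```r
# Hypothetical helper (not part of stringdist): TRUE when every character of a
# string lies in the printable ASCII range, i.e. code points 32 (space) to 126 ("~").
is_printable_ascii <- function(x) {
  vapply(enc2utf8(x), function(s) {
    if (is.na(s)) return(NA)     # propagate missing values
    cp <- utf8ToInt(s)           # Unicode code points of the string
    all(cp >= 32L & cp <= 126L)  # tab (9) and carriage return (13) fall below 32
  }, logical(1), USE.NAMES = FALSE)
}

is_printable_ascii(c("Motorhead", "a\tb", "abc\r", ""))
# returns c(TRUE, FALSE, FALSE, TRUE)
```

An empty string counts as printable here, because it contains no offending character.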

Some tips on character encoding and transliteration

Some algorithms (like soundex) are defined only on the printable ASCII character set, which excludes, for example, any accented character. Translating accented characters to their unaccented counterparts is a form of transliteration. On many systems running R (but not all!) you can achieve this with

iconv(x,to="ASCII//TRANSLIT"),

where x is your character vector. See the documentation of iconv for details.
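A minimal sketch of the iconv route follows. The result depends on the platform's iconv implementation and on the current locale: an element may come back transliterated (for example "Motorhead"), with a placeholder such as "?", or as NA when conversion fails. The regular-expression check at the end is an illustrative addition, not part of iconv.

```r
# Build a vector whose second element contains o-umlaut (U+00F6)
x <- c("Motorhead", paste0("Mot", intToUtf8(0x00F6), "rhead"))

# Ask iconv to transliterate to ASCII; the exact output is platform- and
# locale-dependent, and elements that cannot be converted become NA.
y <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
print(y)

# Flag elements that converted to printable ASCII (code points 32..126).
# Note that a "?" placeholder still counts as printable, so inspect y as well.
ok <- !is.na(y) & !grepl("[^\\x20-\\x7E]", y, perl = TRUE)
print(ok)
```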

The stringi package (Gagolewski and Tartanus) should work on any system. The command stringi::stri_trans_general(x, "Latin-ASCII") transliterates the character vector x to ASCII.

Examples

library(stringdist)

# define o-umlaut
ouml <- intToUtf8(0x00F6)
x <- c("Motorhead", paste0("Mot", ouml, "rhead"))
# second element contains a non-ASCII character
printable_ascii(x)

# Control characters (like carriage return) are also excluded
printable_ascii("abc\r")
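Combining the check with the transliteration tip, one might transliterate only the elements that contain non-ASCII characters. This is a sketch under assumptions: the helper to_ascii() is hypothetical, stringi::stri_trans_general() is used only when stringi is installed, and iconv() is a platform-dependent fallback. Control characters such as "\r" are already ASCII, so they are left untouched and would still be flagged by printable_ascii().

```r
# Hypothetical helper: transliterate elements containing non-ASCII characters.
to_ascii <- function(x) {
  # TRUE for elements with any character outside code points 0..127
  bad <- !is.na(x) & grepl("[^\\x00-\\x7F]", x, perl = TRUE)
  if (!any(bad)) return(x)
  if (requireNamespace("stringi", quietly = TRUE)) {
    # ICU "Latin-ASCII" transform, e.g. o-umlaut becomes "o"
    x[bad] <- stringi::stri_trans_general(x[bad], "Latin-ASCII")
  } else {
    # Fallback: result depends on the platform's iconv and the locale
    x[bad] <- iconv(x[bad], from = "UTF-8", to = "ASCII//TRANSLIT")
  }
  x
}

x <- c("Motorhead", paste0("Mot", intToUtf8(0x00F6), "rhead"), "abc\r")
out <- to_ascii(x)
print(out)
```

Transliterating only the flagged elements leaves already-clean strings exactly as they were.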