sfsmisc (version 1.1-4)

AsciiToInt: Character to and from Integer Codes Conversion

Description

AsciiToInt returns integer codes in 0:255 for each (one byte) character in strings. ichar is an alias for it, for old S compatibility.

strcodes implements in R the basic engine for translating characters to corresponding integer codes.

chars8bit() is the inverse function of AsciiToInt, producing “one byte” characters from integer codes. Note that it (and hence strcodes()) depends on the locale; see Sys.getlocale().

Usage

AsciiToInt(strings)
     ichar(strings)
chars8bit(i = 1:255)
strcodes(x, table = chars8bit(1:255))

Arguments

strings, x

character vector.

i

numeric (integer) vector of values in 1:255.

table

a vector of (unique) character strings, typically of one character each.

Value

AsciiToInt (and hence ichar) and chars8bit return a vector of the same length as their argument.

strcodes(x, table) returns a list of the same length and names as x, whose components are integer vectors with codes in 1:255.
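The table lookup described above can be sketched in base R: split each string into single characters and match them against a table. This is only an illustrative sketch, not the sfsmisc source; `ascii_table` and `codes_sketch` are hypothetical names, and the table is restricted to the ASCII range 1:127 so that the result does not depend on the locale.

```r
# Hypothetical ASCII-only lookup table: element i is the character with code i.
ascii_table <- vapply(1:127, function(i) rawToChar(as.raw(i)), character(1))

# Hypothetical helper: split each string into single characters and look
# each one up in `table`; characters not found in the table yield NA.
codes_sketch <- function(x, table = ascii_table) {
  lapply(strsplit(x, ""), match, table = table)
}

res <- codes_sketch(c(a = "ABC", ch = "1234"))
print(res)
```

With a table covering only 1:127, any non-ASCII character comes back as NA, which is the kind of result to expect when the table and the string encoding do not agree.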

Details

Only the codes in 1:127 make up the ASCII encoding, which should be identical for all R versions. The ‘upper’ half (128:255) is often determined by the ISO-8859-1 (aka “ISO-Latin 1”) encoding, but may well differ depending on the locale setting; see also Sys.setlocale.
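As a rough illustration of why the upper half is encoding dependent, the following base R sketch (it does not use sfsmisc, and the variable names are made up for the example) reads byte 252 as ISO-8859-1 and then inspects the same character in UTF-8.

```r
# Byte 252 read as ISO-8859-1 (Latin-1) is "ü"; convert it to UTF-8.
u_latin1_byte <- as.raw(252)
u_utf8 <- iconv(rawToChar(u_latin1_byte), from = "latin1", to = "UTF-8")

# The Unicode code point agrees with the Latin-1 code ...
code_point <- utf8ToInt(u_utf8)

# ... but the UTF-8 representation needs two bytes, so a table of
# one-byte characters cannot match it.
utf8_bytes <- as.integer(charToRaw(u_utf8))

print(code_point)
print(utf8_bytes)
```

In a UTF-8 locale the character is therefore not a single byte, which is consistent with the NA results mentioned in the Examples.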

Note that 0 is no longer allowed, since R no longer allows \0 (aka nul) characters in a string.
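The restriction on nul can be seen in base R directly. The sketch below does not call sfsmisc; it only shows that a raw vector with an embedded 0 byte cannot be converted to a character string, while codes starting at 1 convert fine. The variable names are chosen for the example.

```r
# An embedded nul byte cannot be turned into an R string; catch the error.
with_nul <- tryCatch(
  rawToChar(as.raw(c(65, 0, 66))),
  error = function(e) "error: embedded nul"
)
print(with_nul)

# Without the nul byte the conversion succeeds.
without_nul <- rawToChar(as.raw(c(65, 66)))
print(without_nul)
```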

Examples

# NOT RUN {
chars8bit(65:70)#-> "A" "B" .. "F"
stopifnot(identical(LETTERS,   chars8bit(65:90)),
          identical(AsciiToInt(LETTERS), 65:90))

# }
# NOT RUN {
## may only work in ISO-latin1 locale (not in UTF-8):
try( strcodes(c(a= "ABC", ch="1234", place = "Zürich")) )
## in "latin-1" gives  {otherwise should give NA instead of 252}:
## $a
## [1] 65 66 67
##
## $ch
## [1] 49 50 51 52
##
## $place
## [1]  90 252 114 105  99 104
 myloc <- Sys.getlocale()

if(.Platform $ OS.type == "unix") { # ''should work'' here
  try( Sys.setlocale(locale = "de_CH") )# "try": just in case
  print(strcodes(c(a= "ABC", ch="1234", place = "Zürich"))) # no NA hopefully
  print(AsciiToInt(chars8bit()))# -> 1:255  {if setting latin1 succeeded above}

  print(chars8bit(97:140))
  try( Sys.setlocale(locale = "de_CH.utf-8") )# "try": just in case
  print(chars8bit(97:140)) ## typically looks different than above
}
# }
