Ecfun (version 0.1-2)

subNonStandardCharacters: sub nonstandard characters with replacement

Description

Find the first and last characters not in standardCharacters, and replace them, together with everything between them, by replacement. For example, suppose the "e" in "Ruben" carries an accent that some software mangles. With the default values for standardCharacters and replacement, the string would become something like "Rub_n".
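
Here is a minimal sketch of the core idea for a single string. The helper name replace_nonstandard_span is hypothetical and is not part of Ecfun. It uses %in% to find the positions of characters outside a given set, then replaces the span from the first through the last such position. Backticks stand in for accented characters, as in the Examples.

```r
# Hypothetical helper, not Ecfun's implementation: replace the span from the
# first to the last character of `s` that is not in `standard`.
replace_nonstandard_span <- function(s, standard, replacement = "_") {
  chars <- strsplit(s, "")[[1]]           # one element per character
  bad <- which(!(chars %in% standard))    # positions of nonstandard characters
  if (length(bad) == 0L) return(s)        # nothing to replace
  first <- min(bad)
  last <- max(bad)
  # keep the text before `first` and after `last`; drop everything in between
  paste0(substr(s, 1L, first - 1L), replacement, substr(s, last + 1L, nchar(s)))
}

std <- c(letters, LETTERS, " ", ".", ",", 0:9)
replace_nonstandard_span("Rub`n", std)    # "Rub_n"
replace_nonstandard_span("R`ub`n", std)   # "R_n"
replace_nonstandard_span("Ruben", std)    # "Ruben" (unchanged)
```

The second call shows that ordinary letters can be lost. Everything between the first and last nonstandard characters is replaced, not only the nonstandard characters themselves.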

Usage

subNonStandardCharacters(x,
   standardCharacters=c(letters, LETTERS, ' ','.', ',', 0:9,
      '"', "'", '-', '_', '(', ')', '[', ']', ''),
   replacement='_',
   gsubList=list(list(pattern='\\\\\\\\|\\\\',
      replacement='"')),
   ... )
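
The default standardCharacters mixes character strings with the integers 0:9. Because c() coerces the whole vector to character, digits are matched as the one-character strings "0" through "9". The quick check below uses a shortened copy of the default; the name std is only illustrative.

```r
# Shortened copy of the default standardCharacters shown in Usage
std <- c(letters, LETTERS, " ", ".", ",", 0:9)
typeof(std)      # "character": c() coerced 0:9 to "0", ..., "9"
"7" %in% std     # TRUE
"`" %in% std     # FALSE: a backtick is not a standard character
```
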

Arguments

x
    character vector in which to find the first and last characters
    not in standardCharacters and replace that substring by
    replacement.

standardCharacters
    a character vector of acceptable characters to keep.

replacement
    a character string that replaces the substring starting and
    ending with characters not in standardCharacters.

gsubList
    list of lists, each holding pattern and replacement arguments for
    gsub, applied in succession before looking for nonstandard
    characters.

...
    optional arguments passed to strsplit.
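
The default gsubList contains a single pattern. In R source, '\\\\\\\\|\\\\' is the regular expression \\\\|\\, which matches either two literal backslashes or one. Each match is replaced by a double quote. The sketch below applies that pattern with base gsub to the last name in the Examples. The variable names pattern, x and y are only illustrative.

```r
pattern <- '\\\\\\\\|\\\\'          # default gsubList pattern from Usage
writeLines(pattern)                 # prints: \\\\|\\
x <- "Robert C. \\Bobby\\\\"        # the string Robert C. \Bobby\\
y <- gsub(pattern, '"', x)          # each backslash run becomes a double quote
writeLines(y)                       # prints: Robert C. "Bobby"
```
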

Details

1. Apply each element of gsubList in succession:
   for(il in seq_along(gsubList)) x <- gsub(gsubList[[il]][["pattern"]], gsubList[[il]][["replacement"]], x)
2. nx <- length(x)
3. x. <- strsplit(x, "", ...)
4. For each ix in seq_len(nx), find the first and last characters of x.[[ix]] that are not in standardCharacters, and substitute replacement for the substring from the first through the last of them.
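
The four steps can be combined into a self-contained sketch written in base R. The function name sub_nonstandard_sketch is hypothetical, and the code illustrates the steps above rather than reproducing the Ecfun source. It also leaves NA elements untouched. That behavior is an assumption, because the documentation does not say how NA is handled.

```r
# Hypothetical re-implementation of the Details steps (not the Ecfun source)
sub_nonstandard_sketch <- function(x,
    standardCharacters = c(letters, LETTERS, " ", ".", ",", 0:9,
                           '"', "'", "-", "_", "(", ")", "[", "]", ""),
    replacement = "_",
    gsubList = list(list(pattern = '\\\\\\\\|\\\\', replacement = '"')),
    ...) {
  # Step 1: apply each pattern/replacement pair in succession
  for (il in seq_along(gsubList))
    x <- gsub(gsubList[[il]][["pattern"]], gsubList[[il]][["replacement"]], x)
  # Steps 2-3: split every element into single characters
  x. <- strsplit(x, "", ...)
  # Step 4: replace the span from the first to the last nonstandard character
  for (ix in seq_along(x)) {
    if (is.na(x[ix])) next                        # assumption: leave NA as is
    bad <- which(!(x.[[ix]] %in% standardCharacters))
    if (length(bad) == 0L) next                   # no nonstandard characters
    x[ix] <- paste0(substr(x[ix], 1L, min(bad) - 1L), replacement,
                    substr(x[ix], max(bad) + 1L, nchar(x[ix])))
  }
  x
}

Names <- c('Ra`l', 'Ra`', '`l', 'Torres, Raul', "Robert C. \\Bobby\\\\")
sub_nonstandard_sketch(Names)
```

On the Names vector from the Examples, this sketch returns the same values as Name2. there.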

Value

A character vector in which, for each element, the substring from the first through the last character not in standardCharacters is replaced by replacement. Elements without such characters are returned unchanged apart from the gsubList substitutions.

See Also

sub, strsplit, grepNonStandardCharacters, subNonStandardNames, encoded_text_to_latex

Examples

# Consider Names = Ruben, Avila and Jose, where "e" and "A" in
# these examples carry an accent. With the default values
# for standardCharacters and replacement, these would become
# Rub_n, _vila, and Jos_.
# (The standard checks for R packages complain about
# non-standard characters, so none are included here.)
#
Names <- c('Ra`l', 'Ra`', '`l', 'Torres, Raul', "Robert C. \\Bobby\\\\")
# confusion in character sets can create
# names like Names[2]
Name2 <- subNonStandardCharacters(Names)
Name2. <- c('Ra_l', 'Ra_', '_l', Names[4], 'Robert C. "Bobby"')
stopifnot( all.equal(Name2, Name2.) )

Keywords

manip