subNonStandardCharacters: sub nonstandard characters with replacement

Description

Find the first and last character not in standardCharacters and replace everything from the first through the last with replacement. For example, a string like "Ruben", where the "e" carries an accent and has been mangled by some software, would become something like "Rub_n" using the default values for standardCharacters and replacement.

Usage

subNonStandardCharacters(x,
   standardCharacters=c(letters, LETTERS, ' ','.', ',', 0:9,
      '"', "'", '-', '_', '(', ')', '[', ']', ''),
   replacement='_',
   gsubList=list(list(pattern='\\\\\\\\|\\\\',
      replacement='"')),
   ... )

Arguments

x
    character vector in which it is desired to find the first and last
    character not in standardCharacters and replace that
    substring by replacement.

standardCharacters
    a character vector of acceptable characters to keep.

replacement
    a character to replace the substring starting and ending with
    characters not in standardCharacters.

gsubList
    list of lists of pattern and replacement arguments
    to be passed to gsub in succession before looking for
    characters not in standardCharacters.

...
    optional arguments passed to strsplit
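
The role of gsubList can be illustrated with a minimal base R sketch. It assumes each element is a list with components named pattern and replacement, as in the default shown under Usage; the variable names raw and cleaned are hypothetical and are not part of the package.

```r
# Default gsubList from the Usage section: as a regular expression the
# pattern matches either two consecutive backslashes or a single backslash.
gsubList <- list(list(pattern = '\\\\\\\\|\\\\', replacement = '"'))

# A name in which quotation marks were mangled into backslashes
# (hypothetical input, mirroring the style of the Examples).
raw <- "Robert C. \\Bobby\\\\"

# Apply each pattern/replacement pair in succession.
cleaned <- raw
for (g in gsubList) {
  cleaned <- gsub(g[["pattern"]], g[["replacement"]], cleaned)
}

print(cleaned)
```

Because the pairs are applied in order, a later pair sees the output of the earlier ones, so the order of the elements in gsubList can matter.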

Details

1.  for(il in 1:length(gsubList)) x <- gsub(
    gsubList[[il]][["pattern"]], gsubList[[il]][["replacement"]], x)

2.  nx <- length(x)

3.  x. <- strsplit(x, "", ...)

4.  for(ix in 1:nx) find the first and last characters not in
    standardCharacters in x.[[ix]] and substitute replacement for
    everything from the first through the last.
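
The character-level steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function name sub_nonstandard_sketch is hypothetical, the gsubList preprocessing is omitted, the character set is an abbreviated version of the default, and NA elements are simply left untouched.

```r
# Hypothetical sketch; not the actual subNonStandardCharacters code.
sub_nonstandard_sketch <- function(x,
    standardCharacters = c(letters, LETTERS, ' ', '.', ',', 0:9,
                           '"', "'", '-', '_', '(', ')', '[', ']'),
    replacement = '_') {
  chars <- strsplit(x, "")          # one vector of single characters per string
  out <- x
  for (ix in seq_along(x)) {
    if (is.na(x[ix])) next          # assumption: leave missing values alone
    ch <- chars[[ix]]
    bad <- which(!(ch %in% standardCharacters))
    if (length(bad) == 0) next      # nothing non-standard in this string
    first <- min(bad)
    last <- max(bad)
    before <- ch[seq_len(first - 1)]
    after <- if (last < length(ch)) ch[(last + 1):length(ch)] else character(0)
    # Everything from the first through the last non-standard
    # character collapses into a single replacement.
    out[ix] <- paste(c(before, replacement, after), collapse = "")
  }
  out
}

sketch_out <- sub_nonstandard_sketch(c('Ra`l', 'Ra`', '`l', 'Torres, Raul', 'a`bc`d'))
print(sketch_out)
```

Note that in the last input the standard characters "bc" lie between two non-standard characters, so they are replaced as well.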

Value

a character vector with everything from the first through the last character
not in standardCharacters replaced by replacement.

See Also

sub, strsplit,
grepNonStandardCharacters,
subNonStandardNames,
encoded_text_to_latex

Examples

# Consider Names = Ruben, Avila and Jose, where "e" and "A" in
#    these examples carry an accent.  With the default values
#    for standardCharacters and replacement, these would become
#    Rub_n, _vila, and Jos_.
#    (The standard checks for R packages complain about
#    non-standard characters, so none are included here.)
#
Names <- c('Ra`l', 'Ra`', '`l', 'Torres, Raul',
           "Robert C. \\Bobby\\\\")
#  confusion in character sets can create
#  names like Names[2]
Name2 <- subNonStandardCharacters(Names)

Name2. <- c('Ra_l', 'Ra_', '_l', Names[4],
            'Robert C. "Bobby"')

stopifnot(
all.equal(Name2, Name2.)
)
manip
