x containing characters that are not in standardCharacters.
grepNonStandardCharacters(x, value=FALSE,
standardCharacters=c(letters, LETTERS, ' ','.', ',', 0:9,
'"', "'", '-', '_', '(', ')', '[', ']', ''),
... )
- x
{
character vector in which to identify elements containing characters not in standardCharacters.
}
- value
{
logical: TRUE to return the values found in x, FALSE to return their indices.
}
- standardCharacters
{
Characters to overlook in x, so that only characters not in standardCharacters are flagged.
}
- ...
{ optional arguments for regexpr
}
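To illustrate how the choice of standardCharacters changes what is flagged, here is a small base R sketch. It does not call grepNonStandardCharacters itself, and the vector strict is a hypothetical, narrower character set chosen only for this illustration.

```r
Names <- c('Raul', 'Ra`l', 'Torres,Raul', 'Torres, Raul')

# Hypothetical stricter set (an assumption for this example):
# letters and the space only, so commas also count as nonstandard
strict <- c(letters, LETTERS, ' ')

# For each name, check whether any single character falls outside 'strict'
hasOther <- vapply(strsplit(Names, ''),
                   function(chars) any(!(chars %in% strict)),
                   logical(1))

which(hasOther)   # indices, as with value=FALSE
Names[hasOther]   # the strings themselves, as with value=TRUE
```

With this narrower set the two names containing a comma are flagged along with the one containing a backtick, whereas the documented default treats the comma as standard.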
1. x. <- strsplit(x, ''): convert the input character vector to a list of character vectors, one per element of x, in which every entry is a single character (nchar == 1).
2. sapply(x., ...) to identify all elements i for which any element of x.[[i]] is not in standardCharacters.
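The two steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual source: the name grepNonStdSketch is hypothetical, the default character set here leaves out the final element of the documented default, and vapply stands in for sapply so that the result is always logical.

```r
# Hypothetical helper illustrating the two documented steps;
# not the actual implementation of grepNonStandardCharacters.
grepNonStdSketch <- function(x, value = FALSE,
    standardCharacters = c(letters, LETTERS, ' ', '.', ',', 0:9,
                           '"', "'", '-', '_', '(', ')', '[', ']')) {
  # Step 1: split each string into a vector of single characters
  x. <- strsplit(x, '')
  # Step 2: flag strings with any character outside standardCharacters
  bad <- vapply(x., function(chars) any(!(chars %in% standardCharacters)),
                logical(1))
  idx <- which(bad)
  # Return the offending strings or their positions
  if (value) x[idx] else idx
}

Names <- c('Raul', 'Ra`l', 'Torres,Raul', 'Torres, Raul')
grepNonStdSketch(Names)                # 2
grepNonStdSketch(Names, value = TRUE)  # "Ra`l"
```

Splitting into single characters keeps the check a plain set-membership test with %in%, which avoids having to escape characters such as '[' or ']' in a regular expression.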
an integer vector identifying all elements of x containing a character not in standardCharacters, or the elements themselves if value=TRUE.
stringi-package, grep, regexpr, subNonStandardCharacters, showNonASCII
Names <- c('Raul', 'Ra`l', 'Torres,Raul', 'Torres, Raul')
# confusion in character sets can create
# names like Names[2]
chk <- grepNonStandardCharacters(Names)
stopifnot(
all.equal(chk, 2)
)
chkv <- grepNonStandardCharacters(Names, TRUE)
stopifnot(
all.equal(chkv, 'Ra`l')
)
manip