
This function allows identifying, removing or renaming the enumerator tag/name (or removing the entire enumerator) from trailing enumerators (eg 'abc_No1' to 'abc_1'). A panel of candidate combinations of separator-symbols and separator text/words is tested to find one that matches all data. If the main input is a matrix, all columns are tested independently to find the first column where one specific combination of separator-symbols and separator text/words is found. Several output options exist; the combination of separator-symbols and separator text/words that was found may be included in the output, too.
rmEnumeratorName(
dat,
nameEnum = c("Number", "No", "#", "Replicate", "Sample"),
sepEnum = c(" ", "-", "_", "/"),
newSep = "",
incl = c("anyCase", "trim2"),
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
This function returns a corrected vector (or matrix), or a list if incl="rmEnumL", containing $dat (corrected data), $pattern (the combination of separator-symbols and separator text/words found) and, if the input is a matrix, $column (the column of the input that was identified and treated).
dat
(character vector or matrix) main input
nameEnum
(character) potential enumerator-names
sepEnum
(character) potential separators for enumerator-names
newSep
(character) new separator put in place of the separator and enumerator-name that were found (eg newSep="--")
incl
(character) options to include further variants of the enumerator-names: use "rmEnum" for completely removing enumerator tag/name and digits; for different options of trimming names/tags from nameEnum one may use "anyCase", "trim3" (trimming down to max 3 letters), "trim2" (trimming to max 2 letters) or "trim1" (trimming down to a single letter); "trim0" works like "trim1" but also includes ' ', ie no enumerator tag/name in front of the digit(s)
silent
(logical) suppress messages
debug
(logical) display additional messages for debugging
callFrom
(character) allow easier tracking of messages produced
In case only digit-enumerators are present (ie, without repetitive text), one has to use incl="rmEnum" to remove terminal enumerators. This will work only when all items contain terminal digits.
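The condition that all items end in digits can be checked beforehand with base R. The following is a minimal sketch using only grepl; the vectors x and y are made-up illustrations and the check is not taken from the internals of rmEnumeratorName.

```r
# Hypothetical example names (not from any real data-set)
x <- c("hg_1", "hjRe2_2", "hk-33")

# TRUE only if every item ends with at least one digit
allHaveDigits <- all(grepl("[0-9]+$", x))
allHaveDigits

# A single item without terminal digits makes the check fail
y <- c(x, "noDigit")
withMissing <- all(grepl("[0-9]+$", y))
withMissing
```

If such a check returns FALSE, the items lacking terminal digits can be located with which(!grepl("[0-9]+$", y)).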
Please note that checking a variety of different separator text-words and separator-symbols may give a considerable number of combinations to check. In particular, when automatic trimming of separator text-words is added (eg incl="trim2"), the complexity of the associated searches increases quickly. Thus, with large data-sets, restricting the content of the arguments nameEnum, sepEnum and (in particular) newSep to the most probable terms/options is suggested to help reduce demands on memory and CPU.
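To get a feeling for how quickly the number of candidates grows, the default arguments can be crossed with expand.grid. This is only a rough sketch: it assumes that trimming means keeping the first letters of each name, and it does not reproduce how rmEnumeratorName builds its candidates internally. The objects full, trimmed and withTrim are names chosen for this illustration.

```r
# Default candidates from the usage section
nameEnum <- c("Number", "No", "#", "Replicate", "Sample")
sepEnum  <- c(" ", "-", "_", "/")

# Full names only: every separator paired with every name
full <- expand.grid(sep = sepEnum, name = nameEnum, stringsAsFactors = FALSE)
nrow(full)       # 4 separators x 5 names = 20

# Assumption: trimming to max 2 letters adds the first 2 letters of each name
trimmed  <- unique(c(nameEnum, substr(nameEnum, 1, 2)))
withTrim <- expand.grid(sep = sepEnum, name = trimmed, stringsAsFactors = FALSE)
nrow(withTrim)   # 4 separators x 8 distinct names = 32
```

Adding further variants (eg upper and lower case) multiplies the number of candidates again, which is why shorter argument vectors help with large data-sets.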
In case the input dat is a matrix and multiple different enumerator-types are found, only the first column (from the left) will be treated. If you wish to remove/substitute multiple types of enumerators, the function rmEnumeratorName must be run independently (eg on each column), see last example below.
When the exact pattern is known, grep and sub may allow much faster direct manipulations.
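As an illustration of this direct route, the sketch below assumes the pattern is already known to be the separator "_" followed by the tag "No" and terminal digits. The vector x and the regular expressions are illustrative assumptions; only base R functions (sub, grepl) are used.

```r
# Hypothetical names with a known enumerator pattern '_No<digits>'
x <- c("abc_No1", "def_No2", "ghi_No13")

# Which items carry the pattern at their end
hasPattern <- grepl("_No[0-9]+$", x)

# Rename: drop the tag 'No' but keep separator and digits
renamed <- sub("_No([0-9]+)$", "_\\1", x)
renamed    # "abc_1" "def_2" "ghi_13"

# Remove the entire enumerator
removed <- sub("_No[0-9]+$", "", x)
removed    # "abc" "def" "ghi"
```

Since a single fixed regular expression is applied, no panel of candidate combinations has to be searched.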
## vector with terminal digits only (no enumerator-name)
xv <- c("hg_1","hjRe2_2","hk-33")
rmEnumeratorName(xv)
rmEnumeratorName(xv, incl="rmEnum")
## vector where the terminal digits are preceded by the tag 'Re'
xx <- c("hg_Re1","hjRe2_Re2","hk-Re3_Re33")
rmEnumeratorName(xx)
rmEnumeratorName(xx, newSep="--")
rmEnumeratorName(xx, incl="anyCase")
## matrix input
xy <- cbind(a=11:13, b=c("11#11","2_No2","333_samp333"), c=xx)
rmEnumeratorName(xy)
rmEnumeratorName(xy, incl=c("anyCase","trim2","rmEnumL"))
## multiple types of enumerators: run the function on each column independently
xz <- cbind(a=11:13, b=c("23#11","4#2","567#333"), c=xx)
apply(xz, 2, rmEnumeratorName, sepEnum=c("","_"), newSep="_", silent=TRUE)