rmEnumeratorName: Remove or rename enumerator tag/name (or remove entire enumerator) from trailing enumerators

Description

This function identifies, removes or renames the enumerator tag/name (or removes the entire enumerator) from trailing enumerators (eg 'abc_No1' to 'abc_1'). A panel of candidate combinations of separator-symbols and separator text/words is tested to find one that matches all data. If the main input is a matrix, all columns are tested independently to find the first column where one specific combination of separator-symbols and separator text/words is found. Several output options exist; the combination of separator-symbols and separator text/words that was found may be included, too.
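The search described above can be sketched in base R. This is a minimal illustration only, not the package's implementation: the helper findEnumPattern() is hypothetical, it uses a shortened list of enumerator names, and it ignores the trimming and case options of incl.

```r
# Hypothetical helper (not part of the package): combine separator symbols and
# enumerator names into candidates, return the first one matching ALL items.
findEnumPattern <- function(x, nameEnum = c("Number", "No", "#"),
                            sepEnum = c(" ", "-", "_", "/")) {
  if (!all(grepl("[[:digit:]]+$", x))) return(NULL)    # every item needs trailing digits
  stem <- sub("[[:digit:]]+$", "", x)                  # text in front of the digits
  cand <- as.vector(outer(sepEnum, nameEnum, paste0))  # eg "_No", "-Number"
  for (ca in cand) {
    if (all(endsWith(stem, ca))) return(ca)            # first candidate found in all items
  }
  NULL                                                 # no common pattern
}

x <- c("abc_No1", "def_No2", "ghi_No33")
pat <- findEnumPattern(x)                              # "_No"

# Rename the enumerator: drop the pattern, insert a new separator, keep digits
digits <- regmatches(x, regexpr("[[:digit:]]+$", x))
stem <- sub("[[:digit:]]+$", "", x)
renamed <- paste0(substr(stem, 1, nchar(stem) - nchar(pat)), "_", digits)
renamed                                                # "abc_1" "def_2" "ghi_33"
```

The sketch stops at the first candidate shared by all items, which mirrors the idea of finding one combination that matches all data.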

Usage

rmEnumeratorName(
  dat,
  nameEnum = c("Number", "No", "#", "Replicate", "Sample"),
  sepEnum = c(" ", "-", "_", "/"),
  newSep = "",
  incl = c("anyCase", "trim2"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Value

This function returns a corrected vector (or matrix), or a list if incl="rmEnumL" containing $dat (corrected data), $pattern (the combination of separator-symbols and separator text/words found) and, if the input is a matrix, $column (which column of the input was identified and treated).

Arguments

dat: (character vector or matrix) main input
nameEnum: (character) potential enumerator-names
sepEnum: (character) potential separators for enumerator-names
newSep: (character) new separator to place in front of the digit(s) when the enumerator tag/name is removed or renamed
incl: (character) options to include further variants of the enumerator-names: use "rmEnum" to completely remove enumerator tag/name and digits (or "rmEnumL" to return the result as a list, see Value). For testing further variants of the names/tags from nameEnum one may use anyCase, trim3 (trimming down to max 3 letters), trim2 (trimming to max 2 letters) or trim1 (trimming down to a single letter); trim0 works like trim1 but also includes ' ', ie no enumerator tag/name in front of the digit(s)
silent: (logical) suppress messages
debug: (logical) display additional messages for debugging
callFrom: (character) allow easier tracking of messages produced

Details

If only digit-enumerators are present (ie, without repetitive text), one has to use incl="rmEnum" to remove terminal enumerators. This works only when all items contain terminal digits.

Please note that checking a variety of different separator text-words and separator-symbols may give a large number of combinations to check. In particular, when automatic trimming of separator text-words is added (eg incl="trim2"), the complexity of the associated searches increases quickly. Thus, with large data-sets it is suggested to restrict the content of the arguments nameEnum, sepEnum and (in particular) newSep to the most probable terms/options, which helps to reduce demands on memory and CPU.
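The growth in the number of combinations can be estimated with a few lines of base R. The expansion rules below (adding names trimmed to 2 letters, then lower- and upper-case variants) are assumptions made for illustration; the variants the function actually generates may differ.

```r
nameEnum <- c("Number", "No", "#", "Replicate", "Sample")
sepEnum <- c(" ", "-", "_", "/")

# Without any variants: every name combined with every separator
nBase <- length(nameEnum) * length(sepEnum)                  # 5 * 4 = 20

# Assumed expansion: add names trimmed to 2 letters, then case variants
trim2 <- unique(c(nameEnum, substr(nameEnum, 1, 2)))         # adds "Nu", "Re", "Sa"
anyCase <- unique(c(trim2, tolower(trim2), toupper(trim2)))  # 22 distinct names
nExpanded <- length(anyCase) * length(sepEnum)               # 22 * 4 = 88

c(base = nBase, expanded = nExpanded)
```

Under these assumptions the default arguments already give more than four times as many candidates once trimming and case variants are included, so shortening nameEnum or sepEnum reduces the search space proportionally.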

If the input dat is a matrix and multiple different enumerator-types are found, only the first column (from the left) will be treated. If you wish to remove/substitute multiple types of enumerators, the function rmEnumeratorName must be run independently on each column, see last example below.

Examples

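library(wrMisc)   # rmEnumeratorName() is provided by the wrMisc package

## vector with trailing digits but no enumerator tag/name
xv <- c("hg_1","hjRe2_2","hk-33")
rmEnumeratorName(xv)
rmEnumeratorName(xv, incl="rmEnum")      # remove the terminal enumerators entirely

## vector with the enumerator tag 'Re' (as in 'Replicate' trimmed to 2 letters)
xx <- c("hg_Re1","hjRe2_Re2","hk-Re3_Re33")
rmEnumeratorName(xx)
rmEnumeratorName(xx, newSep="--")        # use "--" as new separator
rmEnumeratorName(xx, incl="anyCase")     # without trimming of nameEnum

## matrix input: columns are tested independently
xy <- cbind(a=11:13, b=c("11#11","2_No2","333_samp333"), c=xx)
rmEnumeratorName(xy)
rmEnumeratorName(xy, incl=c("anyCase","trim2","rmEnumL"))   # list output, see Value

## multiple types of enumerators: run the function separately on each column
xz <- cbind(a=11:13, b=c("23#11","4#2","567#333"), c=xx)
apply(xz, 2, rmEnumeratorName, sepEnum=c("","_"), newSep="_", silent=TRUE)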
