This function removes shared words, i.e. it trims text to the non-redundant words.
rmSharedWords(
x,
sep = c("_", " ", "."),
anySep = TRUE,
newSep = NULL,
minLe = 2,
na.omit = FALSE,
fixed = TRUE,
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
This function returns a character vector of the same length as the input (unless na.omit=TRUE), with modified text content.
x: (character) main input to be made non-redundant
sep: (character) separator(s) to be used
anySep: (logical) if TRUE, all separators are considered at the same time, thus combinations with different separators won't be distinguished
newSep: (character) new (uniform) separator between words; if NULL, the first value of sep will be used
minLe: (integer) minimum length for a string to be recognised as a 'word'
na.omit: (logical) if TRUE, NAs will be removed from the output
fixed: (logical) passed on to the argument fixed of strsplit(); in strsplit(), fixed = TRUE matches the separator literally, while fixed = FALSE treats it as a regular expression
silent: (logical) suppress messages
debug: (logical) additional messages for debugging
callFrom: (character) allows easier tracking of messages produced
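Since fixed is described as being handed to strsplit(), the following base-R lines may help to see what that switch changes. This only illustrates strsplit() itself, not the internals of rmSharedWords(); the variable names are made up for the illustration.

```r
# Base R only: effect of 'fixed' in strsplit() when the separator is "."
x <- "a.b"

# fixed = TRUE: "." is taken literally
litSplit <- strsplit(x, ".", fixed = TRUE)[[1]]
print(litSplit)    # "a" "b"

# fixed = FALSE: "." is a regular expression matching any character,
# so every character acts as separator and only empty strings remain
rexSplit <- strsplit(x, ".", fixed = FALSE)[[1]]
print(rexSplit)

# fixed = FALSE with the dot escaped behaves like the literal split
escSplit <- strsplit(x, "\\.", fixed = FALSE)[[1]]
print(escSplit)    # "a" "b"
```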
Leading separators will be removed in any case (even if not followed by a 'word').
Special characters will be automatically protected. When looking for repeated words, the order of the words does NOT matter; multiple repeats will be removed, too.
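The behaviour described above can be approximated in a few lines of base R. This is a minimal sketch under stated assumptions, not the code of rmSharedWords(): the helper name dropSharedWordsSketch is hypothetical, 'shared' is assumed to mean 'present in every non-NA element', and minLe is assumed to protect shorter words from removal.

```r
# Hypothetical helper (NOT the package implementation): base-R sketch of the idea
dropSharedWordsSketch <- function(x, sep = c("_", " ", "."), newSep = sep[1], minLe = 2) {
  # Map every separator to the first one (literal matching), so all separators
  # are treated alike; NA entries stay NA
  for (s in sep) x <- gsub(s, sep[1], x, fixed = TRUE)
  isNa <- is.na(x)
  if (sum(!isNa) < 2) return(x)             # nothing to compare against
  words <- strsplit(x, sep[1], fixed = TRUE)
  # Drop NA and empty tokens (empty tokens come from leading or doubled separators)
  words <- lapply(words, function(w) w[!is.na(w) & nzchar(w)])
  # Words present in every non-NA element, regardless of position or repeats
  cand <- Reduce(intersect, words[!isNa])
  shared <- cand[nchar(cand) >= minLe]       # assumption: shorter words are kept
  out <- vapply(words, function(w) paste(w[!w %in% shared], collapse = newSep),
                character(1))
  out[isNa] <- NA_character_
  out
}

res <- dropSharedWordsSketch(c("aa_A1 yy_zz.txt", NA, "B2 yy_aa_aa_zz.txt"))
print(res)    # "A1" NA "B2" for this sketch
```

The sketch replaces separators with literal matching rather than building a regular expression, which sidesteps the need to escape special characters such as the dot.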
See also: trimRedundText
x1 <- c("aa_A1 yy_zz.txt", NA, "B2 yy_aa_aa_zz.txt")
rmSharedWords(x1)
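A few further calls may help to explore the arguments listed in the usage. This is only a sketch: it uses just the documented arguments, states no expected results, and runs the calls only if a function named rmSharedWords is available in the session (the package providing it is assumed to be attached already).

```r
x1 <- c("aa_A1 yy_zz.txt", NA, "B2 yy_aa_aa_zz.txt")

# Run only when rmSharedWords() can be found in the current session
if (exists("rmSharedWords", mode = "function")) {
  # Drop the NA entry from the output
  print(rmSharedWords(x1, na.omit = TRUE))
  # Use a uniform new separator between the remaining words
  print(rmSharedWords(x1, newSep = "-"))
  # Restrict splitting to a single separator
  print(rmSharedWords(x1, sep = "_"))
}
```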