This function groups elements of a string vector (character or string variable) according to the elements' distance ('similarity'). The more similar two string elements are, the more likely they are to be combined into a group.
group_str(strings, maxdist = 2, method = "lv", strict = FALSE,
          trim.whitespace = TRUE, remove.empty = TRUE, showProgressBar = FALSE)
strings
Character vector with string elements.
maxdist
Maximum distance between two string elements at which they are still treated as similar or equal.
method
Method for distance calculation. The default is "lv". See stringdist for details.
strict
Logical; if TRUE, value matching is stricter. See 'Examples'.
trim.whitespace
Logical; if TRUE (default), leading and trailing white spaces will be removed from string values.
remove.empty
Logical; if TRUE (default), empty string values will be removed from the character vector strings.
showProgressBar
Logical; if TRUE, the progress bar is displayed when computing the distance matrix. Default is FALSE, hence the bar is hidden.
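To get a feel for what maxdist means, pairwise edit distances can be computed directly. The sketch below uses base R's adist(), which computes a generalized Levenshtein distance, as a stand-in for the "lv" method. It is only an illustration and is not necessarily how group_str computes distances internally. The variable names words and d are made up for this example.

```r
# Illustration only: base R edit distances, used here as a stand-in
# for the "lv" (Levenshtein) method named in the arguments.
words <- c("Hello", "Helo", "Hole", "Apple",
           "Ape", "New", "Old", "System", "Systemic")

# Pairwise distance matrix; label rows and columns with the words
d <- adist(words)
dimnames(d) <- list(words, words)

# "Helo" is one deletion away from "Hello"
d["Hello", "Helo"]

# "Systemic" is two insertions away from "System"
d["System", "Systemic"]

# Pairs that fall within the default threshold maxdist = 2
d <= 2
```

Pairs for which the last matrix is TRUE are the ones close enough to be candidates for the same group at maxdist = 2.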
A character vector where similar string elements (values) are recoded into a new, single value. The return value is of the same length as strings, i.e. grouped elements appear multiple times, so the count for each grouped string is still available (see 'Examples').
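The "same length" property can be pictured with a deliberately naive grouping routine. The sketch below is not the algorithm used by group_str. The function group_naive, its greedy assignment rule and the comma-separated group label are all assumptions made for illustration. It only shows how recoding every element in place keeps the counts available.

```r
# Naive illustration only: NOT the algorithm used by group_str().
# Each element is recoded to a label shared by all members of its group,
# so the result has the same length as the input.
group_naive <- function(x, maxdist = 2) {
  d <- adist(x)                      # pairwise edit distances (base R)
  out <- x
  assigned <- rep(FALSE, length(x))
  for (i in seq_along(x)) {
    if (assigned[i]) next
    # unassigned elements within maxdist of element i (includes i itself)
    members <- which(!assigned & d[i, ] <= maxdist)
    out[members] <- paste(unique(x[members]), collapse = ", ")
    assigned[members] <- TRUE
  }
  out
}

words <- c("Hello", "Helo", "Hole", "Apple",
           "Ape", "New", "Old", "System", "Systemic")
grouped <- group_naive(words)

# one output value per input value
length(grouped) == length(words)

# grouped labels repeat, so table() still reports group sizes
table(grouped)
```

Because each input position keeps its own (recoded) value, the result can be tabulated or placed next to the original vector for comparison.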
# NOT RUN {
library(sjmisc)  # assumed to provide group_str() and frq()

oldstring <- c("Hello", "Helo", "Hole", "Apple",
               "Ape", "New", "Old", "System", "Systemic")
newstring <- group_str(oldstring)
# see result
newstring
# count for each group
table(newstring)
# print table to compare original and grouped string
frq(oldstring)
frq(newstring)
# larger groups
newstring <- group_str(oldstring, maxdist = 3)
frq(oldstring)
frq(newstring)
# be more strict with matching pairs
newstring <- group_str(oldstring, maxdist = 3, strict = TRUE)
frq(oldstring)
frq(newstring)
# }