sjmisc (version 2.7.6)

str_pos: Find partial matching and close distance elements in strings

Description

This function finds the indices of elements in a character vector that partially match, or are similar to, a search term. It can be used to find exact or slightly mistyped elements in a string vector.

Usage

str_pos(search.string, find.term, maxdist = 2, part.dist.match = 0,
  show.pbar = FALSE)

Arguments

search.string

Character vector with string elements.

find.term

String that should be matched against the elements of search.string.

maxdist

Maximum distance allowed between two string elements for them to be treated as similar or equal. Smaller values mean less tolerance in matching.
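To give a feel for how maxdist acts as a tolerance, the following sketch uses base R's utils::adist(), which computes the generalized Levenshtein distance. This is an assumption made for illustration: this page does not state which distance metric str_pos() uses internally, and the helper name within_maxdist is hypothetical.

```r
# Illustration only: adist() (Levenshtein distance) stands in for
# whatever distance str_pos() computes internally.
within_maxdist <- function(x, term, maxdist = 2) {
  # compare case-insensitively; as.vector() drops adist()'s matrix shape
  d <- as.vector(utils::adist(tolower(x), tolower(term)))
  # indices of elements whose distance is within the tolerance
  which(d <= maxdist)
}

x <- c("Hello", "Helo", "Hole", "Apple")
within_maxdist(x, "helo", maxdist = 1)  # stricter tolerance
within_maxdist(x, "helo", maxdist = 2)  # looser tolerance, also admits "Hole"
```

With the smaller tolerance only the two closest spellings qualify; raising maxdist admits elements that differ by more edits.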

part.dist.match

Activates similarity matching (close distance strings) for parts (substrings) of search.string. The following values are accepted:

  • 0 for no partial distance matching

  • 1 for one-step matching, which means that only substrings of the same length as find.term are extracted from search.string for matching

  • 2 for two-step matching, which means that substrings of the same length as find.term, as well as substrings with a slightly wider range, are extracted from search.string for matching

Default value is 0. See 'Details' for more information.

show.pbar

Logical; if TRUE, a progress bar is displayed while the distance matrix is computed. Default is FALSE, hence the bar is hidden.

Value

A numeric vector with the index positions of elements in search.string that partially match or are similar to find.term. Returns -1 if no match was found.
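Because -1 is also a valid negative subscript in R, using a "no match" result directly for subsetting would silently drop the first element instead of returning nothing. A minimal sketch of a guard follows; the helper name matched_elements is hypothetical and not part of sjmisc, and idx is assumed to be either valid positions or the single value -1, as described above.

```r
# Hypothetical helper: turn a str_pos()-style result into the matched
# elements of x.
matched_elements <- function(x, idx) {
  if (length(idx) == 1 && idx == -1) {
    # no match: avoid x[-1], which would drop element 1 and keep the rest
    return(character(0))
  }
  x[idx]
}

x <- c("Hello", "Helo", "Hole")
matched_elements(x, c(1, 2))  # the first two elements
matched_elements(x, -1)       # empty character vector
```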

Details

For part.dist.match = 1, substrings with the same number of characters as find.term are extracted from search.string, starting at the beginning of search.string and continuing until its end is reached. Each substring is matched against find.term, and results with a distance of at most maxdist are considered as "matching". If part.dist.match = 2, the range of the extracted substring is additionally increased by 2, i.e. substrings that are two characters longer are extracted and matched as well.
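The one-step procedure can be sketched for a single string as follows. This is not the package's code: it assumes utils::adist() (Levenshtein distance) as the metric, a case-insensitive comparison, and a window that advances one character at a time. The function name partial_dist_match is hypothetical, and the wider substrings of the two-step variant are not covered.

```r
# Sketch of one-step partial distance matching for one string `s`.
partial_dist_match <- function(s, term, maxdist = 2) {
  n <- nchar(term)
  steps <- nchar(s) - n + 1
  if (steps < 1) return(FALSE)  # s is shorter than term: no window fits
  starts <- seq_len(steps)
  # every substring of s with the same number of characters as term
  windows <- substring(s, starts, starts + n - 1)
  d <- as.vector(utils::adist(tolower(windows), tolower(term)))
  # a match means at least one window lies within the tolerance
  any(d <= maxdist)
}

partial_dist_match("We are Sex Pistols!", "postils")  # window "Pistols" is close
partial_dist_match("We are Sex Pistols!", "zzzzzzz")  # no window is close
```

Comparing windows rather than the whole string is what lets a short, mistyped term match inside a longer sentence, where the distance to the full string would far exceed maxdist.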

See Also

group_str

Examples

# NOT RUN {
library(sjmisc)

string <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")
str_pos(string, "hel")   # partial match
str_pos(string, "stem")  # partial match
str_pos(string, "R")     # no match
str_pos(string, "saste") # similarity to "System"

# finds two indices, because partial distance matching
# now also applies to "Systemic"
str_pos(string,
        "sytsme",
        part.dist.match = 1)

# finds nothing
str_pos("We are Sex Pistols!", "postils")
# finds a match, because a substring is similar to the search term
str_pos("We are Sex Pistols!", "postils", part.dist.match = 1)
# }